CN114356340A - Neural network compiling method and device, computer equipment and storage medium

Info

Publication number
CN114356340A
Authority
CN
China
Prior art keywords
operation statement
statement
target
variable
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111652713.2A
Other languages
Chinese (zh)
Inventor
周桓
田志仲
何山
勾志宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Power Tensors Intelligent Technology Co Ltd
Original Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Power Tensors Intelligent Technology Co Ltd filed Critical Shanghai Power Tensors Intelligent Technology Co Ltd
Priority to CN202111652713.2A priority Critical patent/CN114356340A/en
Publication of CN114356340A publication Critical patent/CN114356340A/en
Pending legal-status Critical Current

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The present disclosure provides a neural network compiling method and apparatus, a computer device, and a storage medium, wherein the method comprises: parsing an intermediate program to obtain an internal data structure of the intermediate program, the internal data structure comprising objects and the association relationships among the objects, and the objects including operation statements and the variable definition statements corresponding to the operation statements; the intermediate program is obtained by converting an original program of a target neural network, written in a target high-level language, into a preset intermediate language by using conversion relationship information between the preset intermediate language and the target high-level language; generating configuration information of the operation statements based on the internal data structure; and generating, based on the instruction encapsulation relationship between the operation statements and a data processing chip together with the configuration information, the machine instructions used by the data processing chip when running the target neural network.

Description

Neural network compiling method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a neural network compiling method and apparatus, a computer device, and a storage medium.
Background
Artificial Intelligence (AI) acceleration chips are an effective way to improve the inference performance of a neural network: they implement computations common in neural networks at the hardware level and thereby speed up the network's inference process. To run inference for a neural network on an AI acceleration chip, a compiler is required to compile the original neural network into instructions of the AI acceleration chip.
However, conventional compiling frameworks are mainly oriented toward general-purpose hardware accelerators; that is, they are suited to machine instructions that can be split into simple arithmetic operations, rather than machine instructions that directly implement operations such as convolution. They are therefore not suitable for compiling a neural network for a data processing chip such as an AI acceleration chip.
Disclosure of Invention
Embodiments of the present disclosure provide at least a neural network compiling method and apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a neural network compiling method, comprising: parsing an intermediate program to obtain an internal data structure of the intermediate program, the internal data structure comprising objects and the association relationships among the objects, and the objects including operation statements and the variable definition statements corresponding to the operation statements, wherein the intermediate program is obtained by converting an original program of a target neural network, written in a target high-level language, into a preset intermediate language by using conversion relationship information between the preset intermediate language and the target high-level language; generating configuration information of the operation statements based on the internal data structure; and generating, based on the instruction encapsulation relationship between the operation statements and a data processing chip together with the configuration information, the machine instructions used by the data processing chip when running the target neural network.
In an optional embodiment, the configuration information of an operation statement includes a target storage address of the variable corresponding to the operation statement; and generating the configuration information of each operation statement based on the internal data structure includes: for a target operation statement among the operation statements, allocating, based on the variable definition statement of the variable corresponding to the target operation statement, a corresponding target storage address to that variable, the target storage address including a storage address in memory and/or a storage address in cache.
In an optional implementation, before generating the configuration information of each operation statement based on the internal data structure, the method further includes: determining, from the operation statements, operation statements to be deleted that repeatedly access a preset storage space, based on the types of the operation statements and the variables corresponding to them; and deleting the operation statements to be deleted from the operation statements to obtain the target operation statements.
In an optional embodiment, the types of the operation statements include variable storage and variable reading; and determining, from the operation statements, the operation statements to be deleted that repeatedly access a preset storage space includes: for each first operation statement whose type is variable storage, determining, based on the variable corresponding to the first operation statement, whether there exists a second operation statement whose type is variable reading and whose corresponding variable is the same variable, the first operation statement and the second operation statement belonging to different adjacent network layers; and if so, determining both the first operation statement and the corresponding second operation statement as operation statements to be deleted.
In an optional embodiment, the variables include the input variables, output variables, and intermediate variables corresponding to each network layer; and allocating, based on the variable definition statement of the variable corresponding to the target operation statement, a corresponding target storage address to that variable includes: in response to the variable definition statement indicating that the variable corresponding to the target operation statement is an input variable, allocating a storage address in memory and a storage address in cache to the variable; in response to the variable definition statement indicating that the variable is an output variable, allocating a storage address in memory and a storage address in cache to the variable; and in response to the variable definition statement indicating that the variable is an intermediate variable, allocating a storage address in cache to the variable.
In an optional implementation, before generating, based on the instruction encapsulation relationship between the operation statements and the data processing chip together with the configuration information, the machine instructions used by the data processing chip when running the target neural network, the method further includes: adding, for each operation statement, waiting information according to the association relationships between that operation statement and the other operation statements, the waiting information indicating the execution order of the operation instructions.
In an optional implementation, after adding the waiting information to each operation statement according to the association relationships between that operation statement and the other operation statements, the method further includes: generating first debugging information based on the operation statements and the waiting information added to them, the first debugging information being used for debugging the original program of the target neural network.
In an optional implementation, after generating the configuration information of the operation statements based on the internal data structure, the method further includes: generating second debugging information based on the operation statements and their corresponding configuration information, the second debugging information being used for debugging the original program of the target neural network.
In an optional implementation, the compiling method further includes: in response to an encapsulation operation on a plurality of machine instructions of the data processing chip, generating an operation statement corresponding to the encapsulation operation; and establishing a conversion relationship between that operation statement and the target high-level language.
In a second aspect, an embodiment of the present disclosure further provides a neural network compiling apparatus, including: a parsing module, configured to parse an intermediate program to obtain an internal data structure of the intermediate program, the internal data structure comprising objects and the association relationships among the objects, the objects including operation statements and the variable definition statements corresponding to the operation statements, and the intermediate program being obtained by converting an original program of a target neural network, written in a target high-level language, into a preset intermediate language by using conversion relationship information between the preset intermediate language and the target high-level language; a first generation module, configured to generate configuration information of the operation statements based on the internal data structure; and a second generation module, configured to generate, based on the instruction encapsulation relationship between the operation statements and a data processing chip together with the configuration information, the machine instructions used by the data processing chip when running the target neural network.
In an optional embodiment, the configuration information of an operation statement includes a target storage address of the variable corresponding to the operation statement; and the first generation module, when generating the configuration information of each operation statement based on the internal data structure, is configured to: for a target operation statement among the operation statements, allocate, based on the variable definition statement of the variable corresponding to the target operation statement, a corresponding target storage address to that variable, the target storage address including a storage address in memory and/or a storage address in cache.
In an optional embodiment, before generating the configuration information of each operation statement based on the internal data structure, the first generation module is further configured to: determine, from the operation statements, operation statements to be deleted that repeatedly access a preset storage space, based on the types of the operation statements and the variables corresponding to them; and delete the operation statements to be deleted from the operation statements to obtain the target operation statements.
In an optional embodiment, the types of the operation statements include variable storage and variable reading; and the first generation module, when determining, from the operation statements, the operation statements to be deleted that repeatedly access a preset storage space, is configured to: for each first operation statement whose type is variable storage, determine, based on the variable corresponding to the first operation statement, whether there exists a second operation statement whose type is variable reading and whose corresponding variable is the same variable, the first operation statement and the second operation statement belonging to different adjacent network layers; and if so, determine both the first operation statement and the corresponding second operation statement as operation statements to be deleted.
In an optional embodiment, the variables include the input variables, output variables, and intermediate variables corresponding to each network layer; and the first generation module, when allocating a corresponding target storage address to the variable corresponding to the target operation statement based on its variable definition statement, is configured to: in response to the variable definition statement indicating that the variable is an input variable, allocate a storage address in memory and a storage address in cache to the variable; in response to the variable definition statement indicating that the variable is an output variable, allocate a storage address in memory and a storage address in cache to the variable; and in response to the variable definition statement indicating that the variable is an intermediate variable, allocate a storage address in cache to the variable.
In an optional embodiment, before generating the machine instructions used by the data processing chip when running the target neural network based on the instruction encapsulation relationship and the configuration information, the second generation module is further configured to: add, for each operation statement, waiting information according to the association relationships between that operation statement and the other operation statements, the waiting information indicating the execution order of the operation instructions.
In an optional implementation, after adding the waiting information to each operation statement according to the association relationships between that operation statement and the other operation statements, the second generation module is further configured to: generate first debugging information based on the operation statements and the waiting information added to them, the first debugging information being used for debugging the original program of the target neural network.
In an optional embodiment, after generating the configuration information of the operation statements based on the internal data structure, the first generation module is further configured to: generate second debugging information based on the operation statements and their corresponding configuration information, the second debugging information being used for debugging the original program of the target neural network.
In an optional implementation, the compiling apparatus further includes a processing module configured to: in response to an encapsulation operation on a plurality of machine instructions of the data processing chip, generate an operation statement corresponding to the encapsulation operation; and establish a conversion relationship between that operation statement and the target high-level language.
In a third aspect, an embodiment of the present disclosure further provides a computer device comprising a processor and a memory, the memory storing machine-readable instructions executable by the processor; the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor performs the steps of the first aspect or of any one of its possible implementations.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, the computer program, when executed, performing the steps of the first aspect or of any one of its possible implementations.
With the neural network compiling method provided by the embodiments of the present disclosure, the internal data structure obtained by parsing the intermediate program can be used to generate the configuration information of the operation statements, and the machine instructions are then generated by using the instruction encapsulation relationship between the operation statements and the data processing chip together with the configuration information. Because an instruction encapsulation relationship for encapsulating the data processing chip's instructions in the preset intermediate language can be established in advance, corresponding operation statements can be constructed in the intermediate language according to the operation types the data processing chip actually supports. In converting the original program written in the target high-level language into machine instructions, the original program therefore need not be reduced to basic machine instructions such as addition, subtraction, and multiplication, but is instead converted into machine instructions corresponding to the operation types the data processing chip supports, so that the computing power of the data processing chip is utilized more fully.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, since those skilled in the art may derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a neural network compiling method provided by an embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of a compiler provided by an embodiment of the present disclosure;
Fig. 3 shows an example diagram for determining an operation statement to be deleted, provided by an embodiment of the present disclosure;
Fig. 4 shows a flowchart of a specific compiling embodiment provided by an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of a neural network compiling apparatus provided by an embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Accordingly, the following detailed description of the embodiments is not intended to limit the claimed scope of the disclosure, but merely represents selected embodiments of the disclosure. All other embodiments derived by those skilled in the art from the embodiments of the disclosure without creative effort shall fall within the protection scope of the disclosure.
Research shows that, among data processing chips, the AI acceleration chip is an effective way to improve neural network inference performance, and that to run inference for a network on an AI acceleration chip, a compiler must compile the original network into the chip's instructions. Owing to the hardware design of AI acceleration chips, each specific computation requires configuring tens of register parameters, and synchronization among multiple computations executing in parallel must be considered. Having the compiler generate machine instructions directly lacks debuggability and does not scale to large networks, so an intermediate language is needed to connect the compiler's front end and back end: the front end parses the network and implements operators to generate the intermediate language; the back end optimizes it and generates machine instructions. In currently common compiling frameworks, the intermediate language is mainly oriented toward general-purpose hardware accelerators, which usually execute data processing tasks with basic machine instructions for elementary calculations such as addition, subtraction, and multiplication. As the variety of operators in neural networks grows, more and more AI acceleration chips harden complex operations, such as convolution and pooling, directly into hardware. Existing neural network compiling methods can only compile a neural network into basic machine instructions, so they cannot fully exploit the computing power of AI acceleration chips and are not suitable for compiling a neural network for an AI acceleration chip.
The above drawbacks were identified by the inventors through practice and careful study. Therefore, the discovery of the above problems, as well as the solutions the present disclosure proposes for them below, should be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiments, the neural network compiling method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the compiling method provided by the embodiments of the present disclosure is generally a computer device with a certain computing capability, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device; or a server or other processing device. In some possible implementations, the neural network compiling method may be implemented by a processor calling computer-readable instructions stored in a memory.
The following describes a method for compiling a neural network provided by an embodiment of the present disclosure.
Referring to fig. 1, a flowchart of a neural network compiling method provided by an embodiment of the present disclosure is shown. The method includes steps S101 to S103, wherein:
S101: parsing an intermediate program to obtain an internal data structure of the intermediate program; the internal data structure comprises objects and the association relationships among the objects; the objects include operation statements and the variable definition statements corresponding to the operation statements; the intermediate program is obtained by converting an original program of a target neural network, written in a target high-level language, into a preset intermediate language by using conversion relationship information between the preset intermediate language and the target high-level language;
S102: generating configuration information of the operation statements based on the internal data structure;
S103: generating, based on the instruction encapsulation relationship between the operation statements and a data processing chip together with the configuration information, the machine instructions used by the data processing chip when running the target neural network.
In the embodiments of the present disclosure, after the original program of the target neural network is converted into an intermediate program written in the preset intermediate language, the internal data structure obtained by parsing the intermediate program can be used to generate the configuration information of the operation statements, and the machine instructions are then generated by using the instruction encapsulation relationship between the operation statements and the data processing chip together with the configuration information. Because an instruction encapsulation relationship for encapsulating the data processing chip's instructions in the preset intermediate language can be established in advance, corresponding operation statements can be constructed in the intermediate language according to the operation types the data processing chip actually supports; in converting the original program written in the target high-level language into machine instructions, the original program need not be reduced to basic machine instructions such as addition, subtraction, and multiplication, but is instead converted into machine instructions corresponding to the operation types the data processing chip supports, so that the computing power of the data processing chip is utilized more fully.
Additionally, in embodiments of the present disclosure, the data processing tasks of the target neural network may include inference and/or training tasks.
The following describes details of S101 to S103.
With respect to S101 above, the preset intermediate language, the target high-level language, the target neural network, and the original program of the target neural network involved in this step are described first.
First, the target neural network may differ according to the actual application scenario. For example, in an image recognition scenario such as object recognition and classification, the corresponding target neural network may be a convolutional neural network; in a speech recognition scenario, such as recognizing words converted from audio, the corresponding target neural network may be a recurrent neural network.
Once the target neural network is determined, its original program can be determined accordingly. For example, before a target neural network determined for a given application scenario is deployed on a specific hardware device, the parameters of the network can be adjusted to obtain the version of the target neural network to be deployed on the hardware device, which is in effect the original program of the target neural network.
When writing the original program of the target neural network, the neural network may, for example, be developed in the target high-level language and, after training, be converted into an inference network representation, for example a Caffe (Convolutional Architecture for Fast Feature Embedding) network or another inference network. In a specific implementation, the target high-level language may be a computer programming language such as the C language or the Python language; the original program of the target neural network, written in the high-level language, is then, for example, an inference network as described above.
As for the preset intermediate language, it may include, for example, the Groot Intermediate Language (GIL). Around the intermediate language GIL, the compiler can be divided into a front end and a back end: the front end is responsible for parsing the target neural network without attending to the requirements of the hardware device to be deployed on, while the back end optimizes the operation statements and uses them to generate the corresponding machine instructions. Illustratively, referring to fig. 2, a schematic diagram of a compiler is provided in an embodiment of the present disclosure. The intermediate language GIL divides the compiler into a compiler front end and a compiler back end; when the target neural network is input to the compiler, it is actually input to the compiler front end, is processed by means of the intermediate language GIL, and machine instructions are output by the compiler back end. In this way, the target neural network can be converted by the compiler into machine instructions executable by the hardware device.
In addition, the intermediate language GIL offers good debuggability: during debugging, the front end's parsing results for the target neural network can be directly analyzed and optimized, for example by deleting operation statements, as described below and not repeated here.
For example, for an original program of the target neural network written in the target high-level language C, the original program can be converted into an intermediate program written in the preset intermediate language by using the conversion relationship between the C language and the preset intermediate language GIL. In one possible case, since the GIL at this step actually performs the work of the compiler front end, the resulting intermediate program can be denoted front.gil, where "front" indicates the front end and ".gil" indicates that it is written in the intermediate language GIL.
After the intermediate program corresponding to the target neural network is determined, it can be parsed to obtain its internal data structure. The internal data structure of the intermediate program represents the abstract syntax structure of the intermediate program's code. It may be represented, for example, by an abstract syntax tree (AST), or by another data structure. Taking an internal abstract syntax tree as an example: it is an abstract representation of the syntax structure of the source code, expressing the syntax structure of the programming language, that is, of the front.gil program, in tree form. If the internal abstract syntax tree is optimized, the process may include replacing or deleting nodes of the tree, where the processing logic of a specific replacement or deletion is related to the syntax structure corresponding to the node; for example, when data is stored and read, optimization may be performed by deleting the related statements, which is reflected on the internal abstract syntax tree as the deletion of the corresponding nodes. The machine instructions are finally generated based on the optimized abstract syntax tree. The principle of such optimization is explained below.
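As a purely illustrative aid (not the patent's actual implementation), the following Python sketch shows one possible shape of such an internal data structure, holding operation statements, variable definition statements, and the association relationships among them; all class and field names here are assumptions.

```python
# Hypothetical sketch of an internal data structure for the parsed
# intermediate program; names, kinds, and fields are illustrative only.
from dataclasses import dataclass, field

@dataclass
class VariableDef:
    name: str            # variable name, e.g. "a"
    kind: str            # assumed kinds: "input", "output", "intermediate"
    size_bytes: int      # assumed size, used later for address allocation
    dtype: str = "f16"   # data type recorded by the variable definition

@dataclass
class OpStatement:
    op_type: str                       # e.g. "conv", "store", "load"
    variables: list[str]               # variables the statement operates on
    layer: int                         # assumed network-layer index
    depends_on: list["OpStatement"] = field(default_factory=list)  # associations

@dataclass
class InternalDataStructure:
    var_defs: dict[str, VariableDef]   # variable definition statements
    ops: list[OpStatement]             # operation statements
```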
Specifically, the internal data structure obtained includes the objects and the association relationship information among the objects, wherein the objects include operation statements and the variable definition statements corresponding to them. An object is an encapsulation concept, an entity under the concept of a class; several objects may belong to the same class. An operation statement may, for example, denote a specific computation, or a variable-storage or variable-reading operation. A variable may, for example, denote specific data to be stored; different data differ in data type and the like, and the variable definition statement can be used to determine such information as the corresponding data type.
For the above S102, when the internal data structure is determined, the configuration information of the operation statement may be generated accordingly.
In a specific implementation, the configuration information of an operation statement includes, for example: the target storage address of the variable corresponding to the operation statement, and the waiting information added for the operation statement.
Some variables may be generated or used during execution of a target operation statement. Before the target operation statement is executed, a target storage address needs to be pre-allocated to the variable corresponding to it; if the variable needs to be stored while the target operation statement executes, it can be stored in the storage space indicated by the pre-allocated target storage address.
The waiting information describes the other operation statements that must be executed before a given operation statement can execute. Based on the waiting information, the execution order of the different operation statements can be defined, so that the operation statements execute successfully according to the program's logic.
Because redundant statements, such as operation statements that repeatedly access storage, may exist in the internal data structure, before determining the configuration information of the operation statements, the deletable operation statements may first be identified, so as to obtain the target operation statements and generate the configuration information corresponding to them. In this way, unnecessary storing and reading operations are reduced, and the resulting configuration information is more compact.
Specifically, the target operation statements may be determined, for example, in the following manner: determining, from the operation statements, the operation statements to be deleted that repeatedly access a preset storage space, based on the types of the operation statements and the variables corresponding to them; and deleting the operation statements to be deleted from the operation statements to obtain the target operation statements.
Here, the types of the operation statements include variable storage and variable reading. In a specific implementation, the operation statements to be deleted that repeatedly access a preset storage space may be determined, for example, as follows: for each first operation statement whose type is variable storage, it is determined, based on the variable corresponding to the first operation statement, whether there exists a second operation statement whose type is variable reading and whose corresponding variable is the same variable, where the first operation statement and the second operation statement belong to different adjacent network layers; if so, both the first operation statement and the corresponding second operation statement are determined as operation statements to be deleted.
Illustratively, referring to fig. 3, an example diagram for determining an operation statement to be deleted is provided in an embodiment of the present disclosure. The upper part of the diagram shows two network layers, and the lower part describes the operation statements corresponding to them. This example includes two adjacent network layers: network layer L1 and network layer L2. Because the hardware computing unit of the AI acceleration chip can only access the cache and cannot directly access memory, the output data of network layer L1 is the input data of network layer L2; the data passed from L1 to L2 is, for example, the variable a. In network layer L1, the variable a is read from the cache and computed; since the value of a may change, the computed result is denoted a'. After a' is obtained, it is stored to memory. In network layer L2, a' is read from memory and stored into the cache for computation. That is, when computing on the variable a, network layers L1 and L2 operate, for example, according to the flow indicated by the solid arrows. For convenience of description, the variable-storage operation statement is labeled F1, and the variable-reading operation statement is labeled F2.
For the internal data structure obtained by parsing the intermediate program, if the cache can hold a sufficient amount of data, the data need not be stored to memory at all: the next network layer can read it directly from the cache and perform the computation on a'. That is, following the example above, for an operation statement of type variable storage such as F1, taking F1 as the first operation statement, it can be determined, based on its corresponding variable a, whether there exists a second operation statement whose corresponding variable is the same variable a and whose type is variable reading, namely the operation statement F2. Further, since the operation statements F1 and F2 belong to two different network layers, both the first operation statement F1 and the second operation statement F2 so determined can be regarded as operation statements to be deleted. In fig. 3, dashed boxes mark the deletable operation statements F1 and F2.
In the above example, the logic before the deletion was: data already placed in the cache is stored back to memory, then read from memory into the cache again, and then read from the cache for processing; this round trip can be omitted. The deleted statements are precisely those that store data from the cache to memory and re-read it from memory into the cache.
Once the operation statements to be deleted are determined, they can be deleted from the operation statements to obtain the target operation statements.
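As an illustrative aid only, the following sketch shows how such a deletion pass might be written over the hypothetical data structure introduced above; the adjacency test and statement shape are assumptions, not the patent's implementation.

```python
# Hypothetical deletion pass: a variable-storage statement in one layer paired
# with a variable-reading statement for the same variable in the adjacent next
# layer is a redundant memory round trip, so both statements are deleted.
def drop_redundant_pairs(ops: list[OpStatement]) -> list[OpStatement]:
    to_delete: set[int] = set()
    for i, first in enumerate(ops):
        if first.op_type != "store":          # first statement: variable storage
            continue
        for second in ops[i + 1:]:
            if (second.op_type == "load"                     # variable reading
                    and second.variables == first.variables  # same variable
                    and second.layer == first.layer + 1):    # adjacent layer
                to_delete.update({id(first), id(second)})
                break
    # The remaining statements are the target operation statements.
    return [op for op in ops if id(op) not in to_delete]
```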
In one possible case, once the target operation statements are determined, corresponding target storage addresses can also be allocated to the variables corresponding to them. Specifically, for example: for a target operation statement among the operation statements, a corresponding target storage address is allocated to the variable corresponding to the target operation statement based on that variable's definition statement, the target storage address including a storage address in memory and/or a storage address in cache. The configuration information of the operation statement then includes the target storage address of the variable corresponding to the operation statement.
In a specific implementation, for the target operation statement described above, the variable definition statement corresponding to each variable is used to determine a suitable target storage address for it. Specifically, if the variable definition statement of the variable corresponding to the target operation statement indicates that the variable is an input variable, a storage address in memory and a storage address in the cache are allocated to it; if the variable definition statement indicates that the variable is an output variable, a storage address in memory and a storage address in the cache are likewise allocated to it.
For example, the input data and output data of each network layer are large in volume and are therefore better stored in memory, while the intermediate data generated during computation within each network layer can be stored in the cache to keep the computation efficient.
In one possible case, since the input data and/or output data may themselves participate in computation, a storage address in the cache may also be allocated to them. The statements that allocate storage addresses to the input data and/or output data are not deletable statements of the kind marked among the storing and reading operations in fig. 3, so these address-allocating statements do not hurt efficiency, while storing and reading the data in the cache can improve the efficiency of the computation.
That is, if the variable definition statement of the variable corresponding to the target operation statement indicates that the variable is an intermediate variable, a storage address in the cache is allocated to it.
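The following sketch, again purely illustrative and built on the assumed VariableDef above, shows one way such kind-driven allocation could look; the bump allocators and address representation are assumptions.

```python
# Hypothetical target-storage-address allocation: input and output variables
# receive both a memory address and a cache address; intermediate variables
# receive a cache address only.
def allocate_addresses(var_defs: dict[str, VariableDef]) -> dict[str, dict]:
    mem_ptr, cache_ptr, table = 0, 0, {}
    for v in var_defs.values():
        addrs = {}
        if v.kind in ("input", "output"):
            addrs["mem"] = mem_ptr        # storage address in memory
            mem_ptr += v.size_bytes
        addrs["cache"] = cache_ptr        # storage address in cache
        cache_ptr += v.size_bytes
        table[v.name] = addrs
    return table
```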
In another embodiment of the present disclosure, waiting information may further be added, for each operation statement, according to the association relationships between that operation statement and the other operation statements; the waiting information indicates the execution order of the operation instructions.
For a plurality of operation statements, the prior art mainly determines the execution order of each operation statement by manually computed waits. That is, the execution order corresponding to each operation statement is tied to every one of the operation statements; if an operation statement is added or deleted, the order of every operation statement must be readjusted, which is very inconvenient when there are many operation statements.
In the embodiments of the present disclosure, by using the association relationship information between operation statements, the coupling between any one operation statement and the many others can be reduced: an operation statement is associated only with the operation statements that have association relationship information with it. Therefore, when operation statements are added or deleted, a change to one operation statement does not require adjusting many others; this improves flexibility while effectively avoiding execution-order errors caused by rearranging the operation statements.
For example, when the waiting information is added, it may indicate the execution order of two adjacent operation statements. For instance, if the operation instruction P1 and the operation instruction P2 are to execute in sequence, waiting information W1 may be added to the operation instruction P1, where W1 indicates, for example, that the operation instruction P2 continues to execute after the operation instruction P1 completes.
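Purely as an illustration of the dependency-driven scheme described above (not the patent's implementation), the sketch below derives each statement's waiting information from its association relationships alone:

```python
# Hypothetical wait-information pass: each statement waits only on the
# statements it directly depends on, so adding or deleting one statement
# does not force a global reordering of all the others.
def add_wait_info(ops: list[OpStatement]) -> None:
    seq = {id(op): n for n, op in enumerate(ops)}   # illustrative numbering
    for op in ops:
        # wait_on lists the sequence numbers this statement must wait for
        op.wait_on = [seq[id(dep)] for dep in op.depends_on]
```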
In addition, after waiting information has been added to each operation statement according to the association relationships between that operation statement and the other operation statements, first debugging information may be generated, for example, based on the operation statements and the waiting information added to them; the first debugging information is used for debugging the original program of the target neural network.
Here, the principle of debugging the original program is similar to the debugging of a program with program-debugging software: after the waiting information is added to the operation statements, the original program is debugged again. In principle, adding the waiting information avoids execution-order errors among the operation statements, but verification through debugging is still needed to determine whether the original program errs when executed in the order established after the waiting information was added, that is, whether a mistake during manual addition produced incorrect waiting information.
Specifically, when the waiting information is added manually, a mis-operation may introduce errors into the added waiting information. For example, when the waiting information is determined for each operation statement according to its association relationships with the other operation statements, waiting information corresponding to unrelated operation statements may be set for some statements by mistake. Therefore, the first debugging information can be generated from the operation statements and the waiting information added to them. The first debugging information may include, for example, a textual or graphical description of the principle by which the waiting information was added, with which a worker debugs the original program according to the execution logic the program requires. The first debugging information thus helps judge whether erroneously added waiting information, as explained above, exists, so as to debug the original program of the target neural network and reduce the occurrence of errors.
In this way, the configuration information of the operation statement can be generated based on the internal data structure.
In another embodiment of the present disclosure, after the configuration information of the operation statements is generated, second debugging information may further be generated based on the operation statements and their corresponding configuration information; the second debugging information is used for debugging the original program of the target neural network.
Here, similarly to the generation of the first debugging information above, the obtained second debugging information can be used to check the obtained configuration information accordingly, so as to reduce the occurrence of errors: the second debugging information is used to debug the original program by checking the configuration information, whereas the first debugging information is used to judge whether erroneously added waiting information exists.
With respect to S103 above, once the configuration information is determined, the instruction encapsulation relationship between the operation statements and the data processing chip can be used, together with the configuration information, to generate the machine instructions used when the data processing chip executes the inference task of the target neural network. Here, the data processing chip specifically includes an AI acceleration chip. The configuration information affects the operation statements concretely, for example through the target storage address allocated to the variable corresponding to a target operation statement, as described under S102. Since the operation statements are used to generate the machine instructions, as described below, the configuration information also influences the finally generated machine instructions through its effect on the operation statements.
When the machine instructions are generated using the configuration information: if the configuration information includes the target storage address of a variable, the generated machine instruction carries that target storage address, which the AI acceleration chip uses to fetch the variable when executing the corresponding machine instruction.
If the configuration information includes waiting information, the generated machine instruction can carry the corresponding waiting information to indicate to the AI acceleration chip the execution order of the different machine instructions.
In addition, the waiting information may also be used to determine the position of each machine instruction in the instruction stream composed of the different machine instructions; the order of the machine instructions in the instruction stream characterizes their execution order.
When the configuration information of the operation statements is generated based on the internal data structure, the generation of the target storage addresses and of the waiting information is likewise performed based on the internal data structure. After the configuration information is generated, the corresponding target storage addresses and waiting information are added to the corresponding operation statements on the basis of the internal data structure to form the final internal data structure, and the machine instructions are then generated from this final internal data structure.
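The following sketch (illustrative only; the dict-based instruction encoding is a placeholder, not a real chip format) shows a back end folding the target addresses and waiting information into an instruction stream:

```python
# Hypothetical back-end emission: one machine instruction per target operation
# statement, carrying its operand addresses and its waiting information.
def emit_instructions(ops, addr_table, encapsulation):
    stream = []
    for op in ops:
        instr = dict(encapsulation[op.op_type])           # opcode template
        instr["operands"] = [addr_table[v] for v in op.variables]
        instr["wait_on"] = getattr(op, "wait_on", [])     # execution ordering
        stream.append(instr)                              # position in stream
    return stream
```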
In a specific implementation, once the configuration information is determined, the instruction encapsulation relationship between the operation statements and the AI acceleration chip can be used to generate a machine instruction for each operation statement in turn; the generated machine instructions are the machine instructions used when the AI acceleration chip, actually deployed on a hardware device, executes the inference task of the target neural network. Because different AI acceleration chips execute different machine instructions, the instruction encapsulation relationship between the operation statements and the AI acceleration chip differs from chip to chip.
Here, the instruction encapsulation relationship refers to the conversion relationship between an operation statement and the chip instructions of the AI acceleration chip; using it, an operation statement can be converted into the corresponding instruction of the AI acceleration chip.
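As a purely illustrative example of such a per-chip relationship (the operation types and opcodes are invented), an encapsulation table might look like the following, and is what the emit_instructions sketch above consumes:

```python
# Hypothetical instruction encapsulation relationship for one AI acceleration
# chip: each intermediate-language operation statement type maps to the chip
# instruction template it is converted into.
ENCAPSULATION_CHIP_A = {
    "conv":  {"opcode": 0x10},   # hardware convolution unit
    "pool":  {"opcode": 0x11},   # hardware pooling unit
    "store": {"opcode": 0x20},   # cache -> memory
    "load":  {"opcode": 0x21},   # memory -> cache
}
```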
In another embodiment of the present disclosure, an operation statement corresponding to an encapsulation operation may also be generated in response to that encapsulation operation on a plurality of machine instructions of the AI acceleration chip, and a conversion relationship between that operation statement and the target high-level language may be established.
For example, the target neural network may be written in different high-level languages, so the target neural network has a correspondence with the high-level language. For the AI acceleration chip, there is likewise a correspondence between encapsulation operations over a plurality of machine instructions and the corresponding operation statements. In addition, when the target neural network is deployed on the AI acceleration chip, there is a corresponding deployment relationship, for example which data types of the target neural network can be deployed on the AI acceleration chip. Using these correspondences, the conversion relationship between an operation statement and the target high-level language can be determined, so that when a new target neural network to be deployed is specified in the target high-level language, the corresponding operation statements can be determined directly from the conversion relationship, which is more efficient.
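The sketch below, with entirely hypothetical names, illustrates registering such a fused operation statement and recording its conversion rule from the high-level language:

```python
# Hypothetical registration of an encapsulation operation: a new operation
# statement is created for a fused sequence of machine instructions, and a
# conversion rule from a high-level-language construct to it is recorded.
def register_fused_op(name, machine_instrs, hl_pattern, conversion_rules):
    ENCAPSULATION_CHIP_A[name] = {"opcode_seq": machine_instrs}
    conversion_rules[hl_pattern] = name   # high-level construct -> op statement
```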
In another embodiment of the present disclosure, a specific compiling embodiment is further provided; in this example, the data processing chip specifically includes an AI acceleration chip. Referring to fig. 4, a flowchart corresponding to this specific compiling embodiment is shown, wherein:
S401: determining a target neural network and an AI acceleration chip;
In this embodiment, the target neural network may, for example, be a convolutional neural network, and the target high-level language in which the convolutional neural network is written is the C language; the AI acceleration chip may be selected according to actual requirements and is not limited here.
S402: determining an intermediate program corresponding to the target neural network;
In this embodiment, by converting the original program written in the C language into the intermediate language GIL, an intermediate program corresponding to the target neural network, denoted for example CNN_front.gil, is obtained.
S403: analyzing the intermediate program to obtain an internal data structure of the intermediate program;
In this embodiment, the internal data structure of the intermediate program is represented using an internal abstract syntax tree.
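A minimal sketch of what such an internal representation might hold is given below; the fields are illustrative assumptions, not the actual data structure of this disclosure:

    # Hypothetical internal AST objects: variable definition statements and the
    # operation statements that reference them, plus the associations among them.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class VariableDef:
        name: str
        kind: str        # "input", "output", or "intermediate"
        size: int        # bytes to allocate

    @dataclass
    class OperationStatement:
        kind: str                      # e.g. "store", "load", "conv"
        variables: List[VariableDef]   # variables this statement touches
        depends_on: List["OperationStatement"] = field(default_factory=list)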
S404: determining an operation statement to be deleted which can be deleted in the internal data structure, and deleting the operation statement to be deleted from the operation statement in the internal data structure to obtain a target operation statement;
In this embodiment, the operation statements to be deleted may be identified, for example, from the type of each operation statement in the internal data structure and its corresponding variable, and then removed accordingly. The resulting target operation statements are more compact than the operation statements initially obtained from the internal data structure, which reduces the amount of data handled in subsequent processing.
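A minimal sketch of this pruning step, following the rule detailed later in this disclosure (a variable-storage statement in one network layer paired with a variable-reading statement for the same variable in the adjacent layer repeatedly accesses the same storage space and can be deleted); the kind, variable, and layer fields are assumptions of the sketch:

    # Hypothetical pass: delete store/load pairs that re-access the same storage.
    def prune_redundant_access(statements):
        to_delete = set()
        for i, first in enumerate(statements):
            if first.kind != "store":
                continue
            for second in statements[i + 1:]:
                if (second.kind == "load"
                        and second.variable == first.variable
                        and second.layer == first.layer + 1):  # adjacent layer
                    to_delete.update({id(first), id(second)})
                    break
        return [s for s in statements if id(s) not in to_delete]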
S405: distributing corresponding target storage addresses for variables corresponding to the target operation statements;
In this embodiment, the target storage addresses allocated to variables are as follows: input variables are allocated a storage address in memory and a storage address in the high-speed cache; output variables are likewise allocated a storage address in memory and a storage address in the high-speed cache; and intermediate variables are allocated a storage address in the cache only.
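A minimal sketch of this allocation policy; the simple bump allocator and the variable fields are assumptions of the sketch:

    # Hypothetical allocator: input and output variables get an address both in
    # memory and in the cache; intermediate variables get a cache address only.
    class AddressAllocator:
        def __init__(self):
            self.mem_top = 0
            self.cache_top = 0
            self.table = {}

        def allocate(self, var):
            entry = {}
            if var.kind in ("input", "output"):
                entry["mem"] = self.mem_top
                self.mem_top += var.size
            entry["cache"] = self.cache_top
            self.cache_top += var.size
            self.table[var.name] = entry
            return entry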
S406: adding waiting information for each operation statement;
In this embodiment, waiting information may be added to each operation statement according to the association relationship between that operation statement and the other operation statements.
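A minimal sketch of this step; modeling the association relationship through reads/writes sets, and the waiting information as a list of predecessor indices, are assumptions of the sketch:

    # Hypothetical pass: a statement waits for the most recent earlier statement
    # that writes any variable it reads.
    def add_wait_info(statements):
        last_writer = {}
        for idx, stmt in enumerate(statements):
            stmt.wait_on = [last_writer[v] for v in stmt.reads if v in last_writer]
            for v in stmt.writes:
                last_writer[v] = idx
        return statements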
In addition, steps S404 to S406 above are given in only one of the possible orders; steps may be deleted or reordered according to the actual situation, which is not limited herein. Moreover, after each of steps S404 to S406 is completed, corresponding debugging information may be generated to further confirm that the step was performed correctly.
S407: Taking the target storage addresses and the waiting information as the configuration information of the operation statements, and using this configuration information to generate the machine instructions with which the AI acceleration chip runs the target neural network.
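Putting the steps together, a minimal end-to-end sketch of S403 to S407; prune_redundant_access, AddressAllocator, add_wait_info, and lower are the hypothetical helpers sketched above, and parse_to_internal_ast stands in for the S403 parsing step:

    # Hypothetical driver tying the sketched passes together.
    class Configuration:
        def __init__(self, address_table):
            self.addresses = address_table

        def address_of(self, var):
            return self.addresses[var.name]["cache"]

        def wait_info(self, stmt):
            return stmt.wait_on

    def compile_network(intermediate_program):
        statements = parse_to_internal_ast(intermediate_program)  # S403
        statements = prune_redundant_access(statements)           # S404
        allocator = AddressAllocator()                            # S405
        for stmt in statements:
            for var in stmt.variables:
                allocator.allocate(var)
        statements = add_wait_info(statements)                    # S406
        config = Configuration(allocator.table)                   # S407
        return lower(statements, config)                          # machine instructions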
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written implies neither a strict execution order nor any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, an embodiment of the present disclosure further provides a neural network compiling apparatus corresponding to the neural network compiling method. Since the principle by which the apparatus solves the problem is similar to that of the neural network compiling method in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated here.
Referring to fig. 5, a schematic diagram of a neural network compiling apparatus provided in an embodiment of the present disclosure is shown. The apparatus includes: an analysis module 51, a first generation module 52, and a second generation module 53; wherein:
the analysis module 51 is configured to analyze the intermediate program to obtain an internal data structure of the intermediate program; the internal data structure comprises objects and association relations among the objects; the object includes: an operation statement, and a variable definition statement corresponding to the operation statement; the intermediate program is a program obtained by converting an original program of a target neural network, written based on a target high-level language, into a preset intermediate language by using conversion relation information between the preset intermediate language and the target high-level language;
a first generating module 52, configured to generate configuration information of the operation statement based on the internal data structure;
and a second generating module 53, configured to generate, based on the instruction encapsulation relationship of the operation statement to the artificial intelligence (AI) acceleration chip and the configuration information, the machine instructions of the data processing chip for running the target neural network.
In an optional embodiment, the configuration information of the operation statement includes: a target storage address of the variable corresponding to the operation statement. When generating the configuration information of each operation statement based on the internal data structure, the first generating module 52 is configured to: for a target operation statement among the operation statements, allocate a corresponding target storage address to the variable corresponding to the target operation statement, based on the variable definition statement of that variable; the target storage address includes a storage address in memory and/or a storage address in cache.
In an alternative embodiment, before generating the configuration information of each operation statement based on the internal data structure, the first generating module 52 is further configured to: determine, based on the types of the operation statements and the variables corresponding to them, the operation statements to be deleted that repeatedly access a preset storage space; and delete the operation statements to be deleted from the operation statements to obtain the target operation statements.
In an alternative embodiment, the type of the operation statement includes: variable storage, and variable reading. When determining, based on the types of the operation statements and the variables corresponding to the operation statements, the operation statements to be deleted that repeatedly access a preset storage space, the first generating module 52 is configured to: for each first operation statement whose type is variable storage, determine, based on the variable corresponding to the first operation statement, whether there is a second operation statement whose type is variable reading and whose corresponding variable is the same variable, the first operation statement and the second operation statement respectively belonging to different, adjacent network layers; and if so, determine the first operation statement and the corresponding second operation statement as operation statements to be deleted.
In an alternative embodiment, the variables include: the input variable, the output variable, and the intermediate variable corresponding to each network layer. When allocating a corresponding target storage address to the variable corresponding to the target operation statement based on the variable definition statement of that variable, the first generating module 52 is configured to: in response to the variable definition statement indicating that the variable corresponding to the target operation statement is an input variable, allocate a storage address in memory and a storage address in the cache to the variable; in response to the variable definition statement indicating that the variable is an output variable, allocate a storage address in memory and a storage address in the cache to the variable; and in response to the variable definition statement indicating that the variable is an intermediate variable, allocate a storage address in the cache to the variable.
In an optional embodiment, before generating the machine instructions of the data processing chip for running the target neural network based on the instruction encapsulation relationship of the operation statement to the artificial intelligence (AI) acceleration chip and the configuration information, the second generation module 53 is further configured to: for each operation statement, add waiting information to that operation statement according to the association relation between it and the other operation statements; the waiting information is used for indicating the execution order of the operation instructions.
In an optional implementation, after adding waiting information to each operation statement according to the association relation between that operation statement and the other operation statements, the second generating module 53 is further configured to: generate first debugging information based on an operation statement and the waiting information added to it; the first debugging information is used for debugging the original program of the target neural network.
In an optional embodiment, after generating the configuration information of the operation statement based on the internal data structure, the first generating module 52 is further configured to: generate second debugging information based on the operation statement and its corresponding configuration information; the second debugging information is used for debugging the original program of the target neural network.
In an optional embodiment, the compiling apparatus further includes a processing module 54 configured to: in response to a packaging operation on a plurality of machine instructions of a plurality of AI acceleration chips, generate an operation statement corresponding to the packaging operation; and establish a conversion relationship between the operation statement and the target high-level language.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 6, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and the computer device includes:
a processor 10 and a memory 20; the memory 20 stores machine-readable instructions executable by the processor 10, and the processor 10 is configured to execute the machine-readable instructions stored in the memory 20; when the machine-readable instructions are executed, the processor 10 performs the following steps:
analyzing the intermediate program to obtain an internal data structure of the intermediate program; the internal data structure comprises objects and association relations among the objects; the object includes: an operation statement, and a variable definition statement corresponding to the operation statement; the intermediate program is a program obtained by converting an original program of a target neural network, written based on a target high-level language, into a preset intermediate language by using conversion relation information between the preset intermediate language and the target high-level language; generating configuration information of the operation statement based on the internal data structure; and generating, based on the instruction encapsulation relationship of the operation statement to the data processing chip and the configuration information, the machine instructions of the data processing chip for running the target neural network.
The memory 20 includes an internal memory 210 and an external memory 220; the internal memory 210 temporarily stores operation data for the processor 10 and data exchanged with the external memory 220, such as a hard disk; the processor 10 exchanges data with the external memory 220 through the internal memory 210.
For the specific execution process of the instruction, reference may be made to the steps of the neural network compiling method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the neural network compiling method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, which carries program code; the instructions included in the program code may be used to execute the steps of the neural network compiling method in the foregoing method embodiments, to which reference may be made for details; they are not described here again.
The computer program product may be implemented by hardware, by software, or by a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not described here again.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical division, and other divisions are possible in actual implementation; likewise, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through communication interfaces, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art could still, within the technical scope of the present disclosure, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or substitute equivalents for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall all be covered within its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

1. A compiling method of a neural network, comprising:
analyzing the intermediate program to obtain an internal data structure of the intermediate program; the internal data structure comprises objects and association relations among the objects; the object includes: an operation statement, and a variable definition statement corresponding to the operation statement;
the intermediate program is a program obtained by converting an original program of a target neural network, written based on a target high-level language, into a preset intermediate language by using conversion relation information between the preset intermediate language and the target high-level language;
generating configuration information of the operation statement based on the internal data structure;
and generating, based on the instruction encapsulation relationship of the operation statement to the data processing chip and the configuration information, the machine instructions of the data processing chip for running the target neural network.
2. The compiling method according to claim 1, wherein the configuration information of the operation statement comprises: a target storage address of the variable corresponding to the operation statement;
generating configuration information of each of the operation statements based on the internal data structure, including:
for a target operation statement in the operation statements, based on a variable definition statement of a variable corresponding to the target operation statement, allocating a corresponding target storage address to the variable corresponding to the target operation statement; the target storage address includes a storage address in memory and/or a storage address in cache.
3. The compiling method according to claim 2, wherein before generating the configuration information of each of the operation statements based on the internal data structure, further comprising:
determining an operation statement to be deleted, which repeatedly accesses a preset storage space, from the operation statements based on the types of the operation statements and variables corresponding to the operation statements;
and deleting the operation statement to be deleted from the operation statement to obtain the target operation statement.
4. The compiling method according to claim 3, wherein the type of the operation statement comprises: variable storage, and variable reading;
the determining, based on the type of the operation statement and the variable corresponding to the operation statement, an operation statement to be deleted, which repeatedly accesses a preset storage space, from the operation statement includes:
for each first operation statement whose type is variable storage, determining, based on the variable corresponding to the first operation statement, whether there is a second operation statement whose type is variable reading and whose corresponding variable is the same variable; wherein the first operation statement and the second operation statement respectively belong to different, adjacent network layers;
and if so, determining the first operation statement and the corresponding second operation statement as the operation statements to be deleted.
5. The compilation method according to any one of claims 2 to 4, wherein the variables include: the input variable, the output variable and the intermediate variable which correspond to each network layer respectively;
the allocating, based on the variable definition statement of the variable corresponding to the target operation statement, a corresponding target storage address to the variable corresponding to the target operation statement includes:
in response to the variable definition statement of the variable corresponding to the target operation statement indicating that the variable is an input variable, allocating a storage address in memory and a storage address in cache to the variable;
in response to the variable definition statement of the variable corresponding to the target operation statement indicating that the variable is an output variable, allocating a storage address in memory and a storage address in cache to the variable;
and in response to the variable definition statement of the variable corresponding to the target operation statement indicating that the variable is an intermediate variable, allocating a storage address in cache to the variable.
6. The compiling method according to any one of claims 1 to 5, wherein, before generating the machine instructions of the data processing chip for running the target neural network based on the instruction encapsulation relationship of the operation statement to the data processing chip and the configuration information, the method further comprises:
for each operation statement, adding waiting information to the operation statement according to the association relation between it and the other operation statements; the waiting information is used for indicating the execution order of the operation instructions.
7. The compiling method according to claim 6, wherein after adding wait information to each operation statement according to an association relationship between each operation statement and another operation statement, for each operation statement, further comprising:
generating first debugging information based on an operation statement and waiting information added for the operation statement;
the first debugging information is used for debugging an original program of the target neural network.
8. The compiling method according to any one of claims 1 to 7, wherein after generating the configuration information of the operation statement based on the internal data structure, further comprising:
generating second debugging information based on the operation statement and the configuration information corresponding to the operation statement;
the second debugging information is used for debugging an original program of the target neural network.
9. The compiling method of any one of claims 1-8, further comprising: in response to a packaging operation on a plurality of machine instructions of a plurality of data processing chips, generating an operation statement corresponding to the packaging operation;
and establishing a conversion relation between the operation statement and the target high-level language.
10. An apparatus for compiling a neural network, comprising:
the analysis module is used for analyzing the intermediate program to obtain an internal data structure of the intermediate program; the internal data structure comprises objects and association relations among the objects; the object includes: an operation statement, and a variable definition statement corresponding to the operation statement; the intermediate program is a program obtained by converting an original program of a target neural network, written based on a target high-level language, into a preset intermediate language by using conversion relation information between the preset intermediate language and the target high-level language;
a first generation module, configured to generate configuration information of the operation statement based on the internal data structure;
and the second generation module is used for generating, based on the instruction encapsulation relationship of the operation statement to the data processing chip and the configuration information, the machine instructions of the data processing chip for running the target neural network.
11. A computer device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the processor being configured to execute the machine-readable instructions stored in the memory, wherein, when the machine-readable instructions are executed by the processor, the processor performs the steps of the method of compiling a neural network of any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the method of compiling a neural network according to any one of claims 1 to 9.
CN202111652713.2A 2021-12-30 2021-12-30 Neural network compiling method and device, computer equipment and storage medium Pending CN114356340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111652713.2A CN114356340A (en) 2021-12-30 2021-12-30 Neural network compiling method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111652713.2A CN114356340A (en) 2021-12-30 2021-12-30 Neural network compiling method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114356340A CN114356340A (en) 2022-04-15

Family

ID=81104332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111652713.2A Pending CN114356340A (en) 2021-12-30 2021-12-30 Neural network compiling method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114356340A (en)

Similar Documents

Publication Publication Date Title
US10409560B1 (en) Acceleration techniques for graph analysis programs
Wahib et al. Scalable kernel fusion for memory-bound GPU applications
US10394694B2 (en) Unexplored branch search in hybrid fuzz testing of software binaries
US8997065B2 (en) Automatic modularization of source code
CN112529175B (en) Compiling method and system of neural network, computer storage medium and compiling device
CN111104120B (en) Neural network compiling method and system and corresponding heterogeneous computing platform
US9823911B2 (en) Method and apparatus for compiling code based on a dependency tree
US11526433B2 (en) Data structure allocation into storage class memory during compilation
US20150186165A1 (en) Emulating pointers
CN112947933A (en) Operator execution method and device, computer equipment and storage medium
CN110598855A (en) Deep learning model generation method, device, equipment and storage medium
JP2017174418A (en) Data structure abstraction for model checking
US20230004365A1 (en) Multistage compiler architecture
US10268461B2 (en) Global data flow optimization for machine learning programs
CN105447285A (en) Method for improving OpenCL hardware execution efficiency
CN116228515B (en) Hardware acceleration system, method and related device
CN113705798A (en) Processing unit, computing device and computation graph optimization method of deep learning model
CN114356340A (en) Neural network compiling method and device, computer equipment and storage medium
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
CN108205596B (en) Method for realizing simulation function of serious accident analysis and calculation program of nuclear power plant
CN112015426B (en) Code management method, device and equipment
CN112232003B (en) Method for simulating design, electronic device and storage medium
CN114443042A (en) Service arrangement execution method based on rule engine and related equipment
CN117075912B (en) Method for program language conversion, compiling method and related equipment
CN113031952A (en) Method and device for determining execution code of deep learning model and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination