CN113011585B - Compiling optimization method, system, equipment and storage medium for eliminating splicing operator - Google Patents


Info

Publication number: CN113011585B
Authority: CN (China)
Prior art keywords: operator, splicing, array, address information, splicing operator
Legal status: Active (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202110295853.2A
Other languages: Chinese (zh)
Other versions: CN113011585A
Inventors: 谭黎敏, 田承雷, 宋捷
Assignee (current and original): Shanghai Xijing Technology Co ltd
Application filed by Shanghai Xijing Technology Co ltd
Priority to CN202110295853.2A; published as CN113011585A, granted and published as CN113011585B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 — Architecture, e.g. interconnection topology

Abstract

The invention provides a compilation optimization method, system, device and storage medium for eliminating a splicing (concat) operator. The method comprises the following steps: searching a neural network model for a splicing operator to be eliminated; acquiring the address information of the splicing operator's output array; acquiring the address information of the splicing operator's input arrays; updating the address information of the input arrays according to the address information of the output array, so that the address information of the input arrays, when combined, corresponds to the address information of the output array; and deleting the splicing operator from the neural network model. By eliminating the splicing operator at compile time, the invention reduces the model size, frees the running time of the neural network model from the execution time of the splicing operator, and accelerates the inference speed of the neural network model.

Description

Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a compiling optimization method, system, device, and storage medium for eliminating a splicing operator.
Background
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to units within a local receptive field; it performs excellently on large-scale image processing. It includes convolution layers (convolutional layers) and pooling layers. Convolutional neural networks have been widely used for image classification, object recognition, and object tracking.
Because the inference of convolutional neural networks requires enormous computation, dedicated AI (Artificial Intelligence) processing chips have emerged. A model usually needs to be converted and optimized before it can run on a dedicated chip, a process carried out by an AI compiler. The optimization stage focuses on reducing the model size and shortening the run time. Optimization mainly comprises the following:
1. operator optimization
2. Graph optimization
3. Model compression
An important technique in computational-graph optimization is operator fusion: by merging operators, it reduces both the amount of computation and the number of memory accesses.
Operator fusion is based on observation of deep-learning topology patterns. Deep-learning operators can be divided into two categories:
Computation-intensive operators, such as convolution and fully connected layers, which perform heavy arithmetic at runtime.
Memory-intensive operators, such as ReLU and splice (concat), which access memory frequently at runtime.
In a typical deep-learning model, computation-intensive and memory-intensive operators generally appear together, for example "Conv + ReLU". Taking a GPU (Graphics Processing Unit) as an example, the two operators can be combined into one composite operator: after the GPU has performed Conv, it performs ReLU directly in video memory, reducing interaction with main memory.
A stitching operator (concat) is a common operator in neural networks that joins multiple input tensors along a specified axis. It belongs to the memory-intensive category: its cost is dominated by memory-access time. On every hardware platform, whenever the splicing operation is performed, its execution time is determined by the amount of data copied and the available memory bandwidth.
For memory-intensive operators, an AI compiler can reduce memory accesses by fusing adjacent operators. However, the stitching operator is generally used to fuse features from different layers, and its input arrays are usually far apart both in the computational graph and in actual memory, so the adjacency condition for fusion is not satisfied. Conventional memory-intensive operator fusion therefore cannot be applied to the stitching operator.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a compilation optimization method, system, device and storage medium for eliminating the splicing operator, which accelerate the inference speed of a neural network model by eliminating the splicing operator at compile time.
The embodiment of the invention provides a compiling optimization method for eliminating a splicing operator, which comprises the following steps:
s100: searching a splicing operator to be eliminated in the neural network model;
s200: acquiring address information of an output array of the splicing operator;
s300: acquiring address information of an input array of the splicing operator;
s400: updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
s500: and deleting the splicing operator in the neural network model.
In some embodiments, the address information of the output array of the splice operator includes a start address of the output array, and the address information of the input array of the splice operator includes the start address of the input array and an array length;
In the step S400, updating the address information of the input array of the splicing operator includes the following steps:
and taking the starting address of the output array of the splicing operator as the starting address of the first input array, wherein the starting address of each input array is equal to the sum of the starting address of the previous input array and the array length of the previous input array except for the first input array.
In some embodiments, before the step S400 uses the start address of the output array of the splicing operator as the start address of the first input array, the method further includes the following steps:
and sequencing the input arrays according to the splicing sequence of the splicing operators to the input arrays.
In some embodiments, the step S100: searching a splicing operator to be eliminated in a neural network model, comprising the following steps:
traversing an operator list of the neural network model, and searching for uneliminated splicing operators;
and taking the searched splicing operator as the splicing operator to be eliminated.
In some embodiments, the step S200: the method for obtaining the address information of the output array of the splicing operator comprises the following steps: acquiring the DDR offset address of the output array of the splicing operator according to the operator parameters of the neural network model;
The step S300: the method for obtaining the address information of the input array of the splicing operator comprises the following steps: and acquiring the DDR offset address and the array length of the input array of the splicing operator according to the operator parameters of the neural network model.
In some embodiments, the step S400: updating the address information of the input array of the splicing operator, comprising the following steps:
acquiring a splicing sequence of the splicing operator to an input array according to operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
and sequentially updating the DDR offset addresses of the input arrays according to the ordering sequence of the input arrays, so that the DDR offset addresses of the input arrays of the splicing operator are combined and correspond to the DDR offset addresses of the output arrays of the splicing operator.
In some embodiments, the step of sequentially updating the DDR offset addresses of the input arrays according to the sorting order of the input arrays includes the steps of:
for the first input array, updating the DDR offset address of the input array into the DDR offset address of the output array of the splicing operator;
for a subsequent input array other than the first input array, the DDR offset address of the input array is updated to the DDR offset address of the previous input array plus the array length of the previous input array.
In some embodiments, the step S500: after deleting the splicing operator in the neural network model, the method further comprises the following steps:
traversing an operator list of the neural network model, and judging whether an undeleted splicing operator exists or not;
if yes, selecting the splicing operator which is not eliminated as the splicing operator to be eliminated, and then continuing to step S200;
if not, judging whether other compiling optimization tasks exist, if so, executing the other compiling optimization tasks, and if not, compiling the neural network model to obtain an executable file which can be run by the chip.
The embodiment of the invention also provides a compiling and optimizing system for eliminating the splicing operator, which is used for realizing the compiling and optimizing method for eliminating the splicing operator, and comprises the following steps:
the splicing operator searching module is used for searching splicing operators to be eliminated in the neural network model;
the address information acquisition module is used for acquiring the address information of the output array of the splicing operator and acquiring the address information of the input array of the splicing operator;
the address information updating module is used for updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
And the splicing operator deleting module is used for deleting the splicing operator in the neural network model.
In some embodiments, the address information of the output array of the splice operator includes a start address of the output array, and the address information of the input array of the splice operator includes the start address of the input array and an array length;
the address information updating module updates the address information of the input array of the splicing operator by adopting the following steps:
and taking the starting address of the output array of the splicing operator as the starting address of the first input array, wherein the starting address of each input array is equal to the sum of the starting address of the previous input array and the array length of the previous input array except for the first input array.
In some embodiments, the method further comprises a network algorithm compiling module, and the splicing operator searching module is used for searching splicing operators to be eliminated in the neural network model by adopting the following steps:
traversing an operator list of the neural network model, and searching whether an undeleted splicing operator exists;
if yes, taking the searched splicing operator as the splicing operator to be eliminated;
if not, the network algorithm compiling module judges whether other compiling optimization tasks exist, if so, the network algorithm compiling module executes the other compiling optimization tasks, and if not, the network algorithm compiling module compiles the neural network model to obtain an executable file which can be run by a chip.
In some embodiments, the address information obtaining module is configured to obtain, according to operator parameters of the neural network model, a DDR offset address of an output array of the splicing operator, and obtain, according to operator parameters of the neural network model, a DDR offset address and an array length of an input array of the splicing operator;
the address information updating module is used for updating the address information of the input array of the splicing operator by adopting the following steps:
acquiring a splicing sequence of the splicing operator to an input array according to operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
for the first input array, updating the DDR offset address of the input array into the DDR offset address of the output array of the splicing operator;
for a subsequent input array other than the first input array, the DDR offset address of the input array is updated to the DDR offset address of the previous input array plus the array length of the previous input array.
The embodiment of the invention also provides compiling and optimizing equipment for eliminating the splicing operator, which comprises the following steps:
a processor;
a memory having stored therein executable instructions of the processor;
Wherein the processor is configured to perform the steps of the compile optimization method of the eliminate splice operator via execution of the executable instructions.
The embodiment of the invention also provides a computer readable storage medium for storing a program, which when being executed by a processor, realizes the steps of the compiling optimization method for eliminating the splicing operator.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The compiling optimization method, system, equipment and storage medium for eliminating the splicing operator have the following beneficial effects:
according to the invention, the input and input address information is updated according to the address information of the output array of the splicing operator in compiling, so that the address information of the input array of the splicing operator is combined and then corresponds to the address information of the output array of the splicing operator, and the splicing function is realized through updating the address information without setting the splicing operator alone, thereby eliminating the splicing operator in the neural network model through compiling, optimizing the model size, ensuring that the running time of the neural network model is not limited by the execution time of the splicing operator, and accelerating the reasoning speed of the neural network model.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings.
FIG. 1 is a flow chart of a compilation optimization method of eliminating a splice operator according to an embodiment of the present invention;
FIG. 2 is a functional schematic of a splice operator according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a portion of a model prior to elimination of a stitching operator according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a portion of a model after elimination of a stitching operator according to an embodiment of the present invention;
FIG. 5 is a flow chart of updating address information of an input array according to an embodiment of the present invention;
FIG. 6 is a flow chart of a loop cancellation stitching operator of an embodiment of the present invention;
FIG. 7 is a schematic diagram of a compilation optimization system that eliminates stitching operators, in accordance with an embodiment of the present invention;
FIG. 8 is a schematic diagram of a compiling optimization device for eliminating a splicing operator according to an embodiment of the present invention;
fig. 9 is a schematic structural view of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
As shown in fig. 1, an embodiment of the present invention provides a compiling optimization method for eliminating a splicing operator, including the following steps:
s100: searching a splicing operator to be eliminated in the neural network model;
s200: acquiring address information of an output array of the splicing operator;
s300: acquiring address information of an input array of the splicing operator;
S400: updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
s500: and deleting the splicing operator in the neural network model.
According to the compilation optimization method for eliminating the splicing operator, the splicing operator to be eliminated is first found in step S100; the address information of the output array and of the input arrays is then obtained in steps S200 and S300 respectively; and in step S400 the input address information is updated at compile time according to the address information of the output array, so that the address information of the input arrays, when combined, corresponds to the address information of the output array. The splicing function is realized by updating address information, without a separate splicing operator, so the required splicing behavior is preserved after the splicing operator is deleted in step S500. The method thus eliminates the splicing operator through compilation, reduces the model size, frees the running time of the neural network model from the execution time of the splicing operator, and accelerates the inference speed of the neural network model.
The algorithm of the neural network model to be compiled comprises a plurality of operators, and the neural network model comprises an operator list, operator parameters and weight data. Wherein the operator list includes each operator included in the model. For a splice operator, its operator parameters include at least the parameters of its input array and the parameters of its output array.
In this embodiment, the step S100: searching a splicing operator to be eliminated in a neural network model, comprising the following steps:
traversing an operator list of the neural network model, and searching for uneliminated splicing operators;
and taking the searched splicing operator as a splicing operator to be eliminated, and executing subsequent steps S200-S500 on the splicing operator to be eliminated.
The splicing operator (Concat) is used to splice two or more arrays; executing it essentially performs a memory copy inside the chip, so it belongs to the memory-intensive category. FIG. 2 is a functional schematic of the splice operator. Its inputs are two arrays, Array1 and Array2, and its output is Array3; that is, the operator splices the two input arrays Array1 and Array2 to obtain the output array Array3. The size of Array3 is the size of Array1 plus the size of Array2: the front part of Array3 is identical to Array1 and the rear part is identical to Array2. The splice operator performs no arithmetic during execution.
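The pure data-movement semantics of the splice operator can be illustrated with a minimal Python sketch (list-based, purely to show the semantics, not the chip implementation):

```python
def concat(array1, array2):
    # Equivalent to the splice operator: no arithmetic, only placing
    # the inputs back to back to form the output.
    return array1 + array2

array1 = [1, 2, 3]
array2 = [4, 5]
array3 = concat(array1, array2)
# The front part of array3 equals array1, the rear part equals array2,
# and its size is len(array1) + len(array2).
```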
Each array corresponds to a segment of memory, and its structure is expressed as a pair (start, len),
where start represents the start address of the array and len represents the length of the array.
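As a hypothetical sketch, the array structure described here — a start address plus a length — might be written as follows (field names follow the surrounding text):

```python
from dataclasses import dataclass

@dataclass
class Array:
    start: int  # start address of the array (e.g. a DDR offset)
    len: int    # length of the array
```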
In this embodiment, the address information of the output array of the concatenation operator includes a start address of the output array, and the address information of the input array of the concatenation operator includes the start address of the input array and an array length. The address information update for the input array is mainly based on the start address of the output array and the array length of the input array.
In the step S400, updating the address information of the input array of the splicing operator includes the following steps:
the starting address of the output array of the splicing operator is taken as the starting address of the first input array, the starting address of each input array is equal to the sum of the starting address of the previous input array and the array length of the previous input array except the first input array, so that the effect of splicing each input array is realized by updating address information, the spliced input array can be directly used for the input of the next operator, and the function of the splicing operator can be realized under the condition that the splicing operator is not used.
Specifically, the start address OutArray.start of the splicing operator's output array OutArray is acquired in step S200; the start addresses InArray1.start, InArray2.start, …, InArrayN.start of the N input arrays InArray1, InArray2, …, InArrayN, and their lengths InArray1.len, InArray2.len, …, InArrayN.len, are acquired in step S300.
In step S400, the start addresses of the input arrays are updated as follows:
the start address of the first input array: InArray1.start = OutArray.start;
the start address of the i-th input array: InArray(i).start = InArray(i-1).start + InArray(i-1).len, where i is a positive integer with 2 ≤ i ≤ N.
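Under this update rule — the first input takes the output's start address, and each later input starts where the previous one ends — a small numeric walk-through (illustrative values only: three input arrays of lengths 100, 50 and 30, output starting at offset 0x2000):

```python
out_start = 0x2000       # OutArray.start
in_lens = [100, 50, 30]  # InArray1.len, InArray2.len, InArray3.len

starts = []
for i, length in enumerate(in_lens):
    if i == 0:
        # InArray1.start = OutArray.start
        starts.append(out_start)
    else:
        # InArray(i).start = InArray(i-1).start + InArray(i-1).len
        starts.append(starts[i - 1] + in_lens[i - 1])

# The three inputs now tile the output region contiguously:
# starts == [0x2000, 0x2000 + 100, 0x2000 + 150]
```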
Since the splicing operator splices the input arrays in a fixed order, the start addresses must be updated in that specific order in step S400, to ensure that the resulting output array can be correctly used as the input of the post operator. Specifically, in this embodiment, before the start address of the output array is taken as the start address of the first input array in step S400, the method further includes: sorting the input arrays according to the order in which the splicing operator splices them. The splicing order of the input arrays can be obtained from the operator parameters of the splicing operator in the neural network model.
As shown in figs. 3 and 4, a splicing operator that splices two input arrays is described here as an example. As shown in fig. 3, the input Array1 is the output of pre-operator 1, the input Array2 is the output of pre-operator 2, and the splicing operator splices Array1 and Array2 into the output Array3, which is the input of the post operator. As shown in fig. 4, the start addresses of Array1 and Array2 are updated in the neural network model: the start address of Array1 is updated to the start address of Array3, and the start address of Array2 is updated to the start address of Array1 plus the array length of Array1. The output of pre-operator 1, the output of pre-operator 2 and the input of the post operator then occupy the same array, laid out in sequence, so the actual effect of the splicing operator is achieved without the operator itself being present.
In this embodiment, the start address of the array is represented by a DDR (Double Data Rate, one type of memory) offset address of the array on the target machine after compiling, and step S200: the method for obtaining the address information of the output array of the splicing operator comprises the following steps: and acquiring the DDR offset address of the output array of the splicing operator according to the operator parameters of the neural network model. The present invention is not limited to this, and other methods of expressing the initial address are also possible, which fall within the scope of the present invention.
The step S300: the method for obtaining the address information of the input array of the splicing operator comprises the following steps: and acquiring the DDR offset address and the array length of the input array of the splicing operator according to the operator parameters of the neural network model.
As shown in fig. 5, in this embodiment, the step S400: updating the address information of the input array of the splicing operator, comprising the following steps:
s410: acquiring a splicing sequence of the splicing operator on the input array according to operator parameters of the neural network model, for example, when a plurality of input arrays exist, arranging the input arrays into InArray1 and InArray2 … … in sequence according to the splicing sequence;
s420: sequencing the input arrays according to the splicing sequence;
s430: and sequentially updating the DDR offset addresses of the input arrays according to the ordering sequence of the input arrays, so that the DDR offset addresses of the input arrays of the splicing operator are combined and correspond to the DDR offset addresses of the output arrays of the splicing operator.
Specifically, the step S430 includes the steps of:
s431: for the first input array, updating the DDR offset address of the input array to be the DDR offset address of the output array of the splicing operator, namely InArray1. Start=OutArray1. Start;
S432: for the subsequent input arrays except the first input array, the DDR offset address of the input array is updated to be the DDR offset address of the previous input array plus the array length of the previous input array, namely the start address InArray (i) of the ith input array, namely start=InArray (i-1), start+InArray (i-1), len, i is a positive integer which is more than or equal to 2 and less than or equal to n.
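Steps S410–S432 can be condensed into one routine. The following is a sketch under the assumption that each input carries its splice-order index and its DDR offset; the dictionary keys and function name are illustrative, not taken from the patent:

```python
def update_input_offsets(inputs, out_ddr_offset):
    """inputs: list of dicts {'order': splice position, 'ddr': DDR offset, 'len': length}.
    Rewrites each input's DDR offset in place so the inputs tile the output region."""
    inputs.sort(key=lambda a: a['order'])  # S410/S420: sort by splicing order
    for i, arr in enumerate(inputs):
        if i == 0:
            arr['ddr'] = out_ddr_offset    # S431: first input takes the output's offset
        else:
            prev = inputs[i - 1]
            arr['ddr'] = prev['ddr'] + prev['len']  # S432: previous offset + previous length
    return inputs
```

After the call, the inputs are contiguous in DDR and together cover exactly the output array's region, so the splice operator itself is no longer needed.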
In this embodiment, the step S500: deleting a splicing operator in the neural network model, and specifically comprises deleting the splicing operator in an operator list of the neural network model and deleting operator parameters of the splicing operator in the neural network model.
As shown in fig. 6, in this embodiment, the step S500: after deleting the splicing operator in the neural network model, the method further comprises the following steps:
s610: traversing an operator list of the neural network model, and judging whether an undeleted splicing operator exists or not;
if so, S620: selecting an uneliminated splicing operator as the splicing operator to be eliminated, and executing steps S200-S500 on it, so as to eliminate the operator while preserving its function;
if not, then step S630 is continued: judging whether other compilation optimization tasks exist, such as compilation optimization of convolution operators or of fully connected operators. If other tasks exist, continue with step S640: execute them; if not, continue with step S650: compile the neural network model into an executable file that the chip can run. The executable file run in the chip then contains no splicing operator, which reduces the model size and shortens the running time of the model on the chip. The data format of the executable file varies with the requirements of each chip; the aim is to compile the operator parameters, input data, and so on of the neural network into a format the chip can recognize.
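The loop of fig. 6 amounts to repeatedly finding a concat in the operator list, retargeting its inputs, and deleting it. A hypothetical sketch over a toy operator-list representation (the dictionary layout is an assumption for illustration):

```python
def eliminate_concats(op_list):
    """op_list: list of operator dicts; a concat operator is
    {'type': 'concat', 'inputs': [{'ddr', 'len'}, ...] in splicing order,
     'output': {'ddr': DDR offset}}."""
    while True:
        # S610: traverse the list for an undeleted splicing operator
        concat = next((op for op in op_list if op['type'] == 'concat'), None)
        if concat is None:
            break  # none left; later compilation tasks would run here
        base = concat['output']['ddr']   # S200: output array's DDR offset
        for arr in concat['inputs']:     # S400: update each input's offset
            arr['ddr'] = base
            base += arr['len']
        op_list.remove(concat)           # S500: delete the splicing operator
    return op_list
```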
As shown in fig. 7, the embodiment of the present invention further provides a compiling optimization system for eliminating a splicing operator, for implementing the compiling optimization method for eliminating a splicing operator, where the system includes:
a splicing operator searching module M100, configured to search the neural network model for a splicing operator to be eliminated, in this embodiment by traversing an operator list of the neural network model;
the address information acquisition module M200 is used for acquiring the address information of the output array of the splicing operator and acquiring the address information of the input array of the splicing operator;
the address information updating module M300 is configured to update address information of an input array of the splicing operator according to address information of an output array of the splicing operator, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
a splicing operator deleting module M400, configured to delete the splicing operator in the neural network model, specifically, to delete the splicing operator from the operator list of the neural network model and to delete the operator parameters of the splicing operator from the neural network model.
According to the compiling optimization system for eliminating the splicing operator, the splicing operator searching module M100 first searches for the splicing operator to be eliminated; the address information obtaining module M200 then obtains the address information of the output array and of the input arrays; and during compilation the address information updating module M300 updates the address information of the input arrays according to the address information of the output array of the splicing operator, so that the address information of the input arrays of the splicing operator, after being combined, corresponds to the address information of the output array. The splicing function is thus realized by updating address information, without setting a separate splicing operator, so the required splicing function is still realized after the splicing operator deleting module M400 deletes the splicing operator. The system therefore eliminates the splicing operator in the neural network model at compile time, optimizes the model size, ensures that the running time of the neural network model is not limited by the execution time of the splicing operator, and accelerates the inference speed of the neural network model.
In this embodiment, the address information of the output array of the concatenation operator includes a start address of the output array, and the address information of the input array of the concatenation operator includes the start address of the input array and an array length.
The address information updating module M300 updates the address information of the input array of the splicing operator by adopting the following steps:
sequencing the input arrays according to the splicing sequence of the splicing operators on the input arrays;
the starting address of the output array of the splicing operator is taken as the starting address of the first input array; except for the first input array, the starting address of each input array equals the starting address of the previous input array plus the array length of the previous input array. The effect of splicing the input arrays is thus achieved by updating address information alone: the spliced input arrays can be used directly as the input of the next operator, and the function of the splicing operator is realized without using a splicing operator.
In this embodiment, the system further includes a network algorithm compiling module, configured to compile the neural network model into an executable file that can be run by the chip, where a format of data in the executable file is a data format that can be recognized by the chip.
Specifically, the splicing operator searching module M100 is configured to search a neural network model for a splicing operator to be eliminated by adopting the following steps:
Traversing an operator list of the neural network model, and searching whether an undeleted splicing operator exists;
if yes, taking the searched splicing operator as the splicing operator to be eliminated;
if not, the network algorithm compiling module judges whether other compiling optimization tasks exist, if so, the other compiling optimization task executing module executes the other compiling optimization tasks, and if not, the network algorithm compiling module compiles the neural network model to obtain an executable file which can be run by a chip.
In this embodiment, the starting address of the array is represented by the DDR offset address of the array on the target machine after compiling, but the present invention is not limited thereto, and other starting address expression methods are also possible, which fall within the scope of the present invention. The address information obtaining module M200 is configured to obtain, according to operator parameters of the neural network model, a DDR offset address of an output array of the splicing operator, and obtain, according to operator parameters of the neural network model, a DDR offset address and an array length of an input array of the splicing operator.
The address information updating module M300 is configured to update address information of the input array of the concatenation operator by adopting the following steps:
acquiring a splicing sequence of the splicing operator to an input array according to operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
for the first input array, updating the DDR offset address of the input array into the DDR offset address of the output array of the splicing operator;
for a subsequent input array other than the first input array, the DDR offset address of the input array is updated to the DDR offset address of the previous input array plus the array length of the previous input array.
After the address information updating module M300 updates the starting addresses of the input arrays, the effect on the structure of the neural network model is as shown in figs. 3 and 4: the outputs of the operators preceding the original splicing operator and the input of the operator following it are now the same array, and because the input arrays are arranged contiguously in sequence, the actual effect of the splicing operator is achieved and the splicing operator can be deleted.
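A toy demonstration of why contiguous placement makes the splicing operator redundant: once each producer operator writes its result at the rewritten DDR offset, the consumer reading the output region sees the concatenated data with no copy step. The simulated-DDR `bytearray`, the offsets, and the sample chunks below are invented for illustration.

```python
ddr = bytearray(64)                # simulated DDR on the target machine
out_start = 16                     # DDR offset of the concat output array
chunks = [b"AAAA", b"BB", b"CCC"]  # data written by the producer operators

# the producers write at the rewritten, back-to-back offsets
offset = out_start
for data in chunks:
    ddr[offset:offset + len(data)] = data
    offset += len(data)

# the consumer reads the output region and gets the concatenation directly
result = bytes(ddr[out_start:out_start + sum(len(d) for d in chunks)])
```

No concat kernel ever runs; the "splice" exists only in how the addresses were assigned.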
The embodiment of the invention also provides compiling and optimizing equipment for eliminating the splicing operator, which comprises a processor; a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the compile optimization method of the eliminate splice operator via execution of the executable instructions.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "platform."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 600 shown in fig. 8 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 8, the electronic device 600 is in the form of a general purpose computing device. Components of electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different system components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.
Wherein the storage unit stores program code that is executable by the processing unit 610, such that the processing unit 610 performs the steps according to various exemplary embodiments of the present invention described in the above section on the compiling optimization method for eliminating a splicing operator. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The memory unit 620 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 6201 and/or cache memory unit 6202, and may further include Read Only Memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 630 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 600, and/or any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, electronic device 600 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In the compiling and optimizing device for eliminating the splicing operator, the steps of the compiling and optimizing method for eliminating the splicing operator are realized when the program in the memory is executed by the processor, so that the device can obtain the technical effects of the compiling and optimizing method for eliminating the splicing operator.
The embodiment of the invention also provides a computer-readable storage medium for storing a program which, when executed by a processor, implements the steps of the compiling optimization method for eliminating the splicing operator. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section on the compiling optimization method for eliminating a splicing operator.
Referring to fig. 9, a program product 800 for implementing the above-described method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executed on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The program in the computer storage medium is executed by the processor to implement the steps of the method for compiling and optimizing the elimination splicing operator, so that the computer storage medium can also obtain the technical effects of the method for compiling and optimizing the elimination splicing operator.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (9)

1. A compiling optimization method for eliminating a splicing operator is characterized by comprising the following steps:
s100: searching a splicing operator to be eliminated in the neural network model;
s200: acquiring address information of an output array of the splicing operator;
s300: acquiring address information of an input array of the splicing operator;
s400: updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
s500: deleting a splicing operator in the neural network model;
the step S400: updating the address information of the input array of the splicing operator, comprising the following steps:
Acquiring a splicing sequence of the splicing operator to an input array according to operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
sequentially updating the DDR offset addresses of the input arrays according to the ordering sequence of the input arrays, so that the DDR offset addresses of the input arrays of the splicing operator are combined and correspond to the DDR offset addresses of the output arrays of the splicing operator;
the DDR offset addresses of the input arrays are updated in sequence according to the ordering sequence of the input arrays, and the DDR offset addresses of the input arrays are updated in sequence, and the method comprises the following steps:
for the first input array, updating the DDR offset address of the input array into the DDR offset address of the output array of the splicing operator;
for a subsequent input array other than the first input array, the DDR offset address of the input array is updated to the DDR offset address of the previous input array plus the array length of the previous input array.
2. The method for optimizing compilation of elimination splicing operators according to claim 1, wherein the step S100: searching a splicing operator to be eliminated in a neural network model, comprising the following steps:
Traversing an operator list of the neural network model, and searching for uneliminated splicing operators;
and taking the searched splicing operator as the splicing operator to be eliminated.
3. The method for optimizing compilation of elimination splicing operators according to claim 1, wherein the step S200: the method for obtaining the address information of the output array of the splicing operator comprises the following steps: acquiring the DDR offset address of the output array of the splicing operator according to the operator parameters of the neural network model;
the step S300: the method for obtaining the address information of the input array of the splicing operator comprises the following steps: and acquiring the DDR offset address and the array length of the input array of the splicing operator according to the operator parameters of the neural network model.
4. The method for optimizing compilation of elimination splicing operators according to claim 1, wherein the step S500: after deleting the splicing operator in the neural network model, the method further comprises the following steps:
traversing an operator list of the neural network model, and judging whether an undeleted splicing operator exists or not;
if yes, selecting the splicing operator which is not eliminated as the splicing operator to be eliminated, and then continuing to step S200;
if not, judging whether other compiling optimization tasks exist, if so, executing the other compiling optimization tasks, and if not, compiling the neural network model to obtain an executable file which can be run by the chip.
5. A compiling optimization system for eliminating a splicing operator, characterized in that it is adapted to implement the compiling optimization method for eliminating a splicing operator according to any one of claims 1 to 4, said system comprising:
the splicing operator searching module is used for searching splicing operators to be eliminated in the neural network model;
the address information acquisition module is used for acquiring the address information of the output array of the splicing operator and acquiring the address information of the input array of the splicing operator;
the address information updating module is used for updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
the splicing operator deleting module is used for deleting the splicing operator in the neural network model;
the address information updating module is used for updating the address information of the input array of the splicing operator by adopting the following steps:
acquiring a splicing sequence of the splicing operator to an input array according to operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
For the first input array, updating the DDR offset address of the input array into the DDR offset address of the output array of the splicing operator;
for a subsequent input array other than the first input array, the DDR offset address of the input array is updated to the DDR offset address of the previous input array plus the array length of the previous input array.
6. The system of claim 5, further comprising a network algorithm compiling module, wherein the splice operator lookup module is configured to lookup a splice operator to be eliminated in a neural network model by:
traversing an operator list of the neural network model, and searching whether an undeleted splicing operator exists;
if yes, taking the searched splicing operator as the splicing operator to be eliminated;
if not, the network algorithm compiling module judges whether other compiling optimization tasks exist, if so, the network algorithm compiling module executes the other compiling optimization tasks, and if not, the network algorithm compiling module compiles the neural network model to obtain an executable file which can be run by a chip.
7. The system according to claim 6, wherein the address information obtaining module is configured to obtain, according to operator parameters of the neural network model, a DDR offset address of an output array of the splicing operator, and obtain, according to operator parameters of the neural network model, a DDR offset address and an array length of an input array of the splicing operator.
8. A compilation optimization device that eliminates splice operators, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the steps of the compiling optimization method for eliminating a splicing operator according to any one of claims 1 to 4.
9. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the steps of the compiling optimization method for eliminating a splicing operator according to any one of claims 1 to 4.
CN202110295853.2A 2021-03-19 2021-03-19 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator Active CN113011585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110295853.2A CN113011585B (en) 2021-03-19 2021-03-19 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110295853.2A CN113011585B (en) 2021-03-19 2021-03-19 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator

Publications (2)

Publication Number Publication Date
CN113011585A CN113011585A (en) 2021-06-22
CN113011585B true CN113011585B (en) 2023-09-26

Family

ID=76403198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110295853.2A Active CN113011585B (en) 2021-03-19 2021-03-19 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator

Country Status (1)

Country Link
CN (1) CN113011585B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661301B (en) * 2022-05-24 2022-09-06 深圳思谋信息科技有限公司 Graphics processing unit compiling method, device, compiling acceleration library and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102244518A (en) * 2010-05-10 2011-11-16 百度在线网络技术(北京)有限公司 System and method for realizing parallel decompression of hardware
CN109657782A (en) * 2018-12-14 2019-04-19 北京中科寒武纪科技有限公司 Operation method, device and Related product
WO2019128475A1 (en) * 2017-12-29 2019-07-04 中兴通讯股份有限公司 Method and device for training data, storage medium, and electronic device
CN110659728A (en) * 2019-09-24 2020-01-07 上海寒武纪信息科技有限公司 Neural network optimization method and device, computer equipment and storage medium
CN111401511A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111523652A (en) * 2019-02-01 2020-08-11 阿里巴巴集团控股有限公司 Processor, data processing method thereof and camera device
CN112463159A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium
CN112463160A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8275209B2 (en) * 2008-10-10 2012-09-25 Microsoft Corporation Reduced DC gain mismatch and DC leakage in overlap transform processing
US11188820B2 (en) * 2017-09-08 2021-11-30 International Business Machines Corporation Deep neural network performance analysis on shared memory accelerator systems
US20200012924A1 (en) * 2018-07-03 2020-01-09 Sandisk Technologies Llc Pipelining to improve neural network inference accuracy

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102244518A (en) * 2010-05-10 2011-11-16 百度在线网络技术(北京)有限公司 System and method for realizing parallel decompression of hardware
WO2019128475A1 (en) * 2017-12-29 2019-07-04 中兴通讯股份有限公司 Method and device for training data, storage medium, and electronic device
CN109657782A (en) * 2018-12-14 2019-04-19 北京中科寒武纪科技有限公司 Operation method, device and Related product
CN111523652A (en) * 2019-02-01 2020-08-11 阿里巴巴集团控股有限公司 Processor, data processing method thereof and camera device
CN110659728A (en) * 2019-09-24 2020-01-07 上海寒武纪信息科技有限公司 Neural network optimization method and device, computer equipment and storage medium
CN111401511A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN112463159A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium
CN112463160A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A survey of FPGA design for AI era; Zhengjie Li et al.; Journal of Semiconductors; 20200229; full text *
A data cache structure and management mechanism in a reconfigurable system oriented to radar applications (一种面向雷达应用可重构系统中的数据缓存结构和管理机制); Liu Bo et al.; Journal of Shanghai Jiao Tong University; 20170531; full text *

Also Published As

Publication number Publication date
CN113011585A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
US10534590B2 (en) Dynamic recompilation techniques for machine learning programs
CN112579063B (en) Acceleration method for exploring optimization space in deep learning compiler
US8533680B2 (en) Approximating finite domains in symbolic state exploration
US5778212A (en) Interprocedural analysis user interface
JP2755154B2 (en) Program conversion processing device and program conversion processing method
US6983458B1 (en) System for optimizing data type definition in program language processing, method and computer readable recording medium therefor
US10901715B1 (en) Lazy compilation and kernel fusion in dynamic computation graphs
US20200249925A1 (en) On-demand loading of dynamic scripting language code for reduced memory usage
US20090125893A1 (en) Method and apparatus for managing variable assignments in a program
US20150331683A1 (en) Automatic Selection Of An Abstract Data Type
CN113641413B (en) Target model loading updating method and device, readable medium and electronic equipment
CN1271890A (en) Method and device for treating abnormal as normal control flow
WO2023197554A1 (en) Model reasoning acceleration method and apparatus, and electronic device and storage medium
CN116170300B (en) Data processing method, electronic equipment and medium for determining abnormal log information
CN110598855A (en) Deep learning model generation method, device, equipment and storage medium
US20090144528A1 (en) Method for running native code across single or multi-core hybrid processor achitecture
CN113011585B (en) Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
US8458679B2 (en) May-constant propagation
CN115809063B (en) Storage process compiling method, system, electronic equipment and storage medium
US20110271265A1 (en) Method of automatic generation of executable code for multi-core parallel processing
CN114356964A (en) Data blood margin construction method and device, storage medium and electronic equipment
US8825561B2 (en) Method and system of determining a prioritized list of users related to a given goal
JP7344259B2 (en) Pattern transformation methods, apparatus, electronic devices, computer storage media and computer program products in deep learning frameworks
CN113626035B (en) Neural network compiling method facing RISC-V equipment based on TVM
US20220207427A1 (en) Method for training data processing model, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 503-3, 398 Jiangsu Road, Changning District, Shanghai 200050

Applicant after: Shanghai Xijing Technology Co.,Ltd.

Address before: Room 503-3, 398 Jiangsu Road, Changning District, Shanghai 200050

Applicant before: SHANGHAI WESTWELL INFORMATION AND TECHNOLOGY Co.,Ltd.

GR01 Patent grant
GR01 Patent grant