CN113011585A - Compiling optimization method, system, equipment and storage medium for eliminating splicing operator - Google Patents


Info

Publication number: CN113011585A (granted as CN113011585B)
Application number: CN202110295853.2A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 谭黎敏, 田承雷, 宋捷
Current and original assignee: Shanghai Westwell Information Technology Co Ltd
Application filed by Shanghai Westwell Information Technology Co Ltd
Legal status: Granted; currently active
Prior art keywords: operator, splicing, array, splicing operator, input

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 - Architecture, e.g. interconnection topology


Abstract

The invention provides a compiling optimization method, system, device, and storage medium for eliminating the splicing operator. The method comprises the following steps: searching the neural network model for a splicing operator to be eliminated; obtaining the address information of the output array of the splicing operator; obtaining the address information of the input arrays of the splicing operator; updating the address information of the input arrays according to the address information of the output array, so that the address ranges of the input arrays, taken together, coincide with the address range of the output array; and deleting the splicing operator from the neural network model. By eliminating splicing operators from the neural network model at compile time, the invention reduces the model size, frees the model's running time from the execution time of the splicing operators, and accelerates inference of the neural network model.

Description

Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
Technical Field
The invention relates to the technical field of data processing, in particular to a compiling optimization method, a compiling optimization system, compiling optimization equipment and a storage medium for eliminating splicing operators.
Background
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to stimuli within a limited receptive field; it performs well on large-scale image processing. It contains convolutional layers and pooling layers. Convolutional neural networks are widely used for image classification, object recognition, and target tracking.
Because inference with a convolutional neural network requires a huge amount of computation, dedicated AI (Artificial Intelligence) processing chips have emerged. A model usually needs to be transformed and optimized before it can run on such a chip; the tool that performs this transformation is called an AI compiler. Its optimization stage focuses on reducing model size and run time, and mainly comprises the following aspects:
1. operator optimization
2. Graph optimization
3. Model compression
An important technique in computation-graph optimization is operator fusion: by combining operators, both the amount of computation and the amount of memory access are reduced.
Operator fusion is based on observations of deep-learning topology patterns. Deep learning operators fall into two categories:
Computation-intensive operators, such as convolution and fully connected layers, which perform a large number of arithmetic operations at run time.
Memory-access-intensive operators, such as ReLU and concatenation, which require frequent memory accesses at run time.
In a typical deep learning model, computation-intensive and memory-access-intensive operators usually appear together, e.g. "Conv + ReLU". Taking a GPU (Graphics Processing Unit) as an example, the two operators can be fused into one composite operator: after the GPU finishes the convolution, it applies ReLU directly in video memory, reducing interaction with main memory.
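As a toy sketch (not from the patent) of why "Conv + ReLU" fusion helps, the fused version below applies ReLU to each convolution output as it is produced, so no intermediate result needs to be written back to a separate buffer:

```python
# Toy 1-D convolution, unfused vs. fused with ReLU (illustrative names only).
def conv1d(x, k):
    n = len(k)
    return [sum(x[i + j] * k[j] for j in range(n)) for i in range(len(x) - n + 1)]

def fused_conv_relu(x, k):
    n = len(k)
    # ReLU is applied to each element as it is computed: the intermediate
    # convolution result never exists as a separate array.
    return [max(0, sum(x[i + j] * k[j] for j in range(n)))
            for i in range(len(x) - n + 1)]

assert conv1d([1, -2, 3, -4], [1, 1]) == [-1, 1, -1]
assert fused_conv_relu([1, -2, 3, -4], [1, 1]) == [0, 1, 0]
```

On a real GPU the saving is the round trip to main memory between the two kernels, which this sketch only mimics structurally.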
The concatenation operator (concat), here called the splicing operator, is a commonly used operator in neural networks; it joins its input tensors along a specified axis. It belongs to the memory-access-intensive category and mainly consumes memory-access time: on any hardware platform, the execution time of a splicing operation is governed by memory bandwidth and proportional to the amount of data moved.
For memory-access-intensive operators, the AI compiler reduces memory accesses by fusing adjacent operators. The splicing operator, however, is generally used to fuse features from different layers, so its input arrays are usually far apart both in the computation graph and in actual memory, and the adjacency condition is not met. The conventional fusion method for memory-access-intensive operators therefore cannot be applied to the splicing operator.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a compiling optimization method, system, device, and storage medium for eliminating the splicing operator, in which splicing operators in the neural network model are eliminated at compile time, accelerating inference of the neural network model.
The embodiment of the invention provides a compiling optimization method for eliminating a splicing operator, which comprises the following steps:
s100: searching a splicing operator to be eliminated in the neural network model;
s200: acquiring address information of an output array of the splicing operator;
s300: acquiring address information of an input array of the splicing operator;
s400: updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
s500: the splice operator is deleted in the neural network model.
In some embodiments, the address information of the output array of the splicing operator comprises a start address of the output array, and the address information of the input array of the splicing operator comprises a start address of the input array and an array length;
in step S400, updating the address information of the input array of the splicing operator includes the following steps:
the start address of the output array of the splicing operator is taken as the start address of the first input array; for each input array other than the first, the start address equals the start address of the previous input array plus the array length of the previous input array.
In some embodiments, before the step S400 uses the start address of the output array of the splicing operator as the start address of the first input array, the method further includes the following steps:
and sequencing the input arrays according to the splicing sequence of the splicing operator to the input arrays.
In some embodiments, the step S100: searching for a splicing operator to be eliminated in a neural network model, comprising the following steps:
traversing an operator list of the neural network model, and searching for an unremoved splicing operator;
and taking the searched splicing operator as the splicing operator to be eliminated.
In some embodiments, the step S200: acquiring address information of an output array of the splicing operator, including: acquiring a DDR offset address of an output array of the splicing operator according to the operator parameter of the neural network model;
the step S300: acquiring address information of an input array of the splicing operator, wherein the address information comprises: and acquiring the DDR offset address and the array length of the input array of the splicing operator according to the operator parameter of the neural network model.
In some embodiments, the step S400: updating the address information of the input array of the splicing operator, comprising the following steps:
acquiring the splicing sequence of the splicing operator to the input array according to the operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
and sequentially updating the DDR offset addresses of the input arrays according to the sorting sequence of the input arrays, so that the DDR offset addresses of the input arrays of the splicing operator correspond to the DDR offset addresses of the output arrays of the splicing operator after being combined.
In some embodiments, sequentially updating the DDR offset addresses of the input arrays according to the sorting order of the input arrays includes the following steps:
for the first input array, updating the DDR offset address of the input array as the DDR offset address of the output array of the splicing operator;
and for the subsequent input arrays except the first input array, updating the DDR offset address of the input array to be the DDR offset address of the previous input array plus the array length of the previous input array.
In some embodiments, the step S500: after the splicing operator is deleted from the neural network model, the method further comprises the following steps:
traversing an operator list of the neural network model, and judging whether an unremoved splicing operator still exists;
if so, selecting the splicing operator which is not eliminated as the splicing operator to be eliminated, and continuing to the step S200;
if not, judging whether other compiling optimization tasks exist, if so, executing the other compiling optimization tasks, and if not, compiling the neural network model to obtain an executable file which can be operated by the chip.
The embodiment of the invention also provides a compiling optimization system for eliminating the splicing operator, which is used for realizing the compiling optimization method for eliminating the splicing operator, and the system comprises the following steps:
the splicing operator searching module is used for searching a splicing operator to be eliminated in the neural network model;
the address information acquisition module is used for acquiring the address information of the output array of the splicing operator and acquiring the address information of the input array of the splicing operator;
the address information updating module is used for updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
and the splicing operator deleting module is used for deleting the splicing operator in the neural network model.
In some embodiments, the address information of the output array of the splicing operator comprises a start address of the output array, and the address information of the input array of the splicing operator comprises a start address of the input array and an array length;
the address information updating module updates the address information of the input array of the splicing operator by adopting the following steps:
the start address of the output array of the splicing operator is taken as the start address of the first input array; for each input array other than the first, the start address equals the start address of the previous input array plus the array length of the previous input array.
In some embodiments, the method further comprises a network algorithm compiling module, and the splicing operator searching module is configured to search for a splicing operator to be eliminated in the neural network model by using the following steps:
traversing an operator list of the neural network model, and searching whether an unremoved splicing operator exists or not;
if so, taking the searched splicing operator as the splicing operator to be eliminated;
if not, the network algorithm compiling module judges whether other compiling and optimizing tasks exist, if so, the other compiling and optimizing tasks are executed, and if not, the network algorithm compiling module compiles the neural network model to obtain an executable file which can be operated by a chip.
In some embodiments, the address information obtaining module is configured to obtain, according to an operator parameter of the neural network model, a DDR offset address of an output array of the splicing operator, and obtain, according to the operator parameter of the neural network model, a DDR offset address and an array length of an input array of the splicing operator;
the address information updating module is used for updating the address information of the input array of the splicing operator by adopting the following steps:
acquiring the splicing sequence of the splicing operator to the input array according to the operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
for the first input array, updating the DDR offset address of the input array as the DDR offset address of the output array of the splicing operator;
and for the subsequent input arrays except the first input array, updating the DDR offset address of the input array to be the DDR offset address of the previous input array plus the array length of the previous input array.
The embodiment of the present invention further provides a compiling and optimizing device for eliminating a splicing operator, including:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the steps of the compiling optimization method for eliminating the splicing operator.
The embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program realizes the steps of the compiling optimization method for eliminating the splicing operator when being executed by a processor.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The compiling optimization method, the compiling optimization system, the compiling optimization equipment and the compiling optimization storage medium for eliminating the splicing operator have the following beneficial effects:
according to the method, the input and input address information is updated according to the address information of the output array of the splicing operator in the compiling process, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined, the splicing function is realized through the updating of the address information without independently setting the splicing operator, the splicing operator in the neural network model is eliminated through the compiling process, the model size is optimized, the running time of the neural network model is not limited by the execution time of the splicing operator any more, and the reasoning speed of the neural network model is accelerated.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
FIG. 1 is a flow diagram of a compilation optimization method for eliminating splice operators according to an embodiment of the present invention;
FIG. 2 is a functional diagram of a splice operator according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a partial structure of a model before a stitching operator is eliminated according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a partial structure of a model after the removal of the stitching operator according to an embodiment of the present invention;
FIG. 5 is a flowchart of updating address information of an input array according to an embodiment of the present invention;
FIG. 6 is a flow diagram of a loop elimination stitching operator according to one embodiment of the present invention;
FIG. 7 is a block diagram of a compilation optimization system that eliminates concatenation operators according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a compiling optimization device for eliminating a splicing operator according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in fig. 1, an embodiment of the present invention provides a compiling optimization method for eliminating a splicing operator, including the following steps:
s100: searching a splicing operator to be eliminated in the neural network model;
s200: acquiring address information of an output array of the splicing operator;
s300: acquiring address information of an input array of the splicing operator;
s400: updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
s500: the splice operator is deleted in the neural network model.
In the compiling optimization method for eliminating the splicing operator, the splicing operator to be eliminated is first found in step S100; the address information of the output array and of the input arrays is then obtained in steps S200 and S300; and in step S400 the address information of the input arrays is updated according to that of the output array, so that the address ranges of the input arrays, taken together, coincide with the address range of the output array. The splicing function is thereby realized purely through address updates, with no separate splicing operator, so the required splicing behavior is preserved after the operator is deleted in step S500. The method thus eliminates the splicing operator from the neural network model at compile time, reduces the model size, frees the model's running time from the execution time of the splicing operator, and accelerates inference of the neural network model.
The algorithm of the neural network model to be compiled comprises a plurality of operators, and the neural network model comprises an operator list, operator parameters and weight data. Wherein, the operator list comprises each operator included in the model. For the splicing operator, the operator parameters at least comprise the parameters of the input array and the parameters of the output array.
In this embodiment, the step S100: searching for a splicing operator to be eliminated in a neural network model, comprising the following steps:
traversing an operator list of the neural network model, and searching for an unremoved splicing operator;
and taking the searched splicing operator as the splicing operator to be eliminated, and executing the subsequent steps S200-S500 on the splicing operator to be eliminated.
The splicing operator (Concat) splices two or more arrays; during its execution, the chip essentially performs a memory copy, which is why the operator belongs to the memory-access-intensive category. FIG. 2 illustrates the function of the splicing operator. Its inputs are the two arrays Array1 and Array2, and its output is Array3; that is, the operator splices Array1 and Array2 to obtain Array3. The size of Array3 equals the size of Array1 plus the size of Array2; the front portion of Array3 is identical to Array1, and the back portion is identical to Array2. The splicing operator performs no arithmetic during execution.
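The behavior in FIG. 2 can be sketched as follows (a minimal illustration; the array contents are hypothetical):

```python
# Splicing (concat): the output array is Array1 followed by Array2.
array1 = [10, 20, 30]        # input Array1
array2 = [40, 50]            # input Array2
array3 = array1 + array2     # output Array3

assert array3[:len(array1)] == array1            # front portion equals Array1
assert array3[len(array1):] == array2            # back portion equals Array2
assert len(array3) == len(array1) + len(array2)  # size is the sum of the inputs
```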
Each array corresponds to a segment of memory and is described by a structure of the form { start, len }, where start represents the start address of the array and len represents the length of the array.
In this embodiment, the address information of the output array of the splicing operator includes the start address of the output array, and the address information of each input array includes the start address of the input array and the array length. Updating the address information of the input arrays mainly means recomputing each input array's start address from the output array's start address and the input arrays' lengths.
In step S400, updating the address information of the input array of the splicing operator includes the following steps:
the start address of the output array of the splicing operator is taken as the start address of the first input array; for each subsequent input array, the start address equals the start address of the previous input array plus the length of the previous input array. The effect of splicing the input arrays is thus achieved purely by updating address information, and the spliced result can be fed directly to the next operator, so the function of the splicing operator is realized without using the operator itself.
Specifically, the start address OutArray.start of the output array OutArray of the splicing operator is obtained in step S200; the start addresses InArray1.start, InArray2.start, ..., InArray(N).start and the lengths InArray1.len, InArray2.len, ..., InArray(N).len of the N input arrays InArray1, InArray2, ..., InArray(N) are obtained in step S300.
In step S400, the start addresses of the input arrays are updated as follows:
the start address of the first input array: InArray1.start = OutArray.start;
the start address of the i-th input array: InArray(i).start = InArray(i-1).start + InArray(i-1).len, where i is a positive integer with 2 ≤ i ≤ N.
Because the splicing operator splices the input arrays in a fixed order, the start addresses in step S400 must be updated in that same order, ensuring that the resulting output array can be used correctly as input to the subsequent operator. Specifically, in this embodiment, before the start address of the output array is taken as the start address of the first input array in step S400, the method further includes: sorting the input arrays according to the order in which the splicing operator splices them. This splicing order can be obtained from the operator parameters of the splicing operator in the neural network model.
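The sort-then-assign procedure described above can be sketched as follows (a minimal illustration; the dictionary representation and the `order` field are assumptions for this sketch, not the patent's internal structures):

```python
def splice_by_address_update(out_start, inputs):
    """Sort the input arrays by their splicing order, then assign each a
    start address so that together they tile the output array's region."""
    inputs = sorted(inputs, key=lambda a: a["order"])  # splicing order from operator params
    addr = out_start
    for arr in inputs:
        arr["start"] = addr      # first input at the output's start address,
        addr += arr["len"]       # each later input right after its predecessor
    return inputs

arrays = [
    {"order": 2, "start": 0x3000, "len": 8},
    {"order": 1, "start": 0x2000, "len": 12},
]
splice_by_address_update(0x1000, arrays)
# The order-1 array now starts at 0x1000, the order-2 array at 0x100C.
assert [a["start"] for a in sorted(arrays, key=lambda a: a["order"])] == [0x1000, 0x100C]
```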
As shown in fig. 3 and 4, a splicing operator that splices two input arrays is taken as an example. As shown in fig. 3, the input Array1 of the splicing operator is the output of preceding operator 1, and the input Array2 is the output of preceding operator 2; after Array1 and Array2 are spliced by the splicing operator, the output Array3 is obtained and used as the input of the subsequent operator. As shown in fig. 4, the start addresses of Array1 and Array2 are updated in the neural network model: the start address of Array1 becomes the start address of Array3, and the start address of Array2 becomes the start address of Array1 plus the length of Array1. The output of preceding operator 1, the output of preceding operator 2, and the input of the subsequent operator thus form one contiguous array, so the effect of the splicing operator is achieved without actually executing a splicing operator.
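The fig. 3 to fig. 4 rewrite for the two-input case can be sketched as follows (the addresses and lengths are made up for illustration; `start`/`len` follow the array structure described earlier):

```python
# Before elimination: Array1 and Array2 live elsewhere in memory.
array3_start = 0x1000                     # start address of output Array3
array1 = {"start": 0x2000, "len": 12}     # output of preceding operator 1
array2 = {"start": 0x3000, "len": 8}      # output of preceding operator 2

# After elimination: Array1 is placed at Array3's start address and
# Array2 immediately after Array1, so the subsequent operator reads
# one contiguous array and no copy is ever performed.
array1["start"] = array3_start
array2["start"] = array1["start"] + array1["len"]

assert array1["start"] == 0x1000
assert array2["start"] == 0x100C          # 0x1000 + 12
```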
In this embodiment, the start address of an array is represented by the DDR (Double Data Rate synchronous dynamic random access memory, i.e., main memory) offset address of the compiled array on the target machine, and the step S200 of obtaining the address information of the output array of the splicing operator includes: obtaining the DDR offset address of the output array according to the operator parameters of the neural network model. The DDR offset address is merely one representation of the start address; the invention is not limited to it, and other representations of the start address also fall within the scope of the invention.
The step S300: acquiring address information of an input array of the splicing operator, wherein the address information comprises: and acquiring the DDR offset address and the array length of the input array of the splicing operator according to the operator parameter of the neural network model.
As shown in fig. 5, in this embodiment, the step S400 of updating the address information of the input arrays of the splicing operator includes the following steps:
S410: acquiring the splicing order of the input arrays from the operator parameters of the neural network model; for example, when there are multiple input arrays, they are arranged in splicing order as InArray1, InArray2, and so on;
S420: sorting the input arrays according to the splicing order;
S430: sequentially updating the DDR offset addresses of the input arrays in the sorted order, so that the DDR offset address ranges of the input arrays, taken together, coincide with the DDR offset address range of the output array of the splicing operator.
Specifically, the step S430 includes the following steps:
S431: for the first input array, updating its DDR offset address to the DDR offset address of the output array of the splicing operator, i.e., InArray1.start = OutArray.start;
S432: for each input array after the first, updating its DDR offset address to the DDR offset address of the previous input array plus the array length of the previous input array, i.e., InArray(i).start = InArray(i-1).start + InArray(i-1).len, where i is a positive integer with 2 ≤ i ≤ N.
In this embodiment, the step S500 of deleting the splicing operator from the neural network model specifically includes: deleting the splicing operator from the operator list of the neural network model, and deleting the operator parameters of the splicing operator from the neural network model.
As shown in fig. 6, in this embodiment, after the splicing operator is deleted from the neural network model in step S500, the method further comprises the following steps:
S610: traversing the operator list of the neural network model, and judging whether any splicing operator remains that has not been eliminated;
if so, S620: selecting an uneliminated splicing operator as the splicing operator to be eliminated, and then executing steps S200-S500 on it, so that the splicing operator is eliminated while its function is preserved;
if not, continuing with step S630: judging whether other compiling optimization tasks exist, such as compiling optimization of convolution operators or of fully-connected operators. If so, continuing with step S640: executing the other compiling optimization tasks; if no other compiling optimization task exists, continuing with step S650: compiling the neural network model to obtain an executable file that can be run by a chip. The executable file running in the chip therefore contains no splicing operator, which reduces the size of the model and shortens its running time on the chip. The data format in the executable file differs according to the requirements of the particular chip; the purpose is to compile the operator parameters, input data, and the like of the neural network into a format recognized by the chip.
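The control flow of steps S610-S650 amounts to a loop over the operator list. The sketch below is an illustrative assumption: operators are modeled as plain dictionaries and the address-rebasing work of steps S200-S500 is elided to a comment, since only the traversal logic is being shown.

```python
# Illustrative sketch of steps S610-S650: repeatedly find and eliminate
# splicing (concat) operators until none remain, then move on to other
# compilation passes. Operator representation is a simplifying assumption.
def eliminate_concats(op_list):
    removed = 0
    while True:
        concat = next((op for op in op_list if op["type"] == "concat"), None)
        if concat is None:
            break  # S630: no splicing operator left; run remaining passes
        # Steps S200-S500 would rebase the input addresses here, then:
        op_list.remove(concat)  # S500: delete from the operator list
        removed += 1
    return removed

ops = [{"type": "conv"}, {"type": "concat"}, {"type": "fc"}, {"type": "concat"}]
n = eliminate_concats(ops)
# n == 2; ops now contains only the conv and fc operators
```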
As shown in fig. 7, an embodiment of the present invention further provides a compiling and optimizing system for eliminating a splicing operator, which is used to implement the compiling and optimizing method for eliminating a splicing operator, where the system includes:
a splicing operator searching module M100, configured to search a splicing operator to be eliminated in the neural network model, in this embodiment, the splicing operator to be eliminated is searched by traversing an operator list of the neural network model;
an address information obtaining module M200, configured to obtain address information of an output array of the splicing operator, and obtain address information of an input array of the splicing operator;
the address information updating module M300 is configured to update the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
a splicing operator deleting module M400, configured to delete a splicing operator in the neural network model, specifically, to delete the splicing operator in an operator list of the neural network model and delete an operator parameter of the splicing operator in the neural network model.
In the above compiling optimization system for eliminating a splicing operator, the splicing operator to be eliminated is first found by the splicing operator searching module M100; the address information of the output array and of the input arrays is then obtained by the address information obtaining module M200; at compile time, the address information updating module M300 updates the address information of the input arrays according to the address information of the output array, so that the address information of the input arrays of the splicing operator, after being combined, corresponds to the address information of the output array. The splicing function is thus realized purely by updating address information, so no separate splicing operator is needed, and the required splicing function remains intact after the splicing operator is deleted by the splicing operator deleting module M400. The system therefore eliminates the splicing operator from the neural network model at compile time, reduces the size of the model, frees the running time of the neural network model from the execution time of the splicing operator, and accelerates the inference speed of the neural network model.
In this embodiment, the address information of the output array of the splicing operator includes a start address of the output array, and the address information of the input array of the splicing operator includes a start address of the input array and an array length.
The address information updating module M300 updates the address information of the input array of the splicing operator by the following steps:
sorting the input arrays according to the splicing sequence of the splicing operator to the input arrays;
the start address of the output array of the splicing operator is taken as the start address of the first input array; for each input array other than the first, the start address equals the start address of the previous input array plus the array length of the previous input array. The effect of splicing the input arrays is thus achieved by updating address information alone, and the spliced data can be used directly as the input of the next operator, so the function of the splicing operator is realized without using the splicing operator.
In this embodiment, the system further includes a network algorithm compiling module, configured to compile the neural network model into an executable file that can be executed by the chip, where a format of data in the executable file is a data format that can be recognized by the chip.
Specifically, the splicing operator searching module M100 is configured to search a splicing operator to be eliminated in the neural network model by using the following steps:
traversing an operator list of the neural network model, and searching whether an unremoved splicing operator exists or not;
if so, taking the searched splicing operator as the splicing operator to be eliminated;
if not, the network algorithm compiling module judges whether other compiling optimization tasks exist; if so, they are executed by the corresponding task execution modules; if not, the network algorithm compiling module compiles the neural network model to obtain an executable file that can be run by a chip.
In this embodiment, the start address of an array is expressed as the DDR offset address of the compiled array on the target machine, but the invention is not limited thereto; other representations of the start address are also possible and all fall within the scope of the invention. The address information obtaining module M200 is configured to obtain the DDR offset address of the output array of the splicing operator, and the DDR offset address and array length of each input array of the splicing operator, from the operator parameters of the neural network model.
The address information updating module M300 is configured to update the address information of the input array of the splicing operator by adopting the following steps:
acquiring the splicing sequence of the splicing operator to the input array according to the operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
for the first input array, updating the DDR offset address of the input array as the DDR offset address of the output array of the splicing operator;
and for the subsequent input arrays except the first input array, updating the DDR offset address of the input array to be the DDR offset address of the previous input array plus the array length of the previous input array.
After the start addresses of the input arrays are updated by the address information updating module M300, the effect on the structure of the neural network model is as shown in fig. 3 and fig. 4: the output of the pre-operator and the input of the post-operator of the original splicing operator now refer to the same array, and since the input arrays are arranged contiguously in splicing order, the actual effect of the splicing operator is achieved and the splicing operator can be deleted.
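The effect described above — predecessors writing directly into disjoint slices of the successor's buffer — can be illustrated with a small stdlib analogy. This is an assumption for explanation only, not the patent's implementation: `memoryview` slices of one `bytearray` stand in for the rebased input arrays sharing the output array's DDR region.

```python
# Analogy for figs. 3-4: once each pre-operator writes into its slice of
# the shared buffer, the "spliced" result exists without any copy step.
buf = bytearray(6)          # backing storage for the output array
out = memoryview(buf)       # output array of the (eliminated) concat
in1, in2 = out[:4], out[4:] # rebased input "arrays": views, not copies

in1[:] = b"\x01\x02\x03\x04"  # pre-operator 1 writes its output
in2[:] = b"\x05\x06"          # pre-operator 2 writes its output
# buf now holds 01 02 03 04 05 06 -- the concatenation, with no concat op
```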
The embodiment of the invention also provides a compiling optimization device for eliminating a splicing operator, which comprises a processor; and a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the compiling optimization method for eliminating a splicing operator via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "platform."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 600 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code, which can be executed by the processing unit 610, so that the processing unit 610 executes the steps according to various exemplary embodiments of the present invention described in the above compiling optimization method for eliminating a splicing operator section of the present specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In the compiling and optimizing device for eliminating the splicing operator, when the program in the memory is executed by the processor, the step of the compiling and optimizing method for eliminating the splicing operator is realized, so that the device can also obtain the technical effect of the compiling and optimizing method for eliminating the splicing operator.
The embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program realizes the steps of the compiling optimization method for eliminating the splicing operator when being executed by a processor. In some possible embodiments, the various aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned compilation optimization method section of the elimination splice operator of the present specification, when the program product is executed on the terminal device.
Referring to fig. 9, a program product 800 for implementing the above method according to an embodiment of the present invention is described. It may employ a portable compact disc read-only memory (CD-ROM) including program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
When being executed by a processor, the program in the computer storage medium implements the steps of the compiling optimization method for eliminating the splicing operator, so that the computer storage medium can also obtain the technical effect of the compiling optimization method for eliminating the splicing operator.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (14)

1. A compiling optimization method for eliminating a splicing operator is characterized by comprising the following steps:
s100: searching a splicing operator to be eliminated in the neural network model;
s200: acquiring address information of an output array of the splicing operator;
s300: acquiring address information of an input array of the splicing operator;
s400: updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
s500: the splice operator is deleted in the neural network model.
2. The compiling optimization method for eliminating a splicing operator according to claim 1, wherein the address information of the output array of the splicing operator comprises a start address of the output array, and the address information of the input array of the splicing operator comprises a start address of the input array and an array length;
in step S400, updating the address information of the input array of the splicing operator includes the following steps:
and taking the initial address of the output array of the splicing operator as the initial address of the first input array, wherein the initial address of each input array except the first input array is equal to the sum of the initial address of the previous input array and the array length of the previous input array.
3. The compiling and optimizing method for eliminating a splicing operator according to claim 2, wherein in the step S400, before taking the start address of the output array of the splicing operator as the start address of the first input array, the method further comprises the following steps:
and sequencing the input arrays according to the splicing sequence of the splicing operator to the input arrays.
4. The compilation optimization method for eliminating splicing operators according to claim 1, wherein the step S100: searching for a splicing operator to be eliminated in a neural network model, comprising the following steps:
traversing an operator list of the neural network model, and searching for an unremoved splicing operator;
and taking the searched splicing operator as the splicing operator to be eliminated.
5. The compilation optimization method for eliminating splicing operators according to claim 1, wherein the step S200: acquiring address information of an output array of the splicing operator, including: acquiring a DDR offset address of an output array of the splicing operator according to the operator parameter of the neural network model;
the step S300: acquiring address information of an input array of the splicing operator, wherein the address information comprises: and acquiring the DDR offset address and the array length of the input array of the splicing operator according to the operator parameter of the neural network model.
6. The compilation optimization method for eliminating splicing operators according to claim 4, wherein the step S400: updating the address information of the input array of the splicing operator, comprising the following steps:
acquiring the splicing sequence of the splicing operator to the input array according to the operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
and sequentially updating the DDR offset addresses of the input arrays according to the sorting sequence of the input arrays, so that the DDR offset addresses of the input arrays of the splicing operator correspond to the DDR offset addresses of the output arrays of the splicing operator after being combined.
7. The compiling and optimizing method for eliminating a splicing operator according to claim 6, wherein the step of sequentially updating the DDR offset addresses of the input arrays according to the sorting order of the input arrays comprises the steps of:
for the first input array, updating the DDR offset address of the input array as the DDR offset address of the output array of the splicing operator;
and for the subsequent input arrays except the first input array, updating the DDR offset address of the input array to be the DDR offset address of the previous input array plus the array length of the previous input array.
8. The compilation optimization method for eliminating splicing operators according to claim 1, wherein the step S500: after the splicing operator is deleted from the neural network model, the method further comprises the following steps:
traversing an operator list of the neural network model, and judging whether an unremoved splicing operator still exists;
if so, selecting the splicing operator which is not eliminated as the splicing operator to be eliminated, and continuing to the step S200;
if not, judging whether other compiling optimization tasks exist, if so, executing the other compiling optimization tasks, and if not, compiling the neural network model to obtain an executable file which can be operated by the chip.
9. A compilation optimization system for eliminating a splicing operator, wherein the compilation optimization method for eliminating the splicing operator is implemented according to any one of claims 1 to 8, and the system comprises:
the splicing operator searching module is used for searching a splicing operator to be eliminated in the neural network model;
the address information acquisition module is used for acquiring the address information of the output array of the splicing operator and acquiring the address information of the input array of the splicing operator;
the address information updating module is used for updating the address information of the input array of the splicing operator according to the address information of the output array of the splicing operator, so that the address information of the input array of the splicing operator corresponds to the address information of the output array of the splicing operator after being combined;
and the splicing operator deleting module is used for deleting the splicing operator in the neural network model.
10. The compiling optimization system for eliminating a splicing operator according to claim 9, wherein the address information of the output array of the splicing operator comprises a start address of the output array, and the address information of the input array of the splicing operator comprises a start address of the input array and an array length;
the address information updating module updates the address information of the input array of the splicing operator by adopting the following steps:
and taking the initial address of the output array of the splicing operator as the initial address of the first input array, wherein the initial address of each input array except the first input array is equal to the sum of the initial address of the previous input array and the array length of the previous input array.
11. The compiling and optimizing system for eliminating a splicing operator according to claim 9 further comprising a network algorithm compiling module, wherein the splicing operator searching module is configured to search a neural network model for a splicing operator to be eliminated by:
traversing an operator list of the neural network model, and searching whether an unremoved splicing operator exists or not;
if so, taking the searched splicing operator as the splicing operator to be eliminated;
if not, the network algorithm compiling module judges whether other compiling and optimizing tasks exist, if so, the other compiling and optimizing tasks are executed, and if not, the network algorithm compiling module compiles the neural network model to obtain an executable file which can be operated by a chip.
12. The compiling and optimizing system for eliminating the splicing operator according to claim 11, wherein the address information obtaining module is configured to obtain a DDR offset address of an output array of the splicing operator according to the operator parameter of the neural network model, and obtain a DDR offset address and an array length of an input array of the splicing operator according to the operator parameter of the neural network model;
the address information updating module is used for updating the address information of the input array of the splicing operator by adopting the following steps:
acquiring the splicing sequence of the splicing operator to the input array according to the operator parameters of the neural network model, and sequencing the input array according to the splicing sequence;
for the first input array, updating the DDR offset address of the input array as the DDR offset address of the output array of the splicing operator;
and for the subsequent input arrays except the first input array, updating the DDR offset address of the input array to be the DDR offset address of the previous input array plus the array length of the previous input array.
13. A compilation optimization device that eliminates concatenation operators, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the compiling optimization method for eliminating a splicing operator of any of claims 1 to 8 via execution of the executable instructions.
14. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the steps of the compiling optimization method for eliminating a splicing operator of any of claims 1 to 8.
CN202110295853.2A 2021-03-19 2021-03-19 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator Active CN113011585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110295853.2A CN113011585B (en) 2021-03-19 2021-03-19 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator


Publications (2)

Publication Number Publication Date
CN113011585A true CN113011585A (en) 2021-06-22
CN113011585B CN113011585B (en) 2023-09-26

Family

ID=76403198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110295853.2A Active CN113011585B (en) 2021-03-19 2021-03-19 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator

Country Status (1)

Country Link
CN (1) CN113011585B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661301A (en) * 2022-05-24 2022-06-24 深圳思谋信息科技有限公司 Graphics processing unit compiling method, device, compiling acceleration library and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100092098A1 (en) * 2008-10-10 2010-04-15 Microsoft Corporation Reduced dc gain mismatch and dc leakage in overlap transform processing
CN102244518A (en) * 2010-05-10 2011-11-16 百度在线网络技术(北京)有限公司 System and method for realizing parallel decompression of hardware
US20190080232A1 (en) * 2017-09-08 2019-03-14 International Business Machines Corporation Deep neural network perforance analysis on shared memory accelerator systems
CN109657782A (en) * 2018-12-14 2019-04-19 北京中科寒武纪科技有限公司 Operation method, device and Related product
WO2019128475A1 (en) * 2017-12-29 2019-07-04 中兴通讯股份有限公司 Method and device for training data, storage medium, and electronic device
CN110659728A (en) * 2019-09-24 2020-01-07 上海寒武纪信息科技有限公司 Neural network optimization method and device, computer equipment and storage medium
US20200012924A1 (en) * 2018-07-03 2020-01-09 Sandisk Technologies Llc Pipelining to improve neural network inference accuracy
CN111401511A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111523652A (en) * 2019-02-01 2020-08-11 阿里巴巴集团控股有限公司 Processor, data processing method thereof and camera device
CN112463159A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium
CN112463160A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZHENGJIE LI et al.: "A survey of FPGA design for AI era", Journal of Semiconductors, 29 February 2020
LIU Bo et al.: "A data cache structure and management mechanism in a reconfigurable system for radar applications", Journal of Shanghai Jiao Tong University, 31 May 2017


Also Published As

Publication number Publication date
CN113011585B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
US10534590B2 (en) Dynamic recompilation techniques for machine learning programs
JP2755154B2 (en) Program conversion processing device and program conversion processing method
CN112579063B (en) Acceleration method for exploring optimization space in deep learning compiler
US8533680B2 (en) Approximating finite domains in symbolic state exploration
US5355494A (en) Compiler for performing incremental live variable analysis for data-parallel programs
US5854932A (en) Compiler and method for avoiding unnecessary recompilation
US6983458B1 (en) System for optimizing data type definition in program language processing, method and computer readable recording medium therefor
US9696974B2 (en) Graph-based model for type systems
US5778212A (en) Interprocedural analysis user interface
US20080288915A1 (en) Determining destinations of a dynamic branch
US7353503B2 (en) Efficient dead code elimination
US20200249925A1 (en) On-demand loading of dynamic scripting language code for reduced memory usage
US10228920B2 (en) Automatic selection of an abstract data type
US9201692B2 (en) System and method for generating a plan to complete a task in computing environment
Loogen et al. Distributed implementation of programmed graph reduction
US8752056B2 (en) Running native code across single or multi-core hybrid processor architecture
WO2023197554A1 (en) Model reasoning acceleration method and apparatus, and electronic device and storage medium
CN110598855A (en) Deep learning model generation method, device, equipment and storage medium
CN115809063B (en) Storage process compiling method, system, electronic equipment and storage medium
CN113011585B (en) Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN112506523A (en) BERT model optimization method and system, electronic device and storage medium
US20110271265A1 (en) Method of automatic generation of executable code for multi-core parallel processing
US5515535A (en) System and method for parallel variable optimization
CN114356964A (en) Data blood margin construction method and device, storage medium and electronic equipment
WO2000022523A1 (en) Apparatus and method for program optimizing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 503-3, 398 Jiangsu Road, Changning District, Shanghai 200050

Applicant after: Shanghai Xijing Technology Co.,Ltd.

Address before: Room 503-3, 398 Jiangsu Road, Changning District, Shanghai 200050

Applicant before: SHANGHAI WESTWELL INFORMATION AND TECHNOLOGY Co.,Ltd.
GR01 Patent grant