CN117291259A - Operator optimization method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117291259A
CN117291259A · CN202311193617.5A
Authority
CN
China
Prior art keywords
operator
optimizing
tree
strategy
segmentation strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311193617.5A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Bi Ren Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bi Ren Technology Co ltd filed Critical Shanghai Bi Ren Technology Co ltd
Priority to CN202311193617.5A priority Critical patent/CN117291259A/en
Publication of CN117291259A publication Critical patent/CN117291259A/en
Pending legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/10 — Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an operator optimization method, an operator optimization device, electronic equipment and a storage medium, and relates to the technical field of artificial intelligence. The operator optimization method comprises the following steps: generating a first scheduling tree (schedule tree) corresponding to an acquired segmentation (tiling) strategy, wherein the segmentation strategy is used for segmenting data associated with at least one operator in a neural network model, and the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first scheduling tree; and optimizing each operator in the neural network model based on the first scheduling tree. Because the first scheduling tree is configurable, flexible and changeable segmentation strategies can be supported. Optimizing each operator with the first scheduling tree avoids adding segmentation strategies to the code engineering in hard-coded form, thereby accelerating the optimization process of each operator.

Description

Operator optimization method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an operator optimization method, an operator optimization device, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence (Artificial Intelligence, AI), more and more practical applications, such as speech recognition, machine translation and autonomous driving, are adopting deep learning techniques, which show great potential in real-life applications and are attracting increasing attention. The computational efficiency of the deep neural network (Deep Neural Network, DNN) models used by deep learning directly influences their practical effect; for example, the computation time of algorithms such as target detection, target recognition and motion prediction in autonomous driving determines their availability and safety. Therefore, how to perform AI computation with high performance is an urgent need in current AI chip development.
High-performance operators are the basis of efficient computation on an AI chip, and in the related art a segmentation (tiling) strategy is generally used to optimize operators. At present, however, operators are tiled following a fixed paradigm: every time there is a new tiling strategy, a branch must be extended and added to the code engineering in hard-coded (hard code) form for test verification. The fixed paradigm lacks flexibility, cannot support flexible and changeable tiling strategies, and is a very inefficient tuning means.
Therefore, how to optimize each operator and thereby accelerate the tuning process of each operator is an urgent problem to be solved.
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the invention provides an operator optimization method, an operator optimization device, electronic equipment and a storage medium.
The invention provides an operator optimization method, which comprises the following steps:
based on the acquired segmentation strategy, generating a first scheduling tree corresponding to the segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree;
and optimizing each operator in the neural network model based on the first schedule tree.
Optionally, the optimizing each operator in the neural network model based on the first schedule tree includes:
performing deserialization processing on the first schedule tree to generate a target object corresponding to the first schedule tree;
and optimizing each operator based on the target object.
Optionally, the optimizing each operator based on the target object includes:
performing depth-first traversal operation on the target object to obtain a depth-first traversal result;
based on the depth-first traversal result, segmenting the data associated with each operator to obtain at least one data block associated with each operator;
and optimizing each operator based on each data block.
Optionally, the slicing policy is a plurality of for loop statements preconfigured in the yaml file.
Optionally, the method further comprises:
under the condition that updating of the segmentation strategy is monitored, updating the first schedule tree based on the updated segmentation strategy to generate a second schedule tree;
and optimizing each operator in the neural network model based on the second schedule tree.
Optionally, the method further comprises:
acquiring a third preconfigured schedule tree under the condition that the segmentation strategy is not acquired;
and optimizing each operator in the neural network model based on the third schedule tree.
The invention also provides an operator optimizing device, which comprises:
the first acquisition module is used for generating a first scheduling tree corresponding to the segmentation strategy based on the acquired segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree;
and the first optimization module is used for optimizing each operator in the neural network model based on the first schedule tree.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the operator optimisation method as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an operator optimisation method as described in any one of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements an operator optimisation method as described in any one of the above.
According to the operator optimization method, the operator optimization device, the electronic equipment and the storage medium, a first schedule tree corresponding to the segmentation strategy is generated based on the obtained segmentation strategy, wherein the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree. Because the first schedule tree is configurable, flexible and changeable segmentation strategies can be supported; operators in the neural network model are optimized using the first schedule tree, the segmentation strategies are prevented from being added to the code engineering in hard-coded form, and the tuning process of the operators can be accelerated.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is one of the flow diagrams of the operator optimization method provided by the present invention;
FIG. 2 is a schematic diagram of a first schedule tree provided by the present invention;
FIG. 3 is a schematic diagram of a second schedule tree provided by the present invention;
FIG. 4 is a second flow chart of the operator optimization method provided by the present invention;
FIG. 5 is a logical schematic of an operator optimization method provided by the present invention;
FIG. 6 is a schematic diagram of the structure of an operator optimizing apparatus provided by the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The operator optimizing method provided by the invention is specifically described below with reference to fig. 1 to 5. Fig. 1 is one of the flow diagrams of the operator optimizing method provided by the present invention, and referring to fig. 1, the method includes steps 101 to 102, where:
step 101, generating a first scheduling tree corresponding to the segmentation strategy based on the obtained segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop sentences in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree.
It should be noted that the execution body of the present invention may be any electronic device capable of implementing operator optimization, for example, any one of a smart phone, a smart watch, a desktop computer, a laptop computer, and the like.
The invention is applied to operator tuning scenarios in the artificial intelligence chip software stack. High-performance operators are the basis of efficient computation on the artificial intelligence chip. In the embodiment of the invention, the segmentation strategy is also called a tiling strategy, a technique that uses shared memory on a graphics processor (Graphics Processing Unit, GPU) to reduce access to global memory, thereby improving the execution efficiency of a kernel function. The data associated with each operator in the neural network model can be segmented using the tiling strategy, so that the amount of data input into each operator is reduced and the performance of the neural network model can be effectively improved.
Because operators are currently optimized following a fixed paradigm, whenever there is a new tiling strategy, a branch needs to be extended and added to the code engineering in hard-coded form for test verification. This fixed paradigm lacks flexibility and cannot support flexible and changeable tiling strategies.
Thus, in the embodiment of the present invention, the segmentation (tiling) strategy is converted into a tree structure, namely a scheduling tree (schedule tree). The schedule tree is a tree-like representation of the scheduling execution order and is composed of nodes and edges.
Flexible configuration of the tiling strategy can be realized based on the schedule tree: when a new tiling strategy exists, only the schedule tree needs to be updated, and the tiling strategy does not need to be hard-coded. This speeds up developers' performance-optimization iterations and lays a foundation for subsequent intelligent tiling.
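To make the idea of tiling concrete, the following is an illustrative sketch (not taken from the patent) of how a 2-D matrix can be split into fixed-size blocks, each small enough to fit in fast shared memory:

```python
# Illustrative sketch: split a rows x cols matrix into tile_r x tile_c
# blocks. Each yielded tuple gives the half-open row/column ranges of
# one tile; edge tiles are clipped to the matrix bounds.
def tile_2d(rows, cols, tile_r, tile_c):
    """Yield (row_start, row_end, col_start, col_end) for each tile."""
    for r in range(0, rows, tile_r):
        for c in range(0, cols, tile_c):
            yield (r, min(r + tile_r, rows), c, min(c + tile_c, cols))

# A 4x6 matrix with 2x3 tiles yields 2 * 2 = 4 tiles.
tiles = list(tile_2d(4, 6, 2, 3))
```

Each operator then processes one tile at a time instead of the whole matrix, which is what reduces the amount of data each operator must touch at once.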
Optionally, the slicing policy is a plurality of for loop sentences preconfigured in the yaml file.
Specifically, the slicing policy may be expressed as a set of nested for loop statements preconfigured in the yaml file.
In the embodiment of the invention, a user can freely configure a tiling strategy in a yaml file, and after the user-configured tiling strategy is acquired, it is abstracted into a first schedule tree.
For example, after the user-configured tiling policy is obtained from the yaml file, it is converted into a first schedule tree. FIG. 2 is a schematic diagram of a first schedule tree provided by the present invention.
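The conversion described above can be sketched as follows. This is a hypothetical illustration: the axis names ("m", "n"), the (axis, extent, step) triple format, and the class names are assumptions for exposition, not the patent's actual yaml schema.

```python
# Hypothetical sketch: build a schedule tree with one node per for loop,
# outermost loop first, mirroring the one-to-one correspondence between
# for loop statements and tree nodes described in the method.
class ScheduleNode:
    def __init__(self, axis, extent, step):
        self.axis, self.extent, self.step = axis, extent, step
        self.children = []

def build_schedule_tree(loops):
    """loops: list of (axis, extent, step), outermost first."""
    root = parent = None
    for axis, extent, step in loops:
        node = ScheduleNode(axis, extent, step)
        if parent is None:
            root = node
        else:
            parent.children.append(node)
        parent = node
    return root

# Stands in for e.g. "for m in range(0, 1024, 128):
#                        for n in range(0, 1024, 64): ..."
tree = build_schedule_tree([("m", 1024, 128), ("n", 1024, 64)])
```

A new tiling strategy then only changes the list passed to `build_schedule_tree`, not the surrounding code.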
Step 102, optimizing each operator in the neural network model based on the first schedule tree.
In the embodiment of the invention, after the segmentation strategy is converted into the first schedule tree, the first schedule tree represents the segmentation strategy for segmenting the data associated with each operator in the neural network model.
Based on the first schedule tree, the data associated with each operator is segmented, so that the data processing amount of each operator is reduced, and the optimization of each operator is realized.
According to the operator optimization method provided by the invention, the first schedule tree corresponding to the segmentation strategy is generated based on the obtained segmentation strategy, and the first schedule tree is configurable and can support flexible and changeable segmentation strategies, so that each operator in the neural network model is optimized by using the first schedule tree, the segmentation strategy is prevented from being added into code engineering in a hard coding mode, and the optimization process of each operator can be accelerated.
Optionally, the optimizing each operator in the neural network model based on the first schedule tree may be specifically implemented by the following steps 1) to 2):
step 1), performing deserialization processing on the first schedule tree to generate a target object corresponding to the first schedule tree;
step 2), optimizing each operator based on the target object.
In the embodiment of the invention, the user predefines, for example, a tiling policy in a yaml file. First, the tiling policy preconfigured by the user is obtained from the yaml file; then, after the tiling policy is converted into a first schedule tree, the first schedule tree is subjected to deserialization processing to generate a target object, corresponding to the first schedule tree, that can be executed by the computer.
Then, the data associated with each operator is segmented based on the target object, so that the optimization of each operator is realized.
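A minimal sketch of the deserialization step, assuming a JSON-like serialized form (the patent does not specify the wire format, and the field names here are illustrative):

```python
import json

# Hedged sketch: turn a serialized schedule tree into an in-memory
# object graph (the "target object") using json's object_hook, which
# is applied bottom-up to each parsed dict.
class TreeNode:
    def __init__(self, axis, step, children):
        self.axis, self.step, self.children = axis, step, children

def deserialize(text):
    def hook(d):
        if "axis" in d:
            return TreeNode(d["axis"], d["step"], d.get("children", []))
        return d
    return json.loads(text, object_hook=hook)

obj = deserialize(
    '{"axis": "m", "step": 128, "children": [{"axis": "n", "step": 64}]}'
)
```

The resulting object graph can then be traversed directly, which is what the next steps rely on.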
Optionally, the optimizing the operators based on the target object may be specifically implemented by the following steps 1) to 3):
step 1), performing depth-first traversal operation on the target object to obtain a depth-first traversal result;
step 2), based on the depth-first traversal result, segmenting the data associated with each operator to obtain at least one data block associated with each operator;
step 3), optimizing each operator based on each data block.
In the embodiment of the present invention, depth-first traversal starts from an initial access node, which may have a plurality of neighboring nodes. The strategy of depth-first traversal is to first visit the first neighboring node, then take that visited node as the new starting node and visit its first neighboring node in turn. In other words, after the current node is visited, its first neighboring node is always visited first.
After the depth-first traversal result is obtained, code for segmenting the data associated with each operator can be generated, and the data associated with each operator is then segmented to obtain at least one data block associated with each operator. Finally, each data block is input into the corresponding operator so that the operator processes each data block and generates the final assembly code.
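Steps 1) to 3) above can be sketched as follows, representing tree nodes as nested `(axis, step, children)` tuples; the names and the flat data range are illustrative assumptions:

```python
# Sketch: depth-first traversal of a schedule tree, collecting loop
# axes outermost-first, followed by slicing one operator's flat data
# range into blocks of at most `step` elements.
def dfs(node, visit):
    axis, step, children = node
    visit(axis, step)
    for child in children:        # first neighbour first, recursively
        dfs(child, visit)

order = []
dfs(("m", 128, [("n", 64, [])]), lambda a, s: order.append((a, s)))

def split(total, step):
    """Slice [0, total) into half-open blocks of at most `step` items."""
    return [(i, min(i + step, total)) for i in range(0, total, step)]

# e.g. 200 elements of data associated with one operator, innermost
# step 64 -> four blocks, the last one clipped to the data size.
blocks = split(200, order[-1][1])
```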
Optionally, in the case that the update of the segmentation strategy is monitored, the following steps need to be performed:
step 1), under the condition that updating of the segmentation strategy is monitored, updating the first schedule tree based on the updated segmentation strategy to generate a second schedule tree;
and 2) optimizing each operator in the neural network model based on the second schedule tree.
In the embodiment of the invention, since each node in the first schedule tree can be flexibly configured, when an update of the segmentation (tiling) strategy is monitored, the first schedule tree can be adaptively adjusted based on the updated tiling strategy to generate a second schedule tree. Flexible and changeable segmentation strategies can thus be supported, hard-coding of the segmentation strategy is avoided, and the tuning process of each operator is accelerated.
In practical application, taking fig. 2 as an example, fig. 2 shows the first schedule tree before the tiling policy is updated. When an update of the segmentation policy is detected, the first schedule tree is updated based on the updated tiling policy to generate a second schedule tree. FIG. 3 is a schematic diagram of a second schedule tree provided by the present invention.
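The update path can be sketched as follows; representing the tree as a flat list of `(axis, step)` pairs and comparing policies by value are simplifying assumptions made here for illustration:

```python
# Sketch of the update path: when the configured tiling policy changes,
# the schedule tree is simply regenerated from the new policy instead
# of a new hard-coded branch being added to the code engineering.
def policy_changed(old, new):
    return old != new

def rebuild(policy):
    # one (axis, step) entry per for loop, outermost first
    return [(axis, step) for axis, step in policy]

old_policy = [("m", 128), ("n", 64)]
first_tree = rebuild(old_policy)

updated_policy = [("m", 256), ("n", 64)]   # e.g. a coarser outer tile
second_tree = (rebuild(updated_policy)
               if policy_changed(old_policy, updated_policy)
               else first_tree)
```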
Optionally, in the case where the segmentation policy is not acquired, the following steps need to be performed:
step 1), under the condition that the segmentation strategy is not obtained, obtaining a preconfigured third schedule tree;
and 2) optimizing each operator in the neural network model based on the third schedule tree.
In the embodiment of the invention, if the segmentation strategy is not obtained, for example, the user does not predefine the segmentation strategy in the yaml file, a third preconfigured schedule tree is obtained, and each operator in the neural network model is optimized.
FIG. 4 is a second flow chart of the operator optimizing method provided by the present invention, and referring to FIG. 4, the method includes steps 401-410, wherein:
step 401, a tilling strategy is obtained, wherein the tilling strategy is a for loop sentence preconfigured in a yaml file.
It should be noted that, in the case where the tilling policy is not acquired, steps 409 to 410 are performed.
Step 402, generating a first schedule tree corresponding to a tilling strategy based on the acquired tilling strategy;
step 403, performing deserialization processing on the first schedule tree, and generating a target object corresponding to the first schedule tree.
And step 404, performing depth-first traversal operation on the target object to obtain a depth-first traversal result.
And 405, based on the depth-first traversal result, segmenting data associated with each operator in the neural network model to obtain at least one data block associated with each operator.
Step 406, optimizing each operator based on each data block.
Step 407, in the case that an update of the tiling policy is monitored, updating the first schedule tree based on the updated tiling policy to generate a second schedule tree.
Step 408, optimizing each operator in the neural network model based on the second schedule tree.
And 409, acquiring a third preconfigured schedule tree under the condition that the segmentation strategy is not acquired.
Step 410, optimizing each operator in the neural network model based on the third schedule tree.
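Steps 401 to 410 can be combined into the following end-to-end sketch, under simplifying assumptions labelled in the comments: a JSON string stands in for the yaml file, the default tree stands in for the third schedule tree, and "optimizing" an operator just records the data blocks it would receive.

```python
import json

# End-to-end sketch of steps 401-410 (illustrative, not the patent's
# actual implementation).
DEFAULT_TREE = [("m", 256)]          # stands in for the third schedule tree

def load_policy(text):
    """Step 401: None means the tiling policy was not acquired."""
    return json.loads(text) if text else None

def to_tree(policy):
    """Step 402: one (axis, step) node per configured for loop."""
    return [(p["axis"], p["step"]) for p in policy]

def optimize(tree, total=512):
    """Steps 403-406: traverse the (chain) tree and slice the data."""
    blocks = []
    for axis, step in tree:          # depth-first over a chain of nodes
        blocks = [(i, min(i + step, total)) for i in range(0, total, step)]
    return blocks

policy = load_policy('[{"axis": "m", "step": 128}]')
tree = to_tree(policy) if policy is not None else DEFAULT_TREE  # 409-410 fallback
blocks = optimize(tree)
```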
According to the operator optimization method provided by the invention, the first schedule tree corresponding to the segmentation strategy is generated based on the obtained segmentation strategy, and the first schedule tree is configurable and can support flexible and changeable segmentation strategies, so that each operator in the neural network model is optimized by using the first schedule tree, the segmentation strategy is prevented from being added into code engineering in a hard coding mode, and the optimization process of each operator can be accelerated.
FIG. 5 is a logical schematic of an operator optimization method provided by the present invention. Referring to fig. 5, (a) shows a logic diagram of optimizing an operator using a tiling strategy in the prior art.
Specifically, a tiling algorithm (Tiling Algorithm) is obtained from an open pluggable specification (Open Pluggable Specification, OPS) file; a tiling generation module is then used to generate the execution code of the tiling strategy, the operators stored in llapis are optimized using that execution code, and an Asm (assembly) file is finally generated.
In (a), the tiling strategy needs to be hard-coded, and the fixed paradigm lacks flexibility, so this is a very inefficient tuning means.
(b) shows a logic diagram of optimizing operators using a tiling strategy according to the embodiment of the invention. In (b), the Schedule Manager first determines whether a predefined schedule can be obtained from the yaml file.
If so, the corresponding schedule tree is selected; the schedule tree is automatically generated from the yaml file.
If not, a default schedule tree is obtained from the Tiling Algorithm module. Note that the Tiling Algorithm module is implemented in C++.
Then, depth-first traversal is performed on the schedule tree using the Generation System module and the Generators module, the generated execution code of the tiling strategy optimizes the operators stored in llapis, and an Asm (assembly) file is finally generated.
The operator optimizing device provided by the invention is described below, and the operator optimizing device described below and the operator optimizing method described above can be referred to correspondingly. FIG. 6 is a schematic structural diagram of an operator optimizing apparatus provided by the present invention, as shown in FIG. 6, the operator optimizing apparatus 600 includes: a first acquisition module 601, a first optimization module 602, wherein:
a first obtaining module 601, configured to generate a first schedule tree corresponding to an obtained segmentation policy based on the segmentation policy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree;
a first optimization module 602, configured to optimize each operator in the neural network model based on the first schedule tree.
According to the operator optimizing device provided by the invention, the first schedule tree corresponding to the segmentation strategy is generated based on the obtained segmentation strategy, and the first schedule tree is configurable and can support flexible and changeable segmentation strategies, so that each operator in the neural network model is optimized by using the first schedule tree, the segmentation strategy is prevented from being added into code engineering in a hard coding mode, and the optimization process of each operator can be accelerated.
Optionally, the optimizing module 602 is further configured to:
performing deserialization processing on the first schedule tree to generate a target object corresponding to the first schedule tree;
and optimizing each operator based on the target object.
Optionally, the optimizing module 602 is further configured to:
performing depth-first traversal operation on the target object to obtain a depth-first traversal result;
based on the depth-first traversal result, segmenting the data associated with each operator to obtain at least one data block associated with each operator;
and optimizing each operator based on each data block.
Optionally, the slicing policy is a plurality of for loop statements preconfigured in the yaml file.
Optionally, the apparatus further comprises:
the updating module is used for updating the first schedule tree based on the updated segmentation strategy under the condition that the segmentation strategy is updated, and generating a second schedule tree;
and the second optimization module is used for optimizing each operator in the neural network model based on the second schedule tree.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring a preconfigured third schedule tree under the condition that the segmentation strategy is not acquired;
and the third optimization module is used for optimizing each operator in the neural network model based on the third schedule tree.
Fig. 7 is a schematic structural diagram of an electronic device according to the present invention, and as shown in fig. 7, the electronic device may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720 and memory 730 communicate with each other via communication bus 740. Processor 710 may call logic instructions in memory 730 to perform an operator optimization method comprising: based on the acquired segmentation strategy, generating a first scheduling tree corresponding to the segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree; and optimizing each operator in the neural network model based on the first schedule tree.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing the operator optimization method provided by the methods described above, the method comprising: based on the acquired segmentation strategy, generating a first scheduling tree corresponding to the segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree; and optimizing each operator in the neural network model based on the first schedule tree.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the operator optimization method provided by the above methods, the method comprising: based on the acquired segmentation strategy, generating a first scheduling tree corresponding to the segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy are in one-to-one correspondence with the nodes in the first schedule tree; and optimizing each operator in the neural network model based on the first schedule tree.
The apparatus embodiments described above are merely illustrative. The components described as separate may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical scheme described in the foregoing embodiments may still be modified, or some of its technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An operator optimization method, comprising:
generating a first scheduling tree corresponding to the segmentation strategy based on the obtained segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy correspond one-to-one with the nodes in the first scheduling tree;
and optimizing each operator in the neural network model based on the first scheduling tree.
2. The operator optimization method according to claim 1, wherein optimizing each of the operators in the neural network model based on the first scheduling tree comprises:
performing deserialization processing on the first scheduling tree to generate a target object corresponding to the first scheduling tree;
and optimizing each operator based on the target object.
3. The operator optimization method according to claim 2, wherein optimizing the operators based on the target object includes:
performing depth-first traversal operation on the target object to obtain a depth-first traversal result;
based on the depth-first traversal result, segmenting the data associated with each operator to obtain at least one data block associated with each operator;
and optimizing each operator based on each data block.
4. The operator optimization method according to any one of claims 1 to 3, wherein the segmentation strategy is a plurality of for loop statements preconfigured in a YAML file.
5. The operator optimization method according to any one of claims 1 to 3, wherein the method further comprises:
under the condition that the segmentation strategy is updated, updating the first scheduling tree based on the updated segmentation strategy to generate a second scheduling tree;
and optimizing each operator in the neural network model based on the second scheduling tree.
6. The operator optimization method according to any one of claims 1 to 3, wherein the method further comprises:
acquiring a preconfigured third scheduling tree under the condition that the segmentation strategy is not acquired;
and optimizing each operator in the neural network model based on the third scheduling tree.
7. An operator optimizing apparatus, comprising:
the first acquisition module is used for generating a first scheduling tree (schedule tree) corresponding to the segmentation strategy based on the acquired segmentation strategy; the segmentation strategy is used for segmenting data associated with at least one operator in the neural network model; the for loop statements in the segmentation strategy correspond one-to-one with the nodes in the first scheduling tree;
and the first optimization module is used for optimizing each operator in the neural network model based on the first scheduling tree.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the operator optimization method according to any one of claims 1 to 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the operator optimization method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the operator optimization method according to any one of claims 1 to 6.
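The pipeline of claims 2 and 3 (deserialize the first scheduling tree into a target object, traverse it depth-first, then segment each operator's data) can be sketched as follows. This is a hedged illustration only: the serialization format (JSON), the function names, and the block-size rule are assumptions, not the patent's actual implementation.

```python
# Illustrative sketch of claims 2-3: deserialize a scheduling tree, walk it
# depth-first, and use the collected loop extents to segment operator data.

import json

def deserialize(payload: str) -> dict:
    """Claim 2: turn the serialized first scheduling tree into a target object."""
    return json.loads(payload)

def dfs(node: dict, out: list) -> list:
    """Claim 3: depth-first traversal collecting (loop_var, extent) in order."""
    out.append((node["loop_var"], node["extent"]))
    for child in node.get("children", []):
        dfs(child, out)
    return out

def segment_data(data: list, order: list) -> list:
    """Split flat operator data into blocks whose size is the product of the
    inner-loop extents (a deliberate simplification of the segmentation)."""
    inner = 1
    for _, extent in order[1:]:
        inner *= extent
    return [data[k:k + inner] for k in range(0, len(data), inner)]

# A two-level tree: one outer loop of 4 iterations over inner blocks of 8.
tree_json = ('{"loop_var": "i_outer", "extent": 4, "children": '
             '[{"loop_var": "i_inner", "extent": 8, "children": []}]}')
obj = deserialize(tree_json)
order = dfs(obj, [])
blocks = segment_data(list(range(32)), order)  # 4 blocks of 8 elements each
```

Each resulting block would then be the unit on which an individual operator is optimized, per the final step of claim 3.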
CN202311193617.5A 2023-09-14 2023-09-14 Operator optimization method and device, electronic equipment and storage medium Pending CN117291259A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311193617.5A CN117291259A (en) 2023-09-14 2023-09-14 Operator optimization method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311193617.5A CN117291259A (en) 2023-09-14 2023-09-14 Operator optimization method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117291259A true CN117291259A (en) 2023-12-26

Family

ID=89247302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311193617.5A Pending CN117291259A (en) 2023-09-14 2023-09-14 Operator optimization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117291259A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591776A (en) * 2024-01-18 2024-02-23 北京壁仞科技开发有限公司 Method, computing device, medium and program product for computing
CN117591776B (en) * 2024-01-18 2024-05-03 北京壁仞科技开发有限公司 Method, computing device, medium and program product for computing

Similar Documents

Publication Publication Date Title
CN111880807A (en) Deep learning compiling method, device, equipment and storage medium
CN113435509B (en) Small sample scene classification and identification method and system based on meta-learning
US20220164666A1 (en) Efficient mixed-precision search for quantizers in artificial neural networks
CN116644804B (en) Distributed training system, neural network model training method, device and medium
CN117291259A (en) Operator optimization method and device, electronic equipment and storage medium
CN109616093A (en) End-to-end phoneme synthesizing method, device, equipment and storage medium
CN111931492B (en) Data expansion mixing strategy generation method and device and computer equipment
US20230252294A1 (en) Data processing method, apparatus, and device, and computer-readable storage medium
CN111144571A (en) Deep learning reasoning operation method and middleware
CN117170681A (en) Nuclear function generation method and device, electronic equipment and storage medium
US11928182B1 (en) Artificial intelligence system supporting semi-supervised learning with iterative stacking
CN112132281A (en) Model training method, device, server and medium based on artificial intelligence
CN115795028A (en) Intelligent document generation method and system
CN116090425A (en) Text generation method, system and storage medium based on word replacement
CN116128044A (en) Model pruning method, image processing method and related devices
CN113554145B (en) Method, electronic device and computer program product for determining output of neural network
CN116090538A (en) Model weight acquisition method and related system
CN117196015A (en) Operator execution method, device, electronic equipment and storage medium
CN111062477A (en) Data processing method, device and storage medium
CN111126047A (en) Method and device for generating synonymous text
CN113011555B (en) Data processing method, device, equipment and storage medium
CN117494816B (en) Model reasoning method, device, equipment and medium based on computing unit deployment
CN113705244B (en) Method, device and storage medium for generating countermeasure text sample
KR102701708B1 (en) Method and system for local compression of artificial intelligence model
KR102530115B1 (en) A method of implementing a federated learning data sampling technology based on data distribution information performed in an edge server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination