CN114925830A - Operator compiling method and device and electronic equipment - Google Patents


Info

Publication number
CN114925830A
Authority
CN
China
Prior art keywords
tensor shape
configuration
shape
operator
jth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210612358.4A
Other languages
Chinese (zh)
Inventor
王亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Tianjin Co Ltd
Original Assignee
Spreadtrum Communications Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Tianjin Co Ltd filed Critical Spreadtrum Communications Tianjin Co Ltd
Priority to CN202210612358.4A priority Critical patent/CN114925830A/en
Publication of CN114925830A publication Critical patent/CN114925830A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9017 Indexing; Data structures therefor; Storage structures using directory or table look-up
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4441 Reducing the execution time required by the program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/447 Target code generation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an operator compiling method and device and electronic equipment. In the method, the electronic equipment acquires a lookup table corresponding to an operator according to the type of the operator, and queries the lookup table, according to the tensor shape of the operator, for the optimal segmentation configuration corresponding to that tensor shape, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape. The electronic equipment then compiles the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate a first data file. Because the optimal segmentation configuration is obtained from the lookup table corresponding to the operator and the operator is compiled with that configuration, operator compiling becomes more flexible, the utilization rate of the computing unit is improved, the bandwidth is reduced, and the operation speed is increased.

Description

Operator compiling method and device and electronic equipment
[ technical field ]
The embodiment of the invention relates to the technical field of Artificial Intelligence (AI), in particular to an operator compiling method, an operator compiling device and electronic equipment.
[ background of the invention ]
With the development of artificial intelligence, neural network models have become increasingly complex. Running a neural network involves compiling a large number of operators, and as the number of parameters of the neural network model grows, the operator compiling speed noticeably affects the running speed of the model, so improving operator compiling efficiency is necessary.
At present, operator compiling usually relies on a fixed algorithm and an energy consumption model of the hardware to obtain a theoretically better compiling process. Because the algorithm is bound to the hardware, the algorithm has to be iterated whenever the hardware changes, which results in poor flexibility. In addition, a sound energy consumption model is very difficult to design and matures only after multiple iterations; if the energy consumption model is poorly designed, operator compiling may suffer from low utilization of the computing unit, high bandwidth and low operation speed.
[ summary of the invention ]
In view of this, embodiments of the present invention provide an operator compiling method and apparatus, and an electronic device, so as to improve flexibility, improve a utilization rate of a computing unit, reduce a bandwidth, and accelerate an operation speed when implementing operator compiling.
In a first aspect, an embodiment of the present invention provides an operator compiling method, where the method includes:
acquiring a lookup table corresponding to an operator according to the type of the operator;
inquiring an optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape;
compiling the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate a first data file.
Optionally, the operator corresponds to a plurality of set tensor shapes; before the obtaining of the lookup table corresponding to the operator according to the type of the operator, the method further includes:
and updating the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape.
Optionally, the updating the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape includes:
generating an optimal segmentation configuration corresponding to the ith set tensor shape according to the ith set tensor shape;
updating a lookup table corresponding to the operator according to the ith set tensor shape and the optimal segmentation configuration corresponding to the ith set tensor shape;
judging whether the serial number of the ith set tensor shape is less than or equal to the total number of the set tensor shapes;
and if the serial number of the ith set tensor shape is judged to be smaller than the total number of the plurality of set tensor shapes, taking the (i+1)th set tensor shape as the ith set tensor shape, and executing the step of generating the optimal splitting configuration corresponding to the ith set tensor shape according to the ith set tensor shape, wherein i is a positive integer.
Optionally, the generating an optimal slicing configuration corresponding to the ith set tensor shape according to the ith set tensor shape includes:
generating a plurality of segmentation configurations corresponding to the ith set tensor shape according to the ith set tensor shape;
and determining the optimal splitting configuration corresponding to the ith set tensor shape according to the ith set tensor shape and the jth splitting configuration corresponding to the ith set tensor shape, wherein j is a positive integer.
Optionally, before updating the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape, the method further includes:
and generating a plurality of set tensor shapes according to the set upper limit of each dimension and the set dimension step length corresponding to each dimension in a plurality of dimensions of the tensor of the operator.
Optionally, the generating a plurality of slicing configurations corresponding to the ith set tensor shape according to the ith set tensor shape includes:
and segmenting the ith set tensor shape according to the set dimension segmentation threshold values corresponding to different dimensions on different dimensions of the ith set tensor shape to generate a plurality of segmentation configurations corresponding to the ith set tensor shape.
Optionally, the determining an optimal slicing configuration corresponding to the ith set tensor shape according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape includes:
determining an optimal segmentation configuration according to an ith set tensor shape and a jth segmentation configuration corresponding to the ith set tensor shape;
judging whether the sequence number of the jth splitting configuration is less than or equal to the total number of the plurality of splitting configurations corresponding to the ith set tensor shape;
and if the sequence number of the jth slicing configuration is judged to be smaller than the total number of the plurality of slicing configurations corresponding to the ith set tensor shape, taking the (j+1)th slicing configuration as the jth slicing configuration, and executing the step of determining the optimal slicing configuration according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape.
Optionally, the determining an optimal splitting configuration according to the ith set tensor shape and the jth splitting configuration corresponding to the ith set tensor shape includes:
acquiring a comprehensive performance value corresponding to the jth segmentation configuration according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape;
judging whether the comprehensive performance value corresponding to the jth segmentation configuration is superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration;
if the comprehensive performance value corresponding to the jth segmentation configuration is judged to be superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration, setting the jth segmentation configuration as the optimal segmentation configuration; and if the comprehensive performance value corresponding to the jth segmentation configuration is judged to be not superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration, taking the (j-1)th segmentation configuration as the optimal segmentation configuration.
Optionally, the obtaining, according to an ith set tensor shape and a jth slicing configuration corresponding to the ith set tensor shape, a comprehensive performance value corresponding to the jth slicing configuration includes:
compiling the operator according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape to generate a second data file;
sending the second data file to a mobile terminal;
receiving a plurality of performance values obtained and sent by the mobile terminal by running the second data file;
and performing weighted calculation on the performance values to generate a comprehensive performance value corresponding to the jth segmentation configuration.
Optionally, the obtaining, according to an ith set tensor shape and a jth slicing configuration corresponding to the ith set tensor shape, a comprehensive performance value corresponding to the jth slicing configuration includes:
compiling the operator according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape to generate a second data file;
running the second data file to obtain a plurality of performance values;
and performing weighted calculation on the performance values to generate a comprehensive performance value corresponding to the jth segmentation configuration.
Optionally, the method further comprises:
and if the sequence number of the jth splitting configuration is judged to be equal to the total number of the splitting configurations corresponding to the ith set tensor shape, executing the step of updating the lookup table corresponding to the operator according to the ith set tensor shape and the optimal splitting configuration corresponding to the ith set tensor shape.
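The selection over the jth configurations described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the `measure` callback, the weights, and the convention that a lower comprehensive performance value is better are all hypothetical and are not specified by the source. The sketch also keeps a running best rather than comparing only adjacent configurations, which is one possible reading of the jth versus (j-1)th comparison.

```python
def composite_performance(values, weights):
    """Weighted sum of several measured performance values."""
    return sum(w * v for w, v in zip(weights, values))


def find_best_config(shape, configs, measure, weights):
    """Walk the configurations j = 1..total and keep the best one seen so far.

    `measure(shape, config)` is a hypothetical callback standing in for
    "compile a second data file, run it, and collect performance values".
    Assumption: a lower composite value is better.
    """
    best_config, best_score = None, None
    for config in configs:
        score = composite_performance(measure(shape, config), weights)
        if best_score is None or score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def fake_measure(shape, config):
    # Made-up numbers for illustration only: [latency, bandwidth].
    return [8.0 / config["parts"], 1.0 * config["parts"]]


candidates = [{"dim": "H", "parts": 2}, {"dim": "H", "parts": 4}]
best, score = find_best_config([100, 100, 100, 100], candidates,
                               fake_measure, weights=[0.7, 0.3])
```

In a real flow, `fake_measure` would be replaced by an actual run of the compiled file on the target device or mobile terminal.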
In a second aspect, an embodiment of the present invention provides an operator compiling apparatus, where the apparatus includes:
the acquisition module is used for acquiring a lookup table corresponding to an operator according to the type of the operator;
the query module is used for querying an optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape;
and the generation module is used for compiling the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate a first data file.
Optionally, the apparatus further comprises an updating module;
and the updating module is used for updating the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, a device on which the computer-readable storage medium is located is controlled to execute the operator compiling method in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions that, when executed by the apparatus, cause the apparatus to perform the method of operator compilation in the first aspect or any possible implementation of the first aspect.
In the technical scheme provided by the embodiment of the invention, the electronic equipment acquires the lookup table corresponding to the operator according to the type of the operator, and inquires the optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape, the electronic equipment compiles the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate the first data file, and in the operator compiling process, the optimal segmentation configuration is obtained by acquiring the lookup table corresponding to the operator and the operator is compiled through the optimal segmentation configuration, so that when the operator compiling is realized, the flexibility is improved, the utilization rate of a computing unit is improved, the bandwidth is reduced, and the operation speed is accelerated.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of an operator compiling method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a lookup table updating method according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for determining an optimal segmentation configuration according to an embodiment of the present invention;
Fig. 4 is a flowchart of another method for determining an optimal slicing configuration according to an embodiment of the present invention;
Fig. 5 is a flowchart of another method for determining an optimal slicing configuration according to an embodiment of the present invention;
Fig. 6 is a flowchart of a method for obtaining a comprehensive performance value according to an embodiment of the present invention;
Fig. 7 is a flowchart of another method for obtaining a comprehensive performance value according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an operator compiling apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic view of an electronic device according to an embodiment of the present invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The word "if" as used herein may be interpreted as "when" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
Fig. 1 is a flowchart of an operator compiling method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 11, acquiring a lookup table corresponding to the operator according to the type of the operator.
Various steps of embodiments of the invention may be performed by an electronic device. Electronic devices include, but are not limited to, cell phones, tablet computers, pocket PCs, desktop computers, wearable devices, and the like.
The embodiment of the invention can be realized by various neural network back-end compilers, wherein the neural network back-end compilers are software installed in electronic equipment.
In the embodiment of the present invention, the operator is a basic unit of neural network computation; for example, the operator may include convolution, pooling, activation or normalization. Taking convolution as an example, operators divided by type may include two-dimensional convolution, three-dimensional convolution and the like. Different types of operators correspond to different lookup tables; for example, when the type of the operator is two-dimensional convolution, the electronic device obtains the lookup table corresponding to two-dimensional convolution.
Step 12, querying the optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape.
Each tensor shape and the optimal slicing configuration corresponding to each tensor shape may be stored in the lookup table in the form of key-value pairs. For example, when the tensor shape of the operator is [100, 100, 100, 100], the electronic device obtains the optimal segmentation configuration corresponding to the tensor shape from the lookup table, wherein the optimal segmentation configuration is "segmentation from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]".
Step 13, compiling the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate a first data file.
The electronic equipment compiles the parameters of the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate an operator parameter structure; the electronic equipment serializes the operator parameter structure to generate the first data file, wherein the first data file is a binary file; the first data file includes, but is not limited to, binary weight data.
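Steps 11 to 13 can be sketched as a pair of dictionary lookups followed by serialization. This is a minimal sketch, not the patented implementation: the table contents, the names `LOOKUP_TABLES` and `compile_operator`, the layout of the parameter structure, and the use of `pickle` as the binary format are all assumptions made for illustration.

```python
import pickle

# Hypothetical lookup tables: one table per operator type, each mapping a
# tensor shape (N, H, W, C) to its optimal segmentation configuration.
LOOKUP_TABLES = {
    "conv2d": {
        (100, 100, 100, 100): {"dim": "H", "parts": 4},
    },
}


def compile_operator(op_type, shape):
    table = LOOKUP_TABLES[op_type]   # step 11: pick the table by operator type
    config = table[tuple(shape)]     # step 12: query by tensor shape
    # Step 13: build an operator parameter structure and serialize it.
    # pickle is only a stand-in for whatever binary format a compiler uses.
    params = {"op_type": op_type, "shape": tuple(shape), "split": config}
    return pickle.dumps(params)


first_data_file = compile_operator("conv2d", [100, 100, 100, 100])
```

Keying the table by a tuple keeps the shape hashable, which matches the key-value storage described above.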
In the embodiment of the present invention, before step 11, the method further includes:
Step 10, updating the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape.
In the embodiment of the present invention, before step 10, the method further includes:
In multiple dimensions of the tensor of the operator, the electronic equipment generates multiple set tensor shapes according to the set upper limit of each dimension and the set dimension step length corresponding to each dimension. For example, the tensor of the operator is composed of four dimensions: batch (N), height (H), width (W) and channel (C). The electronic device sets the upper limit of each of N, H, W and C to 100 and sets the dimension step size corresponding to each dimension to 10; the electronic device may then generate a plurality of set tensor shapes according to the upper limit 100 and the dimension step size 10 of each dimension. For example, the plurality of set tensor shapes generated include [10, 10, 10, 10], [20, 10, 10, 10], [10, 10, 20, 10], … [100, 100, 100, 100].
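The enumeration of set tensor shapes can be sketched with a Cartesian product over the per-dimension ranges. The function name `generate_set_shapes` is hypothetical, and the sketch assumes each dimension starts at one step and runs up to its upper limit inclusive, which is consistent with the example shapes above but is not stated explicitly in the source.

```python
from itertools import product


def generate_set_shapes(upper_limits, steps):
    """Enumerate every shape whose dimensions are multiples of the step,
    from one step up to the upper limit (inclusive)."""
    axes = [range(step, limit + 1, step)
            for limit, step in zip(upper_limits, steps)]
    return [list(shape) for shape in product(*axes)]


# N, H, W, C each limited to 100 with a step of 10.
set_shapes = generate_set_shapes([100, 100, 100, 100], [10, 10, 10, 10])
```

With ten values per dimension and four dimensions, this enumeration produces 10,000 candidate shapes, which gives a sense of how the table size grows with finer steps.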
Each operator may correspond to a plurality of set tensor shapes, and the electronic device may generate, from each set tensor shape, a plurality of sliced configurations corresponding to the set tensor shape. The electronic equipment obtains the optimal segmentation configuration from the plurality of segmentation configurations corresponding to each set tensor shape, and stores each set tensor shape and the optimal segmentation configuration corresponding to each set tensor shape into the lookup table corresponding to the operator so as to update the lookup table.
For example, the plurality of set tensor shapes for the operator include [10, 10, 10, 10], [20, 10, 10, 10], [10, 20, 10, 10], [10, 10, 20, 10], … [100, 100, 100, 100]. For example, when the set tensor shape is [100, 100, 100, 100], the plurality of slicing configurations corresponding to [100, 100, 100, 100] include: "sliced from the H dimension into 2 tensor shapes, each tensor shape being [100, 50, 100, 100]", "sliced from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]", "sliced from the W dimension into 2 tensor shapes, each tensor shape being [100, 100, 50, 100]", "sliced from the W dimension into 4 tensor shapes, each tensor shape being [100, 100, 25, 100]", "sliced from the C dimension into 2 tensor shapes, each tensor shape being [100, 100, 100, 50]", and "sliced from the C dimension into 4 tensor shapes, each tensor shape being [100, 100, 100, 25]".
For example, on the H dimension of [100, 100, 100, 100], the H dimension is sliced according to the dimension slicing threshold corresponding to the H dimension, and a plurality of slicing configurations corresponding to [100, 100, 100, 100] are generated. For example, when the dimension slicing threshold is 2, the slicing configuration includes "slicing from the H dimension into 2 tensor shapes, each tensor shape being [100, 50, 100, 100]"; for another example, when the dimension slicing threshold is 4, the slicing configuration includes "slicing from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]".
For another example, on the W dimension of [100, 100, 100, 100], the W dimension is sliced according to the dimension slicing threshold corresponding to the W dimension, and a plurality of slicing configurations corresponding to [100, 100, 100, 100] are generated. For example, when the dimension slicing threshold is 2, the slicing configuration includes "slicing from the W dimension into 2 tensor shapes, each tensor shape being [100, 100, 50, 100]"; for another example, when the dimension slicing threshold is 4, the slicing configuration includes "slicing from the W dimension into 4 tensor shapes, each tensor shape being [100, 100, 25, 100]".
For another example, on the C dimension of [100, 100, 100, 100], the C dimension is sliced according to the dimension slicing threshold corresponding to the C dimension, and a plurality of slicing configurations corresponding to [100, 100, 100, 100] are generated. For example, when the dimension slicing threshold is 2, the slicing configuration includes "slicing from the C dimension into 2 tensor shapes, each tensor shape being [100, 100, 100, 50]"; for another example, when the dimension slicing threshold is 4, the slicing configuration includes "slicing from the C dimension into 4 tensor shapes, each tensor shape being [100, 100, 100, 25]".
In summary, the generated plurality of slicing configurations may include "slicing from the H dimension into 2 tensor shapes, each tensor shape being [100, 50, 100, 100]", "slicing from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]", "slicing from the W dimension into 2 tensor shapes, each tensor shape being [100, 100, 50, 100]", "slicing from the W dimension into 4 tensor shapes, each tensor shape being [100, 100, 25, 100]", "slicing from the C dimension into 2 tensor shapes, each tensor shape being [100, 100, 100, 50]", and "slicing from the C dimension into 4 tensor shapes, each tensor shape being [100, 100, 100, 25]".
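The six configurations listed above can be reproduced with a short sketch. The names `DIM_INDEX` and `generate_split_configs`, the dictionary representation of a configuration, and the rule of skipping splits that do not divide the dimension evenly are assumptions for illustration; the source only gives the thresholds 2 and 4 as examples.

```python
# Assumed position of each dimension in an [N, H, W, C] shape.
DIM_INDEX = {"N": 0, "H": 1, "W": 2, "C": 3}


def generate_split_configs(shape, thresholds):
    """Return one configuration per (dimension, split count) pair.

    `thresholds` maps a dimension name to the split counts to try.
    """
    configs = []
    for dim, parts_list in thresholds.items():
        axis = DIM_INDEX[dim]
        for parts in parts_list:
            if shape[axis] % parts != 0:
                continue  # assumption: keep only equal-sized blocks
            sub_shape = list(shape)
            sub_shape[axis] = shape[axis] // parts
            configs.append({"dim": dim, "parts": parts,
                            "sub_shape": sub_shape})
    return configs


configs = generate_split_configs(
    [100, 100, 100, 100],
    {"H": [2, 4], "W": [2, 4], "C": [2, 4]},
)
```

Each entry corresponds to one quoted configuration, for example splitting H into 4 parts yields sub-shapes of [100, 25, 100, 100].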
For example, if the optimal segmentation configuration corresponding to the tensor shape [100, 100, 100, 100] is "segmentation from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]", the electronic device stores [100, 100, 100, 100] and this corresponding configuration in the lookup table corresponding to the operator.
In the embodiment of the invention, the electronic equipment acquires the lookup table corresponding to the operator according to the type of the operator, queries the optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, and compiles the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate the first data file. Because the optimal segmentation configuration is obtained from the lookup table corresponding to the operator and the operator is compiled with that configuration, operator compiling becomes more flexible, the utilization rate of the computing unit is improved, the bandwidth is reduced and the operation speed is increased. Because the operator is compiled by way of the lookup table, changes in the back-end hardware are adapted to automatically, and the compiling process of the operator is not affected when the back-end hardware changes.
Fig. 2 is a flowchart of a lookup table updating method according to an embodiment of the present invention, and as shown in fig. 2, step 10 specifically includes:
Step 101, generating an optimal slicing configuration corresponding to the ith set tensor shape according to the ith set tensor shape.
For example, when i is equal to 1 and the 1st set tensor shape is [10, 10, 10, 10], the electronic device generates the optimal slicing configuration corresponding to [10, 10, 10, 10] according to [10, 10, 10, 10], where the optimal slicing configuration is "slicing from the C dimension into 2 tensor shapes, each tensor shape being [10, 10, 10, 5]". For another example, when i is equal to 100 and the 100th set tensor shape is [100, 100, 100, 100], the electronic device generates the optimal slicing configuration corresponding to [100, 100, 100, 100] according to [100, 100, 100, 100], where the optimal slicing configuration is "slicing from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]". Here i is a positive integer, for example 1, 2, …, N.
Fig. 3 is a flowchart of a method for determining an optimal slicing configuration according to an embodiment of the present invention, and as shown in fig. 3, step 101 specifically includes:
Step 1011, generating a plurality of slicing configurations corresponding to the ith set tensor shape according to the ith set tensor shape.
In each dimension of the ith set tensor shape, the electronic device slices the ith set tensor shape according to the set dimension slicing thresholds corresponding to that dimension, thereby generating a plurality of slicing configurations corresponding to the ith set tensor shape. During operator compilation, in order to achieve efficient inference on a multiprocessor architecture, the tensor shape of the operator is sliced in a data-independent manner, and each sliced tensor shape can take part in the computation independently, so that multiple processors can process the operator concurrently.
During operator compilation, the set tensor shape is typically sliced along three dimensions, H, W and C, into equal-sized blocks, where each such division into equal-sized blocks may serve as a slicing configuration. Because the electronic device slices the set tensor shape according to the set dimension slicing thresholds corresponding to the different dimensions, the same set tensor shape can yield dozens or even hundreds of slicing configurations, and different slicing configurations affect the compiling efficiency of the operator.
For example, when i is equal to 100, the 100th set tensor shape is [100, 100, 100, 100], the upper limits of the dimension slicing thresholds set for H, W and C of [100, 100, 100, 100] are all 4, and the lower limits are all 1, the electronic device slices [100, 100, 100, 100] according to the dimension slicing thresholds of H, W or C and generates a plurality of slicing configurations corresponding to [100, 100, 100, 100]. The plurality of slicing configurations comprise: "slicing from the H dimension into 2 tensor shapes, each tensor shape being [100, 50, 100, 100]", "slicing from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]", "slicing from the W dimension into 2 tensor shapes, each tensor shape being [100, 100, 50, 100]", "slicing from the W dimension into 4 tensor shapes, each tensor shape being [100, 100, 25, 100]", "slicing from the C dimension into 2 tensor shapes, each tensor shape being [100, 100, 100, 50]", and "slicing from the C dimension into 4 tensor shapes, each tensor shape being [100, 100, 100, 25]".
For example, for the H dimension of [100, 100, 100, 100], the electronic device sets the upper limit of the dimension slicing threshold of the H dimension to 4 and the lower limit to 1, then slices the H dimension of [100, 100, 100, 100] according to the upper limit 4 and the lower limit 1, and generates a plurality of slicing configurations corresponding to [100, 100, 100, 100], which include "slicing from the H dimension into 2 tensor shapes, each tensor shape being [100, 50, 100, 100]" and "slicing from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]". For another example, for the C dimension of [100, 100, 100, 100], the electronic device sets the upper limit of the dimension slicing threshold of the C dimension to 4 and the lower limit to 1, then slices the C dimension of [100, 100, 100, 100] according to the upper limit 4 and the lower limit 1, and generates a plurality of slicing configurations corresponding to [100, 100, 100, 100], which include "slicing from the C dimension into 2 tensor shapes, each tensor shape being [100, 100, 100, 50]" and "slicing from the C dimension into 4 tensor shapes, each tensor shape being [100, 100, 100, 25]".
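The enumeration in step 1011 can be sketched as follows. This is only an illustration under stated assumptions: the axis positions of H, W and C are inferred from the examples, a slice count of 1 is treated as "no slicing" and skipped, and a slice count is kept only when it divides the dimension evenly (which is one way to obtain the equal-sized blocks, and explains why 3 does not appear for a dimension of 100). The function name `generate_slicings` is hypothetical.

```python
# Assumed axis positions, inferred from the examples: [?, H, W, C].
AXIS = {"H": 1, "W": 2, "C": 3}


def generate_slicings(shape, lower=1, upper=4):
    """Return (dimension, slice count, sliced shape) for every equal split."""
    configs = []
    for dim, axis in AXIS.items():
        # A count of 1 leaves the shape unchanged, so start from 2.
        for count in range(max(lower, 2), upper + 1):
            if shape[axis] % count != 0:
                continue  # assumption: only equal-sized blocks are kept
            sliced = list(shape)
            sliced[axis] = shape[axis] // count
            configs.append((dim, count, sliced))
    return configs


configs = generate_slicings([100, 100, 100, 100])
for dim, count, sliced in configs:
    print(f"slice {dim} into {count}: each block is {sliced}")
```

With the thresholds from the example (lower limit 1, upper limit 4), this yields the same six configurations listed in the text, in the same order.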
Step 1012 determines an optimal slicing configuration corresponding to the ith set tensor shape, j being a positive integer, according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape.
The electronic device obtains a comprehensive performance value corresponding to the jth slicing configuration according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape, compares the comprehensive performance value corresponding to the jth slicing configuration with the comprehensive performance value corresponding to the (j-1)th slicing configuration, and determines the optimal slicing configuration corresponding to the ith set tensor shape according to the comparison result. For example, when i is equal to 100 and j is equal to 6, the electronic device obtains the comprehensive performance value corresponding to the 6th slicing configuration according to the 100th set tensor shape and the 6th slicing configuration corresponding to the 100th set tensor shape, compares it with the comprehensive performance value corresponding to the 5th slicing configuration, and determines the optimal slicing configuration corresponding to the 100th set tensor shape according to the comparison result.
Step 102, updating the lookup table corresponding to the operator according to the ith set tensor shape and the optimal slicing configuration corresponding to the ith set tensor shape.
The electronic device stores the ith set tensor shape and the optimal slicing configuration corresponding to the ith set tensor shape in the lookup table corresponding to the operator, thereby updating the lookup table. For example, when i is equal to 1, the 1st set tensor shape is [10, 10, 10, 10], and the optimal slicing configuration corresponding to [10, 10, 10, 10] is "slicing from the C dimension into 2 tensor shapes, each tensor shape being [10, 10, 10, 5]", then [10, 10, 10, 10] and this corresponding slicing configuration are stored in the lookup table corresponding to the operator. For another example, when i is equal to 100, the 100th set tensor shape is [100, 100, 100, 100], and the optimal slicing configuration corresponding to [100, 100, 100, 100] is "slicing from the H dimension into 4 tensor shapes, each tensor shape being [100, 25, 100, 100]", then [100, 100, 100, 100] and this corresponding slicing configuration are stored in the lookup table corresponding to the operator.
Step 103, judging whether the serial number of the ith set tensor shape is smaller than the total number of the plurality of set tensor shapes; if the serial number of the ith set tensor shape is smaller than the total number of the plurality of set tensor shapes, executing step 104; if the serial number is equal to the total number, the process ends.
If the electronic device judges that the serial number of the ith set tensor shape is smaller than the total number of the plurality of set tensor shapes, which indicates that the optimal slicing configurations corresponding to the remaining set tensor shapes still need to be determined, step 104 is executed. If the electronic device judges that the serial number of the ith set tensor shape is equal to the total number of the plurality of set tensor shapes, which indicates that all set tensor shapes have been traversed and the optimal slicing configuration corresponding to each set tensor shape has been determined, the updating of the lookup table is complete and the process ends. For example, suppose the total number of set tensor shapes is 100. When i is equal to 66, the serial number of the 66th set tensor shape is 66, which is smaller than the total number of set tensor shapes; this indicates that the optimal slicing configurations corresponding to the remaining set tensor shapes still need to be determined, so step 104 is executed. When i is equal to 100, the serial number of the 100th set tensor shape is 100, which is equal to the total number of set tensor shapes; this indicates that all set tensor shapes have been traversed and the optimal slicing configuration corresponding to each has been determined, so the updating of the lookup table is complete and the process ends.
Step 104, taking the (i+1)th set tensor shape as the ith set tensor shape, and executing step 101.
When the electronic device judges that the serial number of the ith set tensor shape is smaller than the total number of the plurality of set tensor shapes, so that the optimal slicing configurations corresponding to the remaining set tensor shapes still need to be determined, the (i+1)th set tensor shape is taken as the ith set tensor shape, and the optimal slicing configuration corresponding to the ith set tensor shape is generated according to the ith set tensor shape. For example, when i is equal to 2, the serial number of the 2nd set tensor shape is 2, which is smaller than the total number of the plurality of set tensor shapes; the 3rd set tensor shape is then taken as the ith set tensor shape, that is, i becomes 3, and step 101 generates the optimal slicing configuration corresponding to the 3rd set tensor shape according to the 3rd set tensor shape.
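Steps 101 to 104 amount to a loop over the set tensor shapes. The sketch below is one possible reading, assuming the lookup table is a dictionary keyed by shape; `update_lookup_table` and `stand_in_optimal` are hypothetical names, and `stand_in_optimal` is only a placeholder for step 101 that returns the two results quoted in the examples rather than measuring anything.

```python
def update_lookup_table(set_shapes, find_optimal, table):
    """Steps 101 to 104: fill `table` with one entry per set tensor shape."""
    if not set_shapes:
        return table
    i = 1  # serial numbers in the text start at 1
    while True:
        shape = tuple(set_shapes[i - 1])
        table[shape] = find_optimal(shape)   # steps 101 and 102
        if i < len(set_shapes):              # step 103: shapes remain
            i += 1                           # step 104: move to shape i+1
        else:
            return table                     # serial number equals the total


def stand_in_optimal(shape):
    # Placeholder for step 101; it only mirrors the two examples in the text.
    return ("C", 2) if shape == (10, 10, 10, 10) else ("H", 4)


table = update_lookup_table(
    [[10, 10, 10, 10], [100, 100, 100, 100]], stand_in_optimal, {}
)
```

The explicit counter is kept to match the serial-number comparison in step 103; an ordinary `for` loop over the shapes would behave the same way.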
In the embodiment of the invention, the electronic device generates the optimal slicing configuration corresponding to the ith set tensor shape according to the ith set tensor shape, and updates the lookup table corresponding to the operator according to the ith set tensor shape and the corresponding optimal slicing configuration. While updating the lookup table, the electronic device slices the tensor shape of the operator in a data-independent manner, so that each sliced tensor shape can take part in the computation independently, and determines the optimal slicing configuration, thereby enabling efficient concurrent computation.
Fig. 4 is a flowchart of another method for determining an optimal slicing configuration according to an embodiment of the present invention, and as shown in fig. 4, step 1012 specifically includes:
Step S1, determining the optimal slicing configuration according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape.
The electronic equipment obtains a comprehensive performance value corresponding to the jth segmentation configuration according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape, compares the comprehensive performance value corresponding to the jth segmentation configuration with the comprehensive performance value corresponding to the (j-1) th segmentation configuration, and determines the optimal segmentation configuration according to a comparison result. For example, when i is equal to 8 and j is equal to 6, the electronic device obtains a comprehensive performance value corresponding to a 6 th slicing configuration according to an 8 th set tensor shape and a 6 th slicing configuration corresponding to the 8 th set tensor shape, compares the comprehensive performance value corresponding to the 6 th slicing configuration with the comprehensive performance value corresponding to a 5 th slicing configuration, and determines an optimal slicing configuration corresponding to the 8 th set tensor shape according to a comparison result.
Step S2, judging whether the serial number of the jth slicing configuration is smaller than the total number of the plurality of slicing configurations corresponding to the ith set tensor shape; if it is smaller, executing step S3; if it is equal to the total number, executing step 102.
If the electronic device judges that the serial number of the jth slicing configuration is smaller than the total number of the plurality of slicing configurations corresponding to the ith set tensor shape, which indicates that it still needs to be determined whether any of the remaining slicing configurations corresponding to the ith set tensor shape is the optimal slicing configuration of the tensor shape, step S3 is executed. If the electronic device judges that the serial number of the jth slicing configuration is equal to the total number of the plurality of slicing configurations corresponding to the ith set tensor shape, which indicates that all slicing configurations corresponding to the ith set tensor shape have been traversed and the optimal slicing configuration corresponding to the set tensor shape has been determined, step 102 is executed. For example, suppose i is equal to 3 and the total number of slicing configurations corresponding to the 3rd set tensor shape is 9. When j is equal to 6, the serial number of the 6th slicing configuration is 6, which is smaller than the total number of slicing configurations; this indicates that it still needs to be determined whether any of the remaining slicing configurations corresponding to the 3rd set tensor shape is the optimal slicing configuration, so step S3 is executed. When j is equal to 9, the serial number of the 9th slicing configuration is 9, which is equal to the total number of slicing configurations; this indicates that all slicing configurations corresponding to the 3rd set tensor shape have been traversed and the optimal slicing configuration corresponding to the tensor shape has been determined, so step 102 is executed.
Step S3, taking the (j+1)th slicing configuration as the jth slicing configuration, and executing step S1.
When the electronic device judges that the serial number of the jth slicing configuration is smaller than the total number of the plurality of slicing configurations corresponding to the ith set tensor shape, the (j+1)th slicing configuration is taken as the jth slicing configuration, and the optimal slicing configuration is determined according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape. For example, when i is equal to 3 and j is equal to 2, the serial number of the 2nd slicing configuration is 2 and the total number of slicing configurations corresponding to the 3rd set tensor shape is 9; the serial number of the 2nd slicing configuration is smaller than the total number of slicing configurations, so the 3rd slicing configuration is taken as the jth slicing configuration, that is, j becomes 3, and step S1 determines the optimal slicing configuration according to the 3rd set tensor shape and the 3rd slicing configuration corresponding to the 3rd set tensor shape.
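The traversal in steps S1 to S3 can be sketched as a loop over the slicing configurations of one set tensor shape. Two points here are assumptions rather than statements of the embodiment: a lower comprehensive performance value is treated as better (consistent with the example in which a value of 3 is superior to a value of 7), and the sketch carries the best configuration seen so far from one iteration to the next, whereas the text describes the comparison as being between the jth and (j-1)th configurations. The names `traverse_slicings` and `score`, and the sample values, are hypothetical.

```python
def traverse_slicings(configs, score):
    """Steps S1 to S3: visit configurations j = 1..M and return the best one."""
    if not configs:
        return None
    j = 1  # serial numbers in the text start at 1
    best, best_value = None, None
    while True:
        value = score(configs[j - 1])                    # step S1
        if best_value is None or value < best_value:     # lower is better
            best, best_value = configs[j - 1], value
        if j < len(configs):                             # step S2
            j += 1                                       # step S3
        else:
            return best                                  # go on to step 102


# Illustrative values only; they are not measurements.
sample_values = {"config_a": 7, "config_b": 3, "config_c": 9}
best = traverse_slicings(list(sample_values), sample_values.get)
```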
Fig. 5 is a flowchart of another method for determining an optimal slicing configuration according to an embodiment of the present invention, and as shown in fig. 5, step S1 specifically includes:
Step S11, obtaining a comprehensive performance value corresponding to the jth slicing configuration according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape.
For example, when i is equal to 100 and j is equal to 6, the electronic device obtains the comprehensive performance value corresponding to the 6th slicing configuration according to the 100th set tensor shape and the 6th slicing configuration corresponding to the 100th set tensor shape. When the 100th set tensor shape is [100, 100, 100, 100] and the 6th slicing configuration is "slicing from the C dimension into 4 tensor shapes, each tensor shape being [100, 100, 100, 25]", the electronic device obtains the comprehensive performance value corresponding to the 6th slicing configuration according to [100, 100, 100, 100] and "slicing from the C dimension into 4 tensor shapes, each tensor shape being [100, 100, 100, 25]". Here i is a positive integer, for example 1, 2, …, N, and j is a positive integer, for example 1, 2, …, M.
Step S12, judging whether the comprehensive performance value corresponding to the jth slicing configuration is superior to the comprehensive performance value corresponding to the (j-1)th slicing configuration; if so, executing step S13; if not, executing step S14.
If the electronic device judges that the comprehensive performance value corresponding to the jth slicing configuration is superior to that corresponding to the (j-1)th slicing configuration, which indicates that the jth slicing configuration is superior to the (j-1)th slicing configuration, step S13 is executed. If the electronic device judges that the comprehensive performance value corresponding to the jth slicing configuration is not superior to that corresponding to the (j-1)th slicing configuration, which indicates that the (j-1)th slicing configuration is superior to the jth slicing configuration, step S14 is executed. For example, when j is equal to 6, the comprehensive performance value corresponding to the 6th slicing configuration is 3, and the comprehensive performance value corresponding to the 5th slicing configuration is 7, the value corresponding to the 6th slicing configuration is superior to the value corresponding to the 5th slicing configuration, which indicates that the 6th slicing configuration is superior to the 5th slicing configuration, so step S13 is executed. When the comprehensive performance value corresponding to the 6th slicing configuration is 9 and the comprehensive performance value corresponding to the 5th slicing configuration is 7, the value corresponding to the 6th slicing configuration is not superior to the value corresponding to the 5th slicing configuration, which indicates that the 5th slicing configuration is superior to the 6th slicing configuration, so step S14 is executed.
Step S13, taking the jth slicing configuration as the optimal slicing configuration.
When the electronic device judges that the comprehensive performance value corresponding to the jth slicing configuration is superior to the comprehensive performance value corresponding to the (j-1)th slicing configuration, the jth slicing configuration is taken as the optimal slicing configuration. For example, when j is equal to 6, the comprehensive performance value corresponding to the 6th slicing configuration is 3, and the comprehensive performance value corresponding to the 5th slicing configuration is 7, the electronic device judges that the comprehensive performance value corresponding to the 6th slicing configuration is superior to that corresponding to the 5th slicing configuration, and takes the 6th slicing configuration as the optimal slicing configuration.
Step S14, taking the (j-1)th slicing configuration as the optimal slicing configuration.
When the electronic device judges that the comprehensive performance value corresponding to the jth slicing configuration is not superior to the comprehensive performance value corresponding to the (j-1)th slicing configuration, the (j-1)th slicing configuration is taken as the optimal slicing configuration. For example, when j is equal to 6, the comprehensive performance value corresponding to the 6th slicing configuration is 9, and the comprehensive performance value corresponding to the 5th slicing configuration is 7, the electronic device judges that the comprehensive performance value corresponding to the 6th slicing configuration is not superior to that corresponding to the 5th slicing configuration, and takes the 5th slicing configuration as the optimal slicing configuration.
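The pairwise decision in steps S12 to S14 can be written as a small function. The sketch assumes that "superior" means a strictly lower comprehensive performance value, which matches the examples (3 is superior to 7, 9 is not superior to 7); the function name `pick_better` is hypothetical, and configurations are represented here simply by their serial numbers.

```python
def pick_better(prev_config, prev_value, cur_config, cur_value):
    """Steps S12 to S14 for one pair; a lower comprehensive value is better."""
    if cur_value < prev_value:    # step S12: is the jth value superior?
        return cur_config         # step S13: keep the jth configuration
    return prev_config            # step S14: keep the (j-1)th configuration


# The two cases from the text, with j = 6.
case_one = pick_better(prev_config=5, prev_value=7, cur_config=6, cur_value=3)
case_two = pick_better(prev_config=5, prev_value=7, cur_config=6, cur_value=9)
```

When the two values are equal, this sketch keeps the (j-1)th configuration, since the jth value is then "not superior"; the text does not discuss ties explicitly.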
Fig. 6 is a flowchart of a method for obtaining a comprehensive performance value according to an embodiment of the present invention, as shown in fig. 6, as an alternative, step S11 may specifically include:
Step S111, compiling the operator according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape to generate a second data file.
The electronic device configures the parameters of the operator to be compiled according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape, generating an operator parameter structure; the electronic device then serializes the operator parameter structure to generate the second data file, where the second data file is a binary file; the second data file includes, but is not limited to, binary weight data.
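Serializing an operator parameter structure into binary data can be illustrated with Python's standard `struct` module. The field layout below (four shape dimensions, the index of the sliced axis, and the slice count, each as a little-endian unsigned 32-bit integer) is purely an assumption for illustration; the text does not specify the structure, and the sketch omits the binary weight data that the second data file may also contain.

```python
import struct

# Hypothetical layout: 4 shape dimensions, then axis index and slice count,
# all little-endian unsigned 32-bit integers.
PARAM_FORMAT = "<4I2I"


def serialize_params(shape, axis, count):
    """Pack the operator parameters into a bytes object."""
    return struct.pack(PARAM_FORMAT, *shape, axis, count)


def deserialize_params(blob):
    """Inverse of serialize_params, used here to check the round trip."""
    *shape, axis, count = struct.unpack(PARAM_FORMAT, blob)
    return list(shape), axis, count


# "Slicing from the H dimension into 4" with H assumed to be axis 1.
blob = serialize_params([100, 100, 100, 100], 1, 4)
restored = deserialize_params(blob)
```

Writing `blob` to disk with `open(path, "wb")` would produce a binary file in the sense used above; a fixed byte order is chosen so that the file reads the same on the compiling device and on the device that runs it.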
Step S112, sending the second data file to the mobile terminal.
In the embodiment of the present invention, the mobile terminal includes, but is not limited to, a mobile phone, a wearable device, and the like.
When the electronic device is a tablet computer, a portable PC, or a desktop computer, the electronic device sends the second data file to the mobile terminal through a Universal Serial Bus (USB) interface.
Step S113, receiving a plurality of performance values that the mobile terminal obtains by running the second data file and sends back.
The plurality of performance values includes, but is not limited to, computation time and bandwidth data. The electronic equipment receives the calculation time and the bandwidth data which are acquired and sent by the mobile terminal through the USB interface.
In the embodiment of the invention, after the mobile terminal receives the second data file sent by the electronic device through the USB interface, the second data file is stored locally, the processor of the mobile terminal runs the second data file, and the mobile terminal obtains the plurality of performance values from the running log.
Step S114, performing a weighted calculation on the plurality of performance values to generate the comprehensive performance value corresponding to the jth slicing configuration.
In the embodiment of the invention, the weight value corresponding to each performance value can be set according to the requirements of the user. The electronic device performs a weighted calculation on each performance value according to the set weight value corresponding to that performance value to generate the comprehensive performance value corresponding to the jth slicing configuration.
The electronic device performs the weighted calculation according to the calculation time and its weight value together with the bandwidth data and its weight value to generate the comprehensive performance value corresponding to the jth slicing configuration. For example, when the user requires a slicing configuration with a long calculation time and a low bandwidth, the weight value of the calculation time is set to 0.5 and the weight value of the bandwidth data is set to 0.1; when j is equal to 2, the calculation time of the 2nd slicing configuration is 10 seconds and the bandwidth data is 2 mega/second, so the comprehensive performance value corresponding to the 2nd slicing configuration is 0.5 × 10 + 0.1 × 2 = 5.2. For another example, when the user requires a slicing configuration with a short calculation time and a high bandwidth, the weight value of the calculation time is set to 0.3 and the weight value of the bandwidth data is set to 0.6; when j is equal to 2, the calculation time of the 2nd slicing configuration is 10 seconds and the bandwidth data is 2 mega/second, so the comprehensive performance value corresponding to the 2nd slicing configuration is 0.3 × 10 + 0.6 × 2 = 4.2.
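The weighted calculation is a weighted sum of the performance values. The short sketch below reproduces the two worked examples; the function name `comprehensive_value` and the dictionary keys `"time"` and `"bandwidth"` are hypothetical labels chosen for this illustration.

```python
def comprehensive_value(values, weights):
    """Weighted sum of the performance values, one weight per value."""
    return sum(weights[name] * values[name] for name in values)


# Measurements from the examples: 10 seconds and 2 mega/second.
measured = {"time": 10.0, "bandwidth": 2.0}

first = comprehensive_value(measured, {"time": 0.5, "bandwidth": 0.1})
second = comprehensive_value(measured, {"time": 0.3, "bandwidth": 0.6})
print(round(first, 6), round(second, 6))
```

The results are rounded for display only, because sums of binary floating-point numbers may differ from the decimal values 5.2 and 4.2 in the last digits.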
Fig. 7 is a flowchart of another method for obtaining a comprehensive performance value according to an embodiment of the present invention, as shown in fig. 7, as another alternative, step S11 may specifically include:
Step S115, compiling the operator according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape to generate a second data file.
The electronic device configures the parameters of the operator to be compiled according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape, generating an operator parameter structure; the electronic device then serializes the operator parameter structure to generate the second data file, where the second data file is a binary file; the second data file includes, but is not limited to, binary weight data.
Step S116, running the second data file to obtain a plurality of performance values.
The plurality of performance values includes, but is not limited to, computation time and bandwidth data. When the electronic device is a mobile phone or a wearable device, the electronic device stores the second data file locally, a processor of the electronic device runs the second data file, and the electronic device obtains the plurality of performance values from the running log.
Step S117, performing a weighted calculation on the plurality of performance values to generate the comprehensive performance value corresponding to the jth slicing configuration.
The electronic device performs a weighted calculation on each performance value according to the set weight value corresponding to that performance value to generate the comprehensive performance value corresponding to the jth slicing configuration. Specifically, the electronic device performs the weighted calculation according to the calculation time and its weight value together with the bandwidth data and its weight value. For example, when the user requires a slicing configuration with a long calculation time and a low bandwidth, the weight value of the calculation time is set to 0.5 and the weight value of the bandwidth data is set to 0.1; when j is equal to 2, the calculation time of the 2nd slicing configuration is 10 seconds and the bandwidth data is 2 mega/second, so the comprehensive performance value corresponding to the 2nd slicing configuration is 0.5 × 10 + 0.1 × 2 = 5.2. For another example, when the user requires a slicing configuration with a short calculation time and a high bandwidth, the weight value of the calculation time is set to 0.3 and the weight value of the bandwidth data is set to 0.6; when j is equal to 2, the calculation time of the 2nd slicing configuration is 10 seconds and the bandwidth data is 2 mega/second, so the comprehensive performance value corresponding to the 2nd slicing configuration is 0.3 × 10 + 0.6 × 2 = 4.2.
It should be noted that, for the same operator, the weight value corresponding to each performance value is fixed.
Optionally, when i is equal to 1 and j is equal to 1, the serial number of the 1st set tensor shape is 1 and the serial number of the 1st slicing configuration corresponding to the 1st set tensor shape is 1; in this case, a lookup table is generated from the 1st set tensor shape and the 1st slicing configuration corresponding to the 1st set tensor shape, and both are stored in the lookup table.
In the embodiment of the invention, when the electronic device compiles operators and operators of multiple types exist, the electronic device can generate a lookup table corresponding to each type of operator. The electronic device may update the lookup tables corresponding to the different types of operators according to step 10.
In the technical solution of the operator compiling method provided by the embodiment of the present invention, the electronic device updates the lookup table corresponding to the operator according to each set tensor shape and the generated optimal slicing configuration corresponding to each set tensor shape; acquires the lookup table corresponding to the operator according to the type of the operator; queries the optimal slicing configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, where the lookup table includes the tensor shape and the optimal slicing configuration corresponding to the tensor shape; and compiles the operator according to the tensor shape and the corresponding optimal slicing configuration to generate a first data file. Before the operator is compiled, when the lookup table is updated, the electronic device slices the tensor shape of the operator in a data-independent manner, so that each sliced tensor shape can take part in the computation independently, and determines the optimal slicing configuration. During operator compilation, the optimal slicing configuration is obtained from the lookup table corresponding to the operator and the operator is compiled with it, so that efficient concurrent computation improves flexibility, improves the utilization rate of the computing units, reduces bandwidth, and increases the operation speed. Because the operator is compiled by way of the lookup table, changes in the back-end hardware are adapted to automatically, and the operator compiling process is not affected when the back-end hardware changes.
Fig. 8 is a schematic structural diagram of an operator compiling apparatus according to an embodiment of the present invention, and as shown in fig. 8, the apparatus includes: an obtaining module 11, a query module 12 and a generating module 13.
The obtaining module 11 is connected with the query module 12, and the query module 12 is connected with the generating module 13.
The obtaining module 11 is configured to obtain a lookup table corresponding to an operator according to the type of the operator; the query module 12 is configured to query an optimal slicing configuration corresponding to a tensor shape from the lookup table according to the tensor shape of the operator, where the lookup table includes the tensor shape and the optimal slicing configuration corresponding to the tensor shape; the generating module 13 is configured to compile the operator according to the tensor shape and the optimal slicing configuration corresponding to the tensor shape to generate a first data file.
In an embodiment of the present invention, the apparatus further includes: an updating module 14.
The updating module 14 is configured to update the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape.
In the embodiment of the present invention, the updating module 14 includes: a generating submodule 141, an updating submodule 142, a judging submodule 143, and a setting submodule 144.
The generating submodule 141 is configured to generate an optimal segmentation configuration corresponding to the ith set tensor shape according to the ith set tensor shape; the updating submodule 142 is configured to update the lookup table corresponding to the operator according to the ith set tensor shape and the optimal segmentation configuration corresponding to the ith set tensor shape; the judging submodule 143 is configured to judge whether the serial number of the ith set tensor shape is less than, or is equal to, the total number of the plurality of set tensor shapes; the setting submodule 144 is configured to, if the judging submodule 143 judges that the serial number of the ith set tensor shape is less than the total number of the plurality of set tensor shapes, take the (i+1)th set tensor shape as the ith set tensor shape and trigger the generating submodule 141 to perform the step of generating the optimal segmentation configuration corresponding to the ith set tensor shape according to the ith set tensor shape, where i is a positive integer.
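The loop formed by the four submodules can be sketched as follows. This is only an illustration under stated assumptions: `update_lookup_table` and the `find_optimal_config` callback are hypothetical names, the lookup table is modelled as a plain dictionary, and the 1-based counter `i` mirrors the serial number used in the text.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Shape = Tuple[int, ...]
SplitConfig = Tuple[int, ...]


def update_lookup_table(
    set_shapes: Sequence[Shape],
    find_optimal_config: Callable[[Shape], SplitConfig],
    table: Optional[Dict[Shape, SplitConfig]] = None,
) -> Dict[Shape, SplitConfig]:
    """Fill the table with one optimal configuration per set tensor shape."""
    table = {} if table is None else table
    if not set_shapes:
        return table
    i = 1  # serial number of the current set tensor shape (1-based)
    while True:
        shape = set_shapes[i - 1]
        # Generating step followed by updating step.
        table[shape] = find_optimal_config(shape)
        # Judging step: is the serial number still below the total?
        if i < len(set_shapes):
            i += 1  # setting step: the (i+1)th shape becomes the ith shape
        else:
            break  # serial number equals the total: all shapes processed
    return table
```

For example, passing a toy callback such as `lambda s: tuple(max(1, d // 2) for d in s)` fills the table with a half-size tile for every shape; in the embodiment the callback would be the search over segmentation configurations.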
In this embodiment of the present invention, the generating submodule 141 is specifically configured to generate, according to the ith set tensor shape, a plurality of segmentation configurations corresponding to the ith set tensor shape; and to determine the optimal segmentation configuration corresponding to the ith set tensor shape according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape, wherein j is a positive integer.
In this embodiment of the present invention, the generating module 13 is further configured to generate a plurality of set tensor shapes according to the set upper limit of each dimension and the set dimension step length corresponding to each dimension in the plurality of dimensions of the tensor of the operator.
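One possible reading of this step is a grid enumeration over the dimensions. The sketch below assumes that each dimension takes the values step, 2·step, ... up to its upper limit, and that every combination across dimensions is one set tensor shape; the text does not state the starting value or the ordering, so both are assumptions, as is the function name `generate_set_shapes`.

```python
from itertools import product
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def generate_set_shapes(upper_limits: Sequence[int],
                        steps: Sequence[int]) -> List[Shape]:
    """Enumerate set tensor shapes on a per-dimension grid."""
    if len(upper_limits) != len(steps):
        raise ValueError("one step length is required per dimension")
    # Candidate extents for each dimension: step, 2*step, ..., <= upper limit.
    per_dim = [range(step, upper + 1, step)
               for upper, step in zip(upper_limits, steps)]
    # The Cartesian product gives every combination of extents.
    return list(product(*per_dim))
```

With upper limits `(4, 2)` and step lengths `(2, 1)` this yields `(2, 1)`, `(2, 2)`, `(4, 1)` and `(4, 2)`. The number of shapes grows multiplicatively with the number of dimensions, which is why the step length matters for keeping the table small.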
In this embodiment of the present invention, the generating submodule 141 is specifically configured to segment the ith set tensor shape along its different dimensions according to the set dimension segmentation threshold corresponding to each dimension, so as to generate a plurality of segmentation configurations corresponding to the ith set tensor shape.
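The text does not define how the dimension segmentation threshold constrains a segmentation, so the following sketch adopts one plausible interpretation and labels it as an assumption: the threshold is the largest tile extent allowed on a dimension, and only tile extents that divide the dimension evenly are considered. The names `candidate_tiles` and `generate_split_configs` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

SplitConfig = Tuple[int, ...]


def candidate_tiles(extent: int, threshold: int) -> List[int]:
    """Tile extents that divide `extent` and do not exceed `threshold`.

    Assumption: the threshold is an upper bound on the tile extent.
    """
    return [t for t in range(1, min(extent, threshold) + 1) if extent % t == 0]


def generate_split_configs(shape: Sequence[int],
                           thresholds: Sequence[int]) -> List[SplitConfig]:
    """Combine the per-dimension tile candidates into configurations."""
    if len(shape) != len(thresholds):
        raise ValueError("one threshold is required per dimension")
    per_dim = [candidate_tiles(e, t) for e, t in zip(shape, thresholds)]
    return list(product(*per_dim))
```

For the shape `(4, 6)` with thresholds `(2, 3)`, the first dimension admits tiles 1 and 2 and the second admits 1, 2 and 3, giving six candidate configurations.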
In the embodiment of the present invention, the generating submodule 141 is specifically configured to determine an optimal segmentation configuration according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape; judge whether the sequence number of the jth segmentation configuration is less than, or is equal to, the total number of the plurality of segmentation configurations corresponding to the ith set tensor shape; and, if the sequence number of the jth segmentation configuration is judged to be less than the total number of the plurality of segmentation configurations corresponding to the ith set tensor shape, take the (j+1)th segmentation configuration as the jth segmentation configuration and execute the step of determining the optimal segmentation configuration according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape.
In this embodiment of the present invention, the generating submodule 141 is specifically configured to obtain, according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape, a comprehensive performance value corresponding to the jth segmentation configuration; judge whether the comprehensive performance value corresponding to the jth segmentation configuration is superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration; if the comprehensive performance value corresponding to the jth segmentation configuration is judged to be superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration, set the jth segmentation configuration as the optimal segmentation configuration; and if the comprehensive performance value corresponding to the jth segmentation configuration is judged to be not superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration, take the (j-1)th segmentation configuration as the optimal segmentation configuration.
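The comparison across segmentation configurations can be sketched as a single pass that keeps the best candidate seen so far. Two points are assumptions rather than statements from the text: the sketch compares each candidate with the running best (one reading of the comparison with the preceding configuration), and "superior" defaults to "numerically lower", as would suit a latency-like value. The names `select_optimal_config`, `measure` and `better` are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

SplitConfig = Tuple[int, ...]


def select_optimal_config(
    configs: Sequence[SplitConfig],
    measure: Callable[[SplitConfig], float],
    better: Callable[[float, float], bool] = lambda a, b: a < b,
) -> Tuple[Optional[SplitConfig], Optional[float]]:
    """Return the configuration with the best comprehensive performance value."""
    best_config: Optional[SplitConfig] = None
    best_value: Optional[float] = None
    for config in configs:  # j = 1 .. total number of configurations
        value = measure(config)
        # The first candidate is accepted; later ones must beat the best.
        if best_value is None or better(value, best_value):
            best_config, best_value = config, value
    return best_config, best_value
```

Passing `better=lambda a, b: a > b` switches the sketch to a higher-is-better value such as throughput.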
In this embodiment of the present invention, the generating submodule 141 is specifically configured to compile the operator according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape to generate a second data file; send the second data file to the mobile terminal; receive a plurality of performance values that the mobile terminal obtains by running the second data file and sends back; and perform weighted calculation on the plurality of performance values to generate a comprehensive performance value corresponding to the jth segmentation configuration.
In this embodiment of the present invention, the generating submodule 141 is specifically configured to compile the operator according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape to generate a second data file; run the second data file to obtain a plurality of performance values; and perform weighted calculation on the plurality of performance values to generate a comprehensive performance value corresponding to the jth segmentation configuration.
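The weighted calculation can be illustrated with a weighted sum. The text names neither the individual performance values nor their weights, so the metric names (`latency_ms`, `bandwidth_mb`), the weights and the function name `comprehensive_performance` below are assumptions chosen only to make the sketch concrete.

```python
from typing import Mapping


def comprehensive_performance(values: Mapping[str, float],
                              weights: Mapping[str, float]) -> float:
    """Weighted sum of the measured performance values."""
    if set(values) != set(weights):
        raise ValueError("values and weights must cover the same metrics")
    return sum(weights[name] * values[name] for name in values)


# Hypothetical measurements returned by running the second data file.
measured = {"latency_ms": 2.0, "bandwidth_mb": 10.0}
# Hypothetical weights expressing the relative importance of each metric.
metric_weights = {"latency_ms": 0.5, "bandwidth_mb": 0.1}
score = comprehensive_performance(measured, metric_weights)
```

Because both example metrics are lower-is-better, a lower score indicates a better segmentation configuration in this sketch; mixing metrics with opposite directions would require negative weights or normalisation first.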
In this embodiment of the present invention, the generating sub-module 141 is specifically configured to, if it is determined that the sequence number of the jth slicing configuration is equal to the total number of the multiple slicing configurations corresponding to the ith set tensor shape, trigger the updating sub-module 142 to execute a step of updating the lookup table corresponding to the operator according to the ith set tensor shape and the optimal slicing configuration corresponding to the ith set tensor shape.
In the technical solution of the operator compiling apparatus provided by the embodiment of the present invention, the electronic device acquires the lookup table corresponding to the operator according to the type of the operator, and queries the optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape. The electronic device compiles the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate the first data file. During operator compilation, the optimal segmentation configuration is obtained from the lookup table corresponding to the operator and the operator is compiled with that configuration, so that operator compilation improves flexibility, improves the utilization rate of the computing units, reduces bandwidth, and accelerates operation.
Fig. 9 is a schematic diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 9, the electronic device 21 includes: a processor 211, a memory 212, and a computer program 213 stored in the memory 212 and executable on the processor 211, wherein the computer program 213, when executed by the processor 211, implements the operator compiling method in the embodiments; to avoid repetition, the details are not described again here.
The electronic device 21 includes, but is not limited to, the processor 211 and the memory 212. Those skilled in the art will appreciate that Fig. 9 is merely an example of the electronic device 21 and does not constitute a limitation of the electronic device 21, which may include more or fewer components than those shown, combine certain components, or use different components; for example, the electronic device may also include input-output devices, network access devices, buses, etc.
The Processor 211 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 212 may be an internal storage unit of the electronic device 21, such as a hard disk or a memory of the electronic device 21. The memory 212 may also be an external storage device of the electronic device 21, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device 21. Further, the memory 212 may include both an internal storage unit of the electronic device 21 and an external storage device. The memory 212 is used to store the computer program and other programs and data required by the electronic device. The memory 212 may also be used to temporarily store data that has been output or is to be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (15)

1. A method of operator compilation, the method comprising:
acquiring a lookup table corresponding to an operator according to the type of the operator;
inquiring an optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape;
compiling the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate a first data file.
2. The method of claim 1, wherein the operator corresponds to a plurality of set tensor shapes; before the obtaining of the lookup table corresponding to the operator according to the type of the operator, the method further includes:
and updating the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape.
3. The method of claim 2, wherein updating the lookup table corresponding to the operator according to the optimal slicing configuration corresponding to each set tensor shape and each generated set tensor shape comprises:
generating an optimal segmentation configuration corresponding to the ith set tensor shape according to the ith set tensor shape;
updating a lookup table corresponding to the operator according to the ith set tensor shape and the optimal segmentation configuration corresponding to the ith set tensor shape;
judging whether the serial number of the ith set tensor shape is less than or equal to the total number of the plurality of set tensor shapes;
and if the number of the ith set tensor shape is judged to be smaller than the total number of the plurality of set tensor shapes, taking the (i + 1) th set tensor shape as the ith set tensor shape, and executing the step of generating the optimal splitting configuration corresponding to the ith set tensor shape according to the ith set tensor shape, wherein i is a positive integer.
4. The method of claim 3, wherein generating the optimal slicing configuration corresponding to the ith set tensor shape from the ith set tensor shape comprises:
generating a plurality of segmentation configurations corresponding to the ith set tensor shape according to the ith set tensor shape;
and determining the optimal splitting configuration corresponding to the ith set tensor shape according to the ith set tensor shape and the jth splitting configuration corresponding to the ith set tensor shape, wherein j is a positive integer.
5. The method of any of claims 2 through 4, wherein before updating the lookup table corresponding to the operator according to the optimal slicing configuration corresponding to each set tensor shape and each generated set tensor shape, further comprising:
and generating a plurality of set tensor shapes according to the set upper limit of each dimension and the set dimension step length corresponding to each dimension in a plurality of dimensions of the tensor of the operator.
6. The method of claim 4, wherein said generating a plurality of slicing configurations corresponding to an ith set tensor shape from an ith set tensor shape comprises:
and segmenting the ith set tensor shape according to the set dimension segmentation threshold values corresponding to the different dimensions on the different dimensions of the ith set tensor shape to generate a plurality of segmentation configurations corresponding to the ith set tensor shape.
7. The method of claim 4, wherein determining an optimal slicing configuration for the ith set tensor shape from the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape comprises:
determining an optimal segmentation configuration according to an ith set tensor shape and a jth segmentation configuration corresponding to the ith set tensor shape;
judging whether the sequence number of the jth splitting configuration is less than or equal to the total number of the plurality of splitting configurations corresponding to the ith set tensor shape;
and if the sequence number of the jth slicing configuration is judged to be smaller than the total number of the plurality of slicing configurations corresponding to the ith set tensor shape, taking the (j+1)th slicing configuration as the jth slicing configuration, and executing the step of determining the optimal slicing configuration according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape.
8. The method of claim 7, wherein determining an optimal slicing configuration according to an ith set tensor shape and a jth slicing configuration corresponding to the ith set tensor shape comprises:
acquiring a comprehensive performance value corresponding to the jth segmentation configuration according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape;
judging whether the comprehensive performance value corresponding to the jth segmentation configuration is superior to the comprehensive performance value corresponding to the jth-1 th segmentation configuration;
if the comprehensive performance value corresponding to the jth segmentation configuration is judged to be superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration, setting the jth segmentation configuration as the optimal segmentation configuration; and if the comprehensive performance value corresponding to the jth segmentation configuration is judged to be not superior to the comprehensive performance value corresponding to the (j-1)th segmentation configuration, taking the (j-1)th segmentation configuration as the optimal segmentation configuration.
9. The method of claim 8, wherein obtaining the comprehensive performance value corresponding to the jth slicing configuration according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape comprises:
compiling the operator according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape to generate a second data file;
sending the second data file to a mobile terminal;
receiving a plurality of performance values obtained and sent by the mobile terminal by running the second data file;
and performing weighted calculation on the performance values to generate a comprehensive performance value corresponding to the jth segmentation configuration.
10. The method of claim 8, wherein obtaining the composite performance value corresponding to the jth slicing configuration according to the ith set tensor shape and the jth slicing configuration corresponding to the ith set tensor shape comprises:
compiling the operator according to the ith set tensor shape and the jth segmentation configuration corresponding to the ith set tensor shape to generate a second data file;
running the second data file to obtain a plurality of performance values;
and performing weighted calculation on the performance values to generate a comprehensive performance value corresponding to the jth segmentation configuration.
11. The method of claim 8, further comprising:
and if the sequence number of the jth slicing configuration is judged to be equal to the total number of the slicing configurations corresponding to the ith set tensor shape, executing the step of updating the lookup table corresponding to the operator according to the ith set tensor shape and the optimal slicing configuration corresponding to the ith set tensor shape.
12. An operator compilation apparatus, the apparatus comprising:
the acquisition module is used for acquiring a lookup table corresponding to an operator according to the type of the operator;
the query module is used for querying an optimal segmentation configuration corresponding to the tensor shape from the lookup table according to the tensor shape of the operator, wherein the lookup table comprises the tensor shape and the optimal segmentation configuration corresponding to the tensor shape;
and the generating module is used for compiling the operator according to the tensor shape and the optimal segmentation configuration corresponding to the tensor shape to generate a first data file.
13. The apparatus of claim 12, further comprising: updating the module;
and the updating module is used for updating the lookup table corresponding to the operator according to each set tensor shape and the generated optimal segmentation configuration corresponding to each set tensor shape.
14. A computer-readable storage medium, comprising a stored program, wherein when the program is run, the program controls an apparatus on which the computer-readable storage medium is located to execute the operator compilation method according to any one of claims 1 to 11.
15. An electronic device, comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the apparatus, cause the apparatus to perform the operator compilation method of any of claims 1 to 11.
CN202210612358.4A 2022-05-31 2022-05-31 Operator compiling method and device and electronic equipment Pending CN114925830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210612358.4A CN114925830A (en) 2022-05-31 2022-05-31 Operator compiling method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210612358.4A CN114925830A (en) 2022-05-31 2022-05-31 Operator compiling method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114925830A (en) 2022-08-19

Family

ID=82812131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210612358.4A Pending CN114925830A (en) 2022-05-31 2022-05-31 Operator compiling method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114925830A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617351A (en) * 2022-11-29 2023-01-17 上海燧原科技有限公司 Operator segmentation pattern searching method and device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617351A (en) * 2022-11-29 2023-01-17 上海燧原科技有限公司 Operator segmentation pattern searching method and device, computer equipment and storage medium
CN115617351B (en) * 2022-11-29 2023-03-21 上海燧原科技有限公司 Operator segmentation pattern searching method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination