CN110968321B - Tensor calculation code optimization method, device, equipment and medium - Google Patents

Tensor calculation code optimization method, device, equipment and medium

Info

Publication number
CN110968321B
CN110968321B
Authority
CN
China
Prior art keywords
optimization
space
information
loop
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911025891.5A
Other languages
Chinese (zh)
Other versions
CN110968321A (en)
Inventor
Liang Yun (梁云)
Zheng Size (郑思泽)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University and Hangzhou Weiming Information Technology Co Ltd
Priority to CN201911025891.5A
Publication of CN110968321A
Application granted
Publication of CN110968321B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/40: Transformation of program code
    • G06F 8/41: Compilation
    • G06F 8/44: Encoding
    • G06F 8/443: Optimisation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a tensor calculation code optimization method, apparatus, electronic device, and medium. The method includes: analyzing the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information, and generating an optimization space from this information together with preset optimization methods, where each space point in the optimization space represents one combination of the preset optimization methods and one parameter selection. A target space point is then determined by searching the optimization space based on a simulated annealing algorithm and a reinforcement learning algorithm, and the tensor computation code is optimized according to the method combination and parameter selection corresponding to that point. In this way the automatic optimization of tensor computation code can be completed quickly, the running efficiency of the code is improved, and developers are spared the manual effort of hand-tuning operators while still obtaining relatively good performance, which reduces cost and improves development efficiency.

Description

Tensor calculation code optimization method, device, equipment and medium
Technical Field
The present application relates to the field of code optimization technology, and in particular, to a method and apparatus for optimizing tensor calculation code, an electronic device, and a computer readable medium.
Background
Code optimization refers to transforming program code, without changing the program's results, so as to improve its running efficiency. Code optimization may be performed at various stages of program compilation.
Tensors are multi-dimensional arrays classified by order, i.e., by the number of array dimensions: a scalar may be regarded as a zero-order tensor, a vector as a first-order tensor, a matrix as a second-order tensor, a three-dimensional data cube as a third-order tensor, and so on.
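As a brief illustrative aside (using NumPy arrays as the concrete multi-dimensional-array type), the orders look like this:

    # Tensor orders illustrated with NumPy arrays.
    import numpy as np

    scalar = np.array(3.14)            # order 0: a single value
    vector = np.array([1.0, 2.0])      # order 1
    matrix = np.eye(2)                 # order 2
    cube   = np.zeros((2, 2, 2))       # order 3: a three-dimensional data cube
    print(scalar.ndim, vector.ndim, matrix.ndim, cube.ndim)  # 0 1 2 3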
Tensor computation has important applications in artificial intelligence, scientific computing, image processing, and other fields. Optimizing tensor computation code calls for a unified optimization framework and path that completes the search for a good solution automatically, without manual guidance. Because the search space is very large, a poor search method can take an extremely long time (often more than 24 hours), so the challenge is to finish the optimization within an acceptable time while still ensuring the quality of the result.
Therefore, how to optimize tensor computation code so that the quality of the result is ensured while the time spent is shortened is a technical problem that remains to be solved in this field.
Disclosure of Invention
The purpose of the application is to provide a tensor calculation code optimization method and device, an electronic device and a computer readable medium.
The first aspect of the present application provides a tensor calculation code optimization method, including:
analyzing the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information;
generating an optimization space according to the loop information, the computation graph information, and preset optimization methods; the optimization space consists of a number of space points, and each space point represents one combination of the preset optimization methods and one parameter selection;
searching the optimization space to determine a target space point, based on a simulated annealing algorithm and a reinforcement learning algorithm;
and optimizing the tensor computation code according to the combination of preset optimization methods and the parameter selection corresponding to the target space point.
In some embodiments of the present application, the preset optimization methods include: loop partitioning (tiling), loop reorganization, loop reordering, loop unrolling, vectorization, and parallelization.
In some embodiments of the present application, analyzing the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information includes:
parsing the tensor computation code and converting the generated abstract syntax tree into a computation graph;
traversing the computation graph and collecting the connection information and loop information of each computation node; the loop information includes the number of loops, the space each loop occupies, the loop order, and the loop data dependencies; the connection information includes the number of inputs and outputs of each computation node, and the connection information of all computation nodes constitutes the computation graph information.
In some embodiments of the present application, generating the optimization space according to the loop information, the computation graph information, and the preset optimization methods includes:
enumerating, according to the loop information and the computation graph information, the combinations of preset optimization methods and their parameter selections to form a basic optimization space;
selecting, based on a pruning technique, a number of the combinations in the basic optimization space according to preset conditions and setting a parameter selection range, thereby generating the optimization space.
A second aspect of the present application provides a tensor calculation code optimization apparatus, including:
a static analysis module, configured to analyze the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information;
an optimization space generation module, configured to generate an optimization space according to the loop information, the computation graph information, and preset optimization methods; the optimization space consists of a number of space points, and each space point represents one combination of the preset optimization methods and one parameter selection;
an optimization space search module, configured to search the optimization space to determine a target space point, based on a simulated annealing algorithm and a reinforcement learning algorithm;
and an optimization implementation module, configured to optimize the tensor computation code according to the combination of preset optimization methods and the parameter selection corresponding to the target space point.
In some embodiments of the present application, the preset optimization methods include: loop partitioning (tiling), loop reorganization, loop reordering, loop unrolling, vectorization, and parallelization.
In some embodiments of the present application, the static analysis module is specifically configured to:
parse the tensor computation code and convert the generated abstract syntax tree into a computation graph;
traverse the computation graph and collect the connection information and loop information of each computation node; the loop information includes the number of loops, the space each loop occupies, the loop order, and the loop data dependencies; the connection information includes the number of inputs and outputs of each computation node, and the connection information of all computation nodes constitutes the computation graph information.
In some embodiments of the present application, the optimization space generation module is specifically configured to:
enumerate, according to the loop information and the computation graph information, the combinations of preset optimization methods and their parameter selections to form a basic optimization space;
select, based on a pruning technique, a number of the combinations in the basic optimization space according to preset conditions and set a parameter selection range, thereby generating the optimization space.
A third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to perform the method of the first aspect of the present application.
A fourth aspect of the present application provides a computer readable medium having stored thereon computer readable instructions executable by a processor to implement the method of the first aspect of the present application.
Compared with the prior art, the tensor calculation code optimization method, device, equipment and medium provided by the application analyze the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information, and generate an optimization space from this information together with preset optimization methods, where the optimization space consists of a number of space points and each space point represents one combination of the preset optimization methods and one parameter selection. A target space point is determined by searching the optimization space based on a simulated annealing algorithm and a reinforcement learning algorithm, and the tensor computation code is optimized according to the method combination and parameter selection corresponding to that point. The automatic optimization of tensor computation code can thus be completed quickly, the running efficiency of the code is improved, and developers are spared the manual effort of hand-tuning operators while still obtaining relatively good performance, which reduces cost and improves development efficiency.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a flow chart of a tensor computing code optimization method provided by some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a tensor computing code optimization device provided by some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of an electronic device provided by some embodiments of the present application;
fig. 4 illustrates a schematic diagram of a computer-readable medium provided by some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
In addition, the terms "first" and "second" etc. are used to distinguish different objects and are not used to describe a particular order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Embodiments of the present application provide a tensor calculation code optimization method and apparatus, an electronic device, and a computer readable medium, which are described below with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a tensor calculation code optimization method according to some embodiments of the present application is shown, where the tensor calculation code optimization method may include the following steps:
step S101: and analyzing the cyclic features and the computational graph features of the tensor computational code to obtain corresponding cyclic information and computational graph information.
The circulation information specifically comprises: the number of loops, the space occupied by loops, the order of loops, and the dependency of the loop data. The calculation map information specifically includes: the connection mode of each computing node in the computing graph and the input number and the output number of each computing node are calculated.
In some embodiments of the present application, step S101 may be specifically implemented as:
parsing the tensor computation code and converting the generated abstract syntax tree into a computation graph;
traversing the computation graph and collecting the connection information and loop information of each computation node; the loop information includes the number of loops, the space each loop occupies, the loop order, and the loop data dependencies; the connection information includes the number of inputs and outputs of each computation node, and the connection information of all computation nodes constitutes the computation graph information.
Specifically, the computation graph is traversed and several kinds of loop information are collected in each node, including the number of loops, the size (occupied space) of each loop, the position (order) of each loop, and whether it carries a data dependency. The data dependency is determined as follows: in the Einstein-notation expression of the tensor computation, only loops whose index appears solely on the right-hand side (i.e., reduction loops) carry data dependencies, while the other loops carry none; loops without data dependencies can be parallelized directly. The connection pattern of the graph nodes and the number of inputs and outputs of each computation node are used to judge whether nodes can be merged. Finally, the whole computation graph is topologically sorted, and the nodes are optimized one by one in topological order, as the sketch below illustrates.
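A minimal sketch of these two rules, assuming a hypothetical node format rather than the patent's actual parser: a loop index that appears only on the right-hand side (such as k in C[i, j] = sum_k A[i, k] * B[k, j]) is a reduction loop and carries a data dependency; nodes are then visited in topological order.

    # Hypothetical node encoding: each node lists its left- and right-hand loop indices.
    from graphlib import TopologicalSorter

    def classify_loops(lhs_indices, rhs_indices):
        """Map each loop index to whether it carries a data dependency."""
        return {idx: idx not in lhs_indices for idx in rhs_indices}

    print(classify_loops(lhs_indices=("i", "j"), rhs_indices=("i", "j", "k")))
    # {'i': False, 'j': False, 'k': True}: only k must be treated as a reduction

    # Connection information: node -> set of predecessor nodes feeding its inputs.
    graph = {"C": {"A", "B"}, "D": {"C"}}
    print(list(TopologicalSorter(graph).static_order()))
    # e.g. ['A', 'B', 'C', 'D'], the order in which nodes are optimized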
Step S102: generating an optimization space according to the loop information, the computation graph information, and preset optimization methods; the optimization space consists of a number of space points, and each space point represents one combination of the preset optimization methods and one parameter selection.
In some embodiments of the present application, the preset optimization methods may include: loop partitioning (tiling), loop reorganization, loop reordering, loop unrolling, vectorization, and parallelization; the sketch below illustrates what the first of these transformations do to a plain loop nest.
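The following sketch shows loop partitioning and loop reordering on a matrix-multiply loop nest, written in plain Python for readability (a real optimizer rewrites compiled code or IR instead); the tile size T=4 is a hypothetical parameter.

    def matmul_naive(A, B, C, n):
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    C[i][j] += A[i][k] * B[k][j]

    def matmul_partitioned(A, B, C, n, T=4):
        # Loop partitioning (tiling): i and j are split into outer tile loops
        # (i0, j0) and inner intra-tile loops; loop reordering then places k
        # between them, improving locality by reusing small blocks of A and B.
        # The innermost j loop is a natural candidate for unrolling or vectorization.
        for i0 in range(0, n, T):
            for j0 in range(0, n, T):
                for k in range(n):
                    for i in range(i0, min(i0 + T, n)):
                        for j in range(j0, min(j0 + T, n)):
                            C[i][j] += A[i][k] * B[k][j]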
In some embodiments of the present application, step S102 may be specifically implemented as:
enumerating, according to the loop information and the computation graph information, the combinations of preset optimization methods and their parameter selections to form a basic optimization space;
selecting, based on a pruning technique, a number of the combinations in the basic optimization space according to preset conditions and setting a parameter selection range, thereby generating the optimization space.
Specifically, according to the loop information and computation graph information acquired in step S101, the different combinations of optimization methods and parameter selections are enumerated to form a basic optimization space. The candidate optimization methods include loop partitioning, loop reorganization, loop reordering, loop unrolling, vectorization, and parallelization, as described above; these methods improve the parallelism and locality of the original computation. The optimization methods admit different combinations and parameter settings, and these choices together constitute the overall optimization space, also called the basic optimization space.
Pruning is then carried out in the basic optimization space: unnecessary parts of the search are removed according to optimization knowledge, reducing the amount of search work. The basic optimization space is extremely large, and an ordinary search cannot be performed in it. The pruning technique fixes the combination patterns of some optimization methods and limits the parameter ranges so as to shrink the search task; specifically, loop partitioning considers only a restricted set of block sizes, and parallelization considers only the outermost loop. Parameters are enumerated only within a limited range, which can be set according to the actual situation. A sketch of such pruned enumeration follows.
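A minimal sketch in Python; the loop extents, the tile-size bound, the unroll choices, and the rule of parallelizing only the outermost loop i are hypothetical stand-ins consistent with the constraints described above, not the patent's actual generator.

    from itertools import product

    loop_extents = {"i": 1024, "j": 1024, "k": 512}  # from the collected loop information

    def pruned_space(extents, max_tile=64, unroll_choices=(1, 2, 4)):
        per_loop = []
        for name, extent in extents.items():
            # Pruning: restrict tile sizes to divisors of the extent within a bounded range.
            tiles = [t for t in range(2, max_tile + 1) if extent % t == 0]
            per_loop.append([(name, t) for t in tiles])
        for tiles in product(*per_loop):
            for unroll in unroll_choices:
                # Only the outermost loop is considered for parallelization.
                yield {"tiling": dict(tiles), "unroll": unroll, "parallel": "i"}

    space = list(pruned_space(loop_extents))
    print(len(space), space[0])  # every element is one "space point"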
Step S103: searching the optimization space to determine a target space point, based on a simulated annealing algorithm and a reinforcement learning algorithm.
Specifically, the space point to be explored at each step of the search is determined by a simulated annealing algorithm. The search must explore a large number of space points; exploring a point means evaluating its surrounding points and keeping the better ones. The heuristic of simulated annealing is adopted to select which space points to explore.
For each space point to be explored, the straightforward approach is to try every possible direction, obtain the neighboring point in each direction, and evaluate with an evaluator whether its performance is better. Because so many directions make the search task heavy, a reinforcement learning method from machine learning is adopted: the correct next direction is predicted from the current search state, so only one direction needs to be explored and the workload is greatly reduced. The direction of each search step is predicted with the Q-learning method of reinforcement learning; a sketch combining the two algorithms follows.
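A minimal sketch, not the patent's implementation: simulated annealing decides whether to move to a candidate point, while a tabular Q-learning agent predicts the single most promising direction instead of evaluating every neighbor. The evaluate() cost function, the tuple encoding of space points, and all hyperparameters are hypothetical placeholders.

    import math
    import random
    from collections import defaultdict

    def neighbors(point):
        """Yield (direction, neighboring point) pairs: one step in one parameter."""
        for d in range(len(point)):
            for sign in (-1, 1):
                q = list(point)
                q[d] += sign
                yield (d, sign), tuple(q)

    def search(start, evaluate, steps=200, t0=1.0, alpha=0.5, gamma=0.9, eps=0.1):
        Q = defaultdict(float)                 # Q[(point, direction)] -> expected gain
        cur, cur_cost = start, evaluate(start)
        best, best_cost = cur, cur_cost
        for step in range(steps):
            temp = t0 * (0.99 ** step)         # annealing temperature schedule
            moves = list(neighbors(cur))
            if random.random() < eps:          # epsilon-greedy exploration
                d, cand = random.choice(moves)
            else:                              # exploit the learned direction
                d, cand = max(moves, key=lambda m: Q[(cur, m[0])])
            cand_cost = evaluate(cand)
            # Q-learning update: the reward is the cost reduction of the move.
            reward = cur_cost - cand_cost
            future = max(Q[(cand, nd)] for nd, _ in neighbors(cand))
            Q[(cur, d)] += alpha * (reward + gamma * future - Q[(cur, d)])
            # Simulated-annealing acceptance: always accept improvements, and accept
            # worse points with a probability that shrinks as the temperature drops.
            if cand_cost < cur_cost or random.random() < math.exp((cur_cost - cand_cost) / temp):
                cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        return best, best_cost

    # Hypothetical usage with a toy cost function standing in for measured runtime.
    print(search((3, 3, 1), evaluate=lambda p: sum((x - 5) ** 2 for x in p)))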
Step S104: optimizing the tensor computation code according to the combination of preset optimization methods and the parameter selection corresponding to the target space point.
Specifically, after the search result is obtained, the optimization scheme that the space point represents must be interpreted in reverse. The method is to keep records when the optimization space is generated; once the target space point (the performance-optimal point) has been found, those records are looked up to determine the combination of optimization methods and the parameter selection corresponding to it.
According to the interpreted combination of optimization methods and parameter selection, the original tensor computation program is rewritten into a new program: optimization primitives and the corresponding parameter settings are added to it, completing the modification of the program's abstract syntax tree. Compilation and code generation are then carried out from the modified abstract syntax tree with the deep learning compiler TVM, along the lines of the sketch below.
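For illustration only, here is a minimal sketch (not the patent's actual code) of applying one searched combination with the schedule primitives of TVM's legacy tensor-expression (te) API; the matrix-multiply operator, the tile and split factors (32, 32, 4), and the llvm target are assumptions.

    import tvm
    from tvm import te

    N = 1024
    A = te.placeholder((N, N), name="A")
    B = te.placeholder((N, N), name="B")
    k = te.reduce_axis((0, N), name="k")
    C = te.compute((N, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    s = te.create_schedule(C.op)
    i, j = C.op.axis
    # Loop partitioning (tiling) with hypothetical parameters 32 x 32.
    io, jo, ii, ji = s[C].tile(i, j, x_factor=32, y_factor=32)
    ko, ki = s[C].split(k, factor=4)
    # Loop reordering: place the reduction between the tile and intra-tile loops.
    s[C].reorder(io, jo, ko, ii, ki, ji)
    # Loop unrolling, vectorization, and outermost-loop parallelization.
    s[C].unroll(ki)
    s[C].vectorize(ji)
    s[C].parallel(io)

    func = tvm.build(s, [A, B, C], target="llvm")  # compilation and code generation

Each primitive corresponds to one of the preset optimization methods: tile and split to loop partitioning, reorder to loop reordering, and unroll, vectorize, and parallel to the remaining three.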
The tensor calculation code optimization method can be used on a client. In the embodiments of the present application, the client may include hardware or software. When the client includes hardware, it may be any of various electronic devices that have a display screen and support information interaction, including but not limited to smartphones, tablets, laptop and desktop computers, and the like. When the client includes software, it may be installed in the electronic devices described above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited here.
Compared with the prior art, the tensor calculation code optimization method provided by the embodiments of the present application analyzes the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information, and generates an optimization space from this information together with preset optimization methods, where the optimization space consists of a number of space points and each space point represents one combination of the preset optimization methods and one parameter selection. A target space point is determined by searching the optimization space based on a simulated annealing algorithm and a reinforcement learning algorithm, and the tensor computation code is optimized according to the method combination and parameter selection corresponding to that point. The automatic optimization of tensor computation code can thus be completed quickly, the running efficiency of the code is improved, and developers are spared the manual effort of hand-tuning operators while still obtaining relatively good performance, which reduces cost and improves development efficiency.
The above embodiments provide a tensor calculation code optimization method; correspondingly, the application also provides a tensor calculation code optimization device. The device provided by the embodiments of the present application can implement the method above and can itself be implemented in software, hardware, or a combination of the two. For example, the device may comprise integrated or separate functional modules or units that perform the corresponding steps of the method described above. Referring to fig. 2, a schematic diagram of a tensor calculation code optimization device according to some embodiments of the present application is shown. Since the device embodiments are substantially similar to the method embodiments, they are described relatively simply; for the relevant points, refer to the description of the method embodiments. The device embodiments described below are merely illustrative.
As shown in fig. 2, the tensor calculation code optimization device 10 may include:
a static analysis module 101, configured to analyze the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information;
an optimization space generation module 102, configured to generate an optimization space according to the loop information, the computation graph information, and preset optimization methods; the optimization space consists of a number of space points, and each space point represents one combination of the preset optimization methods and one parameter selection;
an optimization space search module 103, configured to search the optimization space to determine a target space point, based on a simulated annealing algorithm and a reinforcement learning algorithm;
and an optimization implementation module 104, configured to optimize the tensor computation code according to the combination of preset optimization methods and the parameter selection corresponding to the target space point.
In some implementations of the embodiments of the present application, the preset optimization methods include: loop partitioning (tiling), loop reorganization, loop reordering, loop unrolling, vectorization, and parallelization.
In some implementations of the embodiments of the present application, the static analysis module 101 is specifically configured to:
parse the tensor computation code and convert the generated abstract syntax tree into a computation graph;
traverse the computation graph and collect the connection information and loop information of each computation node; the loop information includes the number of loops, the space each loop occupies, the loop order, and the loop data dependencies; the connection information includes the number of inputs and outputs of each computation node, and the connection information of all computation nodes constitutes the computation graph information.
In some implementations of the embodiments of the present application, the optimization space generation module 102 is specifically configured to:
enumerate, according to the loop information and the computation graph information, the combinations of preset optimization methods and their parameter selections to form a basic optimization space;
select, based on a pruning technique, a number of the combinations in the basic optimization space according to preset conditions and set a parameter selection range, thereby generating the optimization space.
Resting on the same inventive concept, the tensor calculation code optimization device 10 provided in this embodiment of the present application has the same beneficial effects as the tensor calculation code optimization method provided in the foregoing embodiments of the present application.
The embodiment of the application also provides an electronic device corresponding to the tensor calculation code optimization method provided by the previous embodiment, where the electronic device may be an electronic device for a client, for example, a mobile phone, a notebook computer, a tablet computer, a desktop computer, etc., so as to execute the tensor calculation code optimization method.
Referring to fig. 3, a schematic diagram of an electronic device according to some embodiments of the present application is shown. As shown in fig. 3, the electronic device 20 includes: a processor 200, a memory 201, a bus 202 and a communication interface 203, the processor 200, the communication interface 203 and the memory 201 being connected by the bus 202; the memory 201 stores a computer program executable on the processor 200, and the processor 200 executes the tensor calculation code optimization method provided in any of the foregoing embodiments of the present application when the computer program is executed.
The memory 201 may include a high-speed random access memory (RAM: Random Access Memory) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 203 (which may be wired or wireless), and the Internet, a wide area network, a local network, a metropolitan area network, etc. may be used.
Bus 202 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. The memory 201 is configured to store a program, and the processor 200 executes the program after receiving an execution instruction, and the tensor calculation code optimization method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200 or implemented by the processor 200.
The processor 200 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 200 or by instructions in the form of software. The processor 200 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or any conventional processor, and so on. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, in a decoding processor, or in a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and, in combination with its hardware, performs the steps of the above method.
The electronic device provided by the embodiments of the present application rests on the same inventive concept as the tensor calculation code optimization method provided by the embodiments of the present application, and has the same beneficial effects as the method it adopts, runs, or implements.
This embodiment also provides a computer readable medium corresponding to the tensor calculation code optimization method provided in the foregoing embodiments. Referring to fig. 4, the computer readable storage medium is shown as an optical disc 30 on which a computer program (i.e., a program product) is stored; when executed by a processor, the computer program performs the tensor calculation code optimization method provided in any of the foregoing embodiments.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
The computer readable storage medium provided by the above embodiments of the present application rests on the same inventive concept as the tensor calculation code optimization method provided by the embodiments of the present application, and has the same beneficial effects as the method adopted, run, or implemented by the application program stored on it.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the embodiments, and are intended to be included within the scope of the claims and description.

Claims (6)

1. A tensor calculation code optimization method, comprising:
analyzing the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information;
generating an optimization space according to the loop information, the computation graph information, and preset optimization methods; the optimization space consists of a number of space points, and each space point represents one combination of the preset optimization methods and one parameter selection; the preset optimization methods comprise: loop partitioning, loop reorganization, loop reordering, loop unrolling, vectorization, and parallelization;
searching the optimization space to determine a target space point, based on a simulated annealing algorithm and a reinforcement learning algorithm;
optimizing the tensor computation code according to the combination of preset optimization methods and the parameter selection corresponding to the target space point;
wherein analyzing the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information comprises:
parsing the tensor computation code and converting the generated abstract syntax tree into a computation graph;
traversing the computation graph and collecting the connection information and loop information of each computation node; the loop information comprises the number of loops, the space each loop occupies, the loop order, and the loop data dependencies; the connection information comprises the number of inputs and outputs of each computation node, and the connection information of all computation nodes constitutes the computation graph information.
2. The method of claim 1, wherein generating the optimization space according to the loop information, the computation graph information, and the preset optimization methods comprises:
enumerating, according to the loop information and the computation graph information, the combinations of preset optimization methods and their parameter selections to form a basic optimization space;
selecting, based on a pruning technique, a number of the combinations in the basic optimization space according to preset conditions and setting a parameter selection range, thereby generating the optimization space.
3. A tensor calculation code optimization apparatus, comprising:
a static analysis module, configured to analyze the loop features and computation graph features of the tensor computation code to obtain corresponding loop information and computation graph information;
an optimization space generation module, configured to generate an optimization space according to the loop information, the computation graph information, and preset optimization methods; the optimization space consists of a number of space points, and each space point represents one combination of the preset optimization methods and one parameter selection; the preset optimization methods comprise: loop partitioning, loop reorganization, loop reordering, loop unrolling, vectorization, and parallelization;
an optimization space search module, configured to search the optimization space to determine a target space point, based on a simulated annealing algorithm and a reinforcement learning algorithm;
an optimization implementation module, configured to optimize the tensor computation code according to the combination of preset optimization methods and the parameter selection corresponding to the target space point;
wherein the static analysis module is specifically configured to:
parse the tensor computation code and convert the generated abstract syntax tree into a computation graph;
traverse the computation graph and collect the connection information and loop information of each computation node; the loop information comprises the number of loops, the space each loop occupies, the loop order, and the loop data dependencies; the connection information comprises the number of inputs and outputs of each computation node, and the connection information of all computation nodes constitutes the computation graph information.
4. The apparatus of claim 3, wherein the optimization space generation module is specifically configured to:
enumerate, according to the loop information and the computation graph information, the combinations of preset optimization methods and their parameter selections to form a basic optimization space;
select, based on a pruning technique, a number of the combinations in the basic optimization space according to preset conditions and set a parameter selection range, thereby generating the optimization space.
5. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method according to any one of claims 1 to 2.
6. A computer readable medium having stored thereon computer readable instructions executable by a processor to implement the method of any one of claims 1 to 2.
CN201911025891.5A 2019-10-25 2019-10-25 Tensor calculation code optimization method, device, equipment and medium Active CN110968321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911025891.5A CN110968321B (en) 2019-10-25 2019-10-25 Tensor calculation code optimization method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911025891.5A CN110968321B (en) 2019-10-25 2019-10-25 Tensor calculation code optimization method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110968321A CN110968321A (en) 2020-04-07
CN110968321B true CN110968321B (en) 2023-06-20

Family

ID=70029929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911025891.5A Active CN110968321B (en) 2019-10-25 2019-10-25 Tensor calculation code optimization method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110968321B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114594954A (en) * 2020-12-07 2022-06-07 华为技术有限公司 Code optimization method and device, computing equipment and computer storage medium
CN112527272B (en) * 2020-12-25 2023-11-17 深圳云天励飞技术股份有限公司 Method for docking TVM (transient voltage management) and related equipment
CN114756444A (en) * 2021-01-08 2022-07-15 华为技术有限公司 Calculation graph optimization method and device
WO2022155967A1 (en) * 2021-01-25 2022-07-28 京东方科技集团股份有限公司 Method for detecting object in real-time by utilizing object real-time detection model and optimization method
CN112947908A (en) * 2021-02-26 2021-06-11 上海商汤智能科技有限公司 Code generation method, device, equipment and storage medium
CN113703768A (en) * 2021-07-13 2021-11-26 清华大学 Tensor program optimization method and device
CN113887396A (en) * 2021-09-29 2022-01-04 上海商汤智能科技有限公司 Image processing method and device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3293696A1 (en) * 2016-09-07 2018-03-14 Facebook, Inc. Similarity search using polysemous codes
DE202018100066U1 (en) * 2018-01-08 2018-04-10 Google Llc Loop and library fusion
CN108228187A (en) * 2018-01-02 2018-06-29 南京大学 A kind of global optimization method of mathematical program
DE102018110719A1 (en) * 2017-05-05 2018-11-08 Intel Corporation Hardware-implemented point-to-point communication primitive for machine learning
CN109145143A (en) * 2018-08-03 2019-01-04 厦门大学 Sequence constraints hash algorithm in image retrieval
CN110147236A (en) * 2019-04-30 2019-08-20 阿里巴巴集团控股有限公司 Code compiling method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180246848A1 (en) * 2015-02-10 2018-08-30 D-Wave Systems Inc. Systems, devices, articles, and methods for quantum processor architecture

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3293696A1 (en) * 2016-09-07 2018-03-14 Facebook, Inc. Similarity search using polysemous codes
DE102018110719A1 (en) * 2017-05-05 2018-11-08 Intel Corporation Hardware-implemented point-to-point communication primitive for machine learning
CN108228187A (en) * 2018-01-02 2018-06-29 南京大学 A kind of global optimization method of mathematical program
DE202018100066U1 (en) * 2018-01-08 2018-04-10 Google Llc Loop and library fusion
CN109145143A (en) * 2018-08-03 2019-01-04 厦门大学 Sequence constraints hash algorithm in image retrieval
CN110147236A (en) * 2019-04-30 2019-08-20 阿里巴巴集团控股有限公司 Code compiling method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xu Bingbing; Cen Keting; Huang Junjie; Shen Huawei; Cheng Xueqi. A Survey on Graph Convolutional Neural Networks. Chinese Journal of Computers, (05), full text. *
Ou Haiying; Zhang Weihua; Zhao Jingcheng; Han Yu. A Survey of Design Optimization Visualization Research. Journal of System Simulation, 2008, (20), full text. *

Also Published As

Publication number Publication date
CN110968321A (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN110968321B (en) Tensor calculation code optimization method, device, equipment and medium
US11221834B2 (en) Method and system of intelligent iterative compiler optimizations based on static and dynamic feedback
JP4042604B2 (en) Program parallelization apparatus, program parallelization method, and program parallelization program
CN113703775B (en) Compiling method, compiling device, compiling equipment and storage medium
US10394694B2 (en) Unexplored branch search in hybrid fuzz testing of software binaries
US9081586B2 (en) Systems and methods for customizing optimization/transformation/ processing strategies
CN113283613B (en) Deep learning model generation method, optimization method, device, equipment and medium
CN103514025A (en) OPENCL compilation
US9195444B2 (en) Compiler method and compiler apparatus for optimizing a code by transforming a code to another code including a parallel processing instruction
Wahib et al. Automated GPU kernel transformations in large-scale production stencil applications
Gysi et al. Absinthe: Learning an analytical performance model to fuse and tile stencil codes in one shot
Ivanenko et al. TuningGenie: auto-tuning framework based on rewriting rules
US11262989B2 (en) Automatic generation of efficient vector code with low overhead in a time-efficient manner independent of vector width
US20130232471A1 (en) Method and Apparatus for Assessing Software Parallelization
EP2677423B1 (en) OpenCL compilation
Giaquinta et al. Maximum convex subgraphs under I/O constraint for automatic identification of custom instructions
Campos et al. On data parallelism code restructuring for HLS targeting FPGAs
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
CN113296788A (en) Instruction scheduling method, apparatus, device, storage medium and program product
US10241764B2 (en) Automatically transform pass-by-value semantics into pass-by-reference implementation
Kato et al. Loop fusion with outer loop shifting for high-level synthesis
Ye et al. HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis
Nobre et al. Beyond Polyhedral Analysis of OpenStream Programs
Balasa et al. Signal assignment model for the memory management of multidimensional signal processing applications
Xu et al. AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 101, Building 1, Block C, Qianjiang Century Park, Ningwei Street, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Weiming Information Technology Co.,Ltd.

Applicant after: Institute of Information Technology, Zhejiang Peking University

Address before: Room 288-1, 857 Xinbei Road, Ningwei Town, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant before: Institute of Information Technology, Zhejiang Peking University

Applicant before: Hangzhou Weiming Information Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200407

Assignee: Zhejiang Visual Intelligence Innovation Center Co.,Ltd.

Assignor: Institute of Information Technology, Zhejiang Peking University|Hangzhou Weiming Information Technology Co.,Ltd.

Contract record no.: X2023330000927

Denomination of invention: Tensor calculation code optimization methods, devices, equipment, and media

Granted publication date: 20230620

License type: Common License

Record date: 20231219