CN113296780A - Method, apparatus, and device for processing a computation graph - Google Patents

Method, apparatus, and device for processing a computation graph

Info

Publication number
CN113296780A
Authority
CN
China
Prior art keywords
operator
optimization
graph
mapping information
tensor
Prior art date
Legal status
Pending
Application number
CN202011282533.5A
Other languages
Chinese (zh)
Inventor
姜霄棠
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202011282533.5A
Publication of CN113296780A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G06F 8/44 Encoding
    • G06F 8/443 Optimisation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models


Abstract

Embodiments of the present application provide a method, an apparatus, and a device for processing a computation graph. The method includes: obtaining a computation graph, wherein the computation graph includes a non-optimization operator; and converting the non-optimization operator in the computation graph to obtain a converted optimization graph, wherein the operators in the optimization graph are optimization operators. The optimization capability for the computation graph is thereby improved.

Description

Method, apparatus, and device for processing a computation graph
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for processing a computation graph.
Background
Currently, algorithm models developed by developers can be abstracted into computation graph representations, and the computation graphs can be deployed onto hardware platforms by an inference framework.
The inference framework can optimize a computation graph so that it executes efficiently on a hardware platform. In general, because both the number of operators and the number of hardware platforms are large, the inference framework reduces the cost of writing optimized code by writing optimized code on each hardware platform only for common operators, while writing none for uncommon operators. Consequently, when the computation graph is executed, a common operator in the graph can invoke its optimized code for the current hardware platform, whereas an uncommon operator can only invoke its basic code.
The above approach therefore suffers from weak optimization capability for the computation graph.
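The dispatch behavior described in this background can be sketched as follows. This is an illustrative Python sketch, not part of the patent; the kernel table, operator names, and platform tags are all hypothetical.

```python
# Hypothetical sketch: an inference framework registers hand-written
# optimized kernels only for common operators on each platform; an
# uncommon operator falls back to a generic "basic" implementation.

OPTIMIZED_KERNELS = {
    # (operator type, platform) -> optimized kernel
    ("Convolution", "avx"): lambda x: ("avx-optimized", x),
    ("Convolution", "opencl"): lambda x: ("opencl-optimized", x),
}

def basic_kernel(op_type):
    # Reference implementation: correct on every platform, fast on none.
    return lambda x: ("basic", x)

def resolve_kernel(op_type, platform):
    # Prefer the platform-specific optimized kernel when one was written.
    return OPTIMIZED_KERNELS.get((op_type, platform), basic_kernel(op_type))

# A common operator gets platform-specific optimized code ...
tag_common, _ = resolve_kernel("Convolution", "avx")(None)
# ... while an uncommon one can only run its basic code.
tag_uncommon, _ = resolve_kernel("Transpose", "avx")(None)
```

This is exactly the gap the patent targets: every operator missing from the kernel table loses the benefit of hand-written optimization.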
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, and a device for processing a computation graph, which are used to solve the prior-art problem that the optimization capability for the computation graph is weak.
In a first aspect, an embodiment of the present application provides a method for processing a computation graph, including:
obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
and converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator.
In a second aspect, an embodiment of the present application provides a method for processing a computation graph, including:
obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator;
and sequentially invoking, for the operators in the optimization graph, the optimization code corresponding to the current hardware platform.
In a third aspect, an embodiment of the present application provides an apparatus for processing a computation graph, including:
the acquisition module is used for acquiring a calculation graph, and the calculation graph comprises a non-optimization operator;
and the conversion module is used for converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator.
In a fourth aspect, an embodiment of the present application provides an apparatus for processing a computation graph, including:
the acquisition module is used for acquiring a calculation graph, and the calculation graph comprises a non-optimization operator;
the conversion module is used for converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator;
and the calling module is used for calling the optimization codes of the operators in the optimization graph corresponding to the current hardware platform in sequence.
In a fifth aspect, an embodiment of the present application provides a computing device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the method of any of the first aspects.
In a sixth aspect, an embodiment of the present application provides a computing device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the method of any of the second aspects.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program, the computer program comprising at least one code, which is executable by a computer to control the computer to perform the method according to any one of the first aspect.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program, the computer program comprising at least one code, which is executable by a computer to control the computer to perform the method according to any one of the second aspect.
Embodiments of the present application also provide a computer program, which is used to implement the method according to any one of the first aspect when the computer program is executed by a computer.
Embodiments of the present application also provide a computer program, which is used to implement the method according to any one of the second aspect when the computer program is executed by a computer.
According to the method, apparatus, and device for processing a computation graph provided by the embodiments of the present application, a computation graph containing non-optimization operators is obtained, and the non-optimization operators in it are converted to obtain a converted optimization graph whose operators are all optimization operators. Optimization operators that have pre-written optimization code thereby complete the function of the non-optimization operators in the computation graph that have none, which expands the range of operators whose code can be optimized on the basis of pre-written optimization code and thus improves the optimization capability for the computation graph.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for processing a computation graph according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a computation graph according to an embodiment of the present application;
FIG. 4A is a schematic diagram of a computation graph according to another embodiment of the present application;
FIG. 4B is a schematic diagram of a process for the computation graph shown in FIG. 4A according to an embodiment of the present application;
FIG. 4C is an optimization graph obtained by the process of FIG. 4B according to an embodiment of the present application;
FIG. 5A is a schematic diagram of a computation graph according to yet another embodiment of the present application;
FIG. 5B is a schematic diagram of a process for the computation graph of FIG. 5A according to an embodiment of the present application;
FIG. 5C is an optimization graph obtained by the process of FIG. 5B according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a method for processing a computation graph according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an apparatus for processing a computation graph according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for processing a computation graph according to another embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computing device according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art without any inventive work according to the embodiments of the present application are within the scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the embodiments of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "a plurality" typically means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is detected", or "in response to detecting (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or system. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the product or system that comprises the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
For the convenience of those skilled in the art to understand the technical solutions provided in the embodiments of the present application, a technical environment for implementing the technical solutions is described below.
In the related art, the common way of processing a computation graph is to write optimized code for common operators on each hardware platform while writing no optimized code for uncommon operators. When a common operator in the computation graph needs to be invoked, its optimized code for the current hardware platform is called; when an uncommon operator needs to be invoked, only its basic code can be called. The optimization capability for the computation graph is therefore weak, and the related art urgently needs a way of processing computation graphs that can improve it.
In addition, the related art also proposes a compilation-based inference framework, the Tensor Virtual Machine (TVM), which automatically generates optimized code for every operator in the computation graph at the compiler back end. However, compared with hand-written optimized code, automatically generated optimized code cannot achieve deeply customized optimization, so the optimization capability for the computation graph remains limited.
In view of practical technical requirements similar to those described above, the method for processing a computation graph provided by the present application can improve the optimization capability for the computation graph by technical means.
The following describes a processing method of a computation graph provided in various embodiments of the present application in detail through an exemplary application scenario.
As shown in FIG. 1, the application scenario may include a server 11 and a computing device 12. In one embodiment, the computing device 12 may include a terminal, such as a mobile phone, a tablet, a personal computer (PC), or an Internet of Things (IoT) device; in another embodiment, the computing device 12 may include any form of data processing server, such as a cloud server or a distributed server.
In one embodiment, the inference framework may be deployed in the computing device 12. In this case, the server 11 may include any form of data storage server, such as a cloud server or a distributed server, and may store the computation graph. The computing device 12 may obtain the computation graph by interacting with the server 11; the computation graph includes a non-optimization operator, which is an operator for which no pre-written optimization code exists. In contrast, an optimization operator is an operator that has pre-written optimization code, that is, optimization code written in advance for each hardware platform on the basis of the interface functions that the platform provides.
For example, the hardware platform may be an AMD Central Processing Unit (CPU) supporting the SSE instruction set, in which case the interface functions provided by the hardware platform are the SSE instruction set and the optimization code is implemented on top of it. As another example, the hardware platform may be an Intel CPU supporting the AVX instruction set, in which case the interface functions are the AVX instruction set and the optimization code is implemented on top of it. As yet another example, the hardware platform may be a Graphics Processing Unit (GPU) supporting the Open Computing Language (OpenCL), in which case the interface functions are the OpenCL interface functions and the optimization code is implemented on top of them. Of course, in other embodiments the hardware platform may be of other types, which is not limited in this application.
After the calculation graph is obtained, the computing device 12 may convert the non-optimization operator in the calculation graph to obtain a converted optimization graph, where the operator in the optimization graph is an optimization operator. It can be understood that the optimization graph is a new computation graph obtained by converting the non-optimization operators in the original computation graph. Further, the computing device 12 may sequentially invoke optimization codes of operators in the optimization graph corresponding to the current hardware platform to implement executing the computation graph.
In another embodiment, the inference framework may be deployed in the server 11. In this case, the server 11 may include any form of data processing server such as a cloud server, a distributed server, and the like. The server 11 may complete conversion from the computation graph to the optimization graph, specifically, the server 11 may obtain the computation graph, where the computation graph includes a non-optimization operator, and convert the non-optimization operator in the computation graph to obtain the converted optimization graph, where the operator in the optimization graph is an optimization operator. Computing device 12 may retrieve the optimization map from server 11 by interacting with server 11. Further, the computing device 12 may sequentially invoke optimization codes of operators in the optimization graph corresponding to the current hardware platform to implement executing the optimization graph.
By way of example, the types of algorithms supported by the inference framework in the embodiments of the present application may include deep learning related algorithms and image processing related algorithms. The deep learning related algorithm may be, for example, a face recognition algorithm, an image classification algorithm, a speech recognition algorithm, an instance segmentation algorithm, a target detection algorithm, and the like. The image processing related algorithm may be, for example, a blurring algorithm, a filter algorithm, an edge detection algorithm, or the like. Correspondingly, the computation graph in the embodiment of the present application may be a computation graph of a deep learning related algorithm or an image processing related algorithm, for example, the computation graph may be a computation graph of a face recognition algorithm, a computation graph of an image classification algorithm, a computation graph of a speech recognition algorithm, a computation graph of an example segmentation algorithm, a computation graph of a target detection algorithm, a computation graph of a fuzzy algorithm, a computation graph of a filter algorithm, a computation graph of an edge detection algorithm, and the like.
Optionally, when the method provided in the embodiments of the present application is applied to an Internet of Things device, consider that the computing capabilities of different types of IoT devices vary widely while the method places certain demands on computing capability. The device may therefore first determine whether its own computing capability meets a preset requirement before converting the non-optimization operators in the computation graph. If it does, the device converts the non-optimization operators to obtain a converted optimization graph and executes that graph; otherwise, it skips the conversion and executes the computation graph directly. This avoids crashes of IoT devices whose computing capability cannot meet the demands that executing the method places on it.
In the embodiments of the present application, converting the non-optimization operators in the computation graph yields an optimization graph whose operators are all optimization operators. Optimization operators that have optimization code thus take over the function of non-optimization operators that have none, which expands the range of operators in the computation graph whose code can be optimized on the basis of pre-written optimization code and improves the optimization capability for the computation graph.
In addition, since the optimization performance of the pre-written optimization code is better than that of the automatically generated optimization code, the optimization capability of the computation graph can be improved compared with the case that the corresponding optimization code is automatically generated for all operators in the computation graph at the back end of the compiler.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Fig. 2 is a flowchart illustrating a processing method of a computation graph according to an embodiment of the present application, where an execution subject of the embodiment may be the server 11 or the computing device 12 in fig. 1. As shown in fig. 2, the method of this embodiment may include:
step 21, obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
and step 22, converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator.
The server 11 may, for example, read the computation graph from its local storage space, and the computing device 12 may, for example, obtain the computation graph from the server 11. Of course, in other embodiments the computation graph may be obtained in other manners, which is not limited in this application.
In one embodiment, all operators in the computation graph are non-optimization operators, and in another embodiment, a part of operators in the computation graph are non-optimization operators, and another part of operators are optimization operators.
The computation graph may be, for example, as shown in FIG. 3: the nodes in FIG. 3 may correspond to tensors and the edges to operators, with tensors A, B, C, D, E, and F and operators 1 through 5. Tensor A is the input tensor of operator 1; tensor B is the output tensor of operator 1 and the input tensor of operator 2; tensor C is the output tensor of operator 2 and the input tensor of operator 3; tensor D is the output tensor of operator 3 and the input tensor of operator 4; tensor E is the output tensor of operator 4 and the input tensor of operator 5; and tensor F is the output tensor of operator 5. The computation graph shown in FIG. 3 is only an example.
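As an illustration only, the chain-shaped graph of FIG. 3 could be represented and scheduled as below; the edge-list encoding and function names are assumptions for the sketch, not the patent's data structures.

```python
# Hypothetical minimal representation of the computation graph of FIG. 3:
# nodes are tensors, edges are operators mapping an input tensor to an
# output tensor (names as in the figure).
operators = [
    ("op1", "A", "B"),
    ("op2", "B", "C"),
    ("op3", "C", "D"),
    ("op4", "D", "E"),
    ("op5", "E", "F"),
]

def execution_order(ops, graph_input):
    # Walk the graph from the input tensor, emitting each operator once
    # its input tensor has been produced. (Assumes every operator is
    # eventually reachable from graph_input, as in a chain graph.)
    ready, order = {graph_input}, []
    remaining = list(ops)
    while remaining:
        for op in remaining:
            name, src, dst = op
            if src in ready:
                order.append(name)
                ready.add(dst)
                remaining.remove(op)
                break
    return order
```

Executing the graph then amounts to invoking the operators in this order, each one consuming its input tensor and producing its output tensor.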
After the computation graph containing the non-optimization operator is obtained, the non-optimization operator in the computation graph may be converted to obtain a converted optimization graph.
In the embodiments of the present application, a non-optimization operator may include a first non-optimization operator, which changes element coordinates while keeping element values unchanged, and/or a second non-optimization operator, which changes both element coordinates and element values. The first non-optimization operator may include, for example, a transpose (Transpose) operator, a concatenation (Concat) operator, a slice extraction (Slice) operator, and the like. The second non-optimization operator may include, for example, a convolution (Convolution) operator, a Long Short-Term Memory (LSTM) operator, a Local Response Normalization (LRN) operator, and the like.
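To make the distinction concrete, the toy sketch below (hypothetical helper name, not the patent's implementation) shows that a Transpose, an operator of the first kind, only relocates elements: the multiset of element values is unchanged, only their coordinates move.

```python
# A Transpose only changes element coordinates: the same values appear
# at new positions, so sorting the flattened values of input and output
# yields identical sequences.
def transpose2d(t):
    # Swap row and column coordinates of a 2-D tensor given as nested lists.
    rows, cols = len(t), len(t[0])
    return [[t[r][c] for r in range(rows)] for c in range(cols)]

src = [[1, 2, 3],
       [4, 5, 6]]
dst = transpose2d(src)

# First-kind property: values preserved, coordinates changed.
same_values = sorted(v for row in src for v in row) == \
              sorted(v for row in dst for v in row)
```

A second-kind operator such as convolution would fail this check, since it computes new element values rather than moving existing ones.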
In the embodiment of the present application, when the non-optimization operator is converted, the coordinate mapping involved in the non-optimization operator is normalized, and the normalization process includes calculation of mapping information and processing based on the mapping information.
Based on this, in an embodiment, when the computation graph includes the first non-optimization operator, the converting the non-optimization operator in the computation graph to obtain the converted optimization graph may specifically include the following steps 221 and 222.
Step 221, calculating first mapping information of the first non-optimization operator in the computation graph; wherein the first non-optimized operator is a non-optimized operator for changing element coordinates while element values remain unchanged, and the first mapping information is used to characterize a coordinate mapping relationship between an input tensor and an output tensor of the first non-optimized operator;
step 222, according to the first mapping information, converting the first non-optimization operator in the computation graph to obtain a converted optimization graph.
In the embodiment of the application, the specific form of the first mapping information can be flexibly realized according to requirements. In one embodiment, the first mapping information may include range sub information, first location sub information, and second location sub information; wherein the range sub-information is used to characterize a coordinate range of the input tensor of the first non-optimized operator; the first position sub-information is used for representing the corresponding position of an element of any coordinate in the coordinate range when the input tensor of the first non-optimized operator is regarded as a target vector; the second position sub-information is used for representing the corresponding position of the element of any coordinate in the coordinate range when the output tensor of the first non-optimized operator is regarded as the target vector.
The target vector may be a row vector or a column vector: when memory is accessed in row units, the target vector may be a row vector, and when memory is accessed in column units, it may be a column vector, so that contiguous memory can be read during computation, which helps reduce memory-access time.
Taking as an example first mapping information that characterizes a coordinate mapping relationship over 3 dimensions (for example, the X, Y, and Z dimensions), the first mapping information may be represented by a Region structure:
(The structure appears only as images in the original publication; the following is a reconstruction consistent with the field names used in the surrounding text, with the integer type being an assumption:)

    struct View {
        int32_t offset;      // start offset within the target vector
        int32_t stride[3];   // strides for the z, y, and x dimensions
    };
    struct Region {
        View src;            // first position sub-information (input tensor)
        View dst;            // second position sub-information (output tensor)
        int32_t size[3];     // range sub-information (coordinate range)
    };
wherein size[3] represents the range sub-information, View src represents the first position sub-information, and View dst represents the second position sub-information; size[3], View src, and View dst can all be obtained by calculation for the first non-optimization operator. For the element at any coordinate within the range represented by size[3], its corresponding position p in the target vector may satisfy the following formula (1):
p = offset + x × stride[2] + y × stride[1] + z × stride[0]    formula (1)
Exemplarily, size[0] may represent the size of the input tensor in one dimension (e.g., the Z dimension), size[1] the size in another dimension (e.g., the Y dimension), and size[2] the size in yet another dimension (e.g., the X dimension). Note that, in practical applications, the type of coordinate-range space characterized by size[3] may be a cube, a triangular pyramid, or the like. A cube space may be described as {0 <= x < size[2], 0 <= y < size[1], 0 <= z < size[0]}; a triangular pyramid space may be described as {0 <= x < size[2], 0 <= y < size[1], 0 <= z < size[0], together with an additional linear constraint whose formula appears only as an image in the original publication}. Here x may represent the coordinate in the X dimension, y the coordinate in the Y dimension, and z the coordinate in the Z dimension.
It should be noted that, in other embodiments, the first mapping information may characterize coordinate mapping relationships over other numbers of dimensions. The larger the number of dimensions, the more complicated element copying based on the first mapping information becomes; the smaller the number of dimensions, the more pieces of first mapping information are needed to represent a coordinate mapping relationship of higher dimensionality. A compromise can therefore be struck between the two, and the embodiments of the present application take 3 dimensions as an example.
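A minimal sketch of the position computation of formula (1), assuming strides are stored in (z, y, x) order so that stride[0] pairs with z, stride[1] with y, and stride[2] with x; the function name is hypothetical.

```python
# Formula (1): map a 3-D coordinate (x, y, z) to its linear position in
# the target vector, given a View's offset and stride[3] in (z, y, x) order.
def flat_position(offset, stride, x, y, z):
    return offset + x * stride[2] + y * stride[1] + z * stride[0]

# For an input tensor A[2,4] with src.offset = 0 and src.stride = {0, 4, 1}
# (as in the Concat example that follows), the element at (x=3, y=1, z=0)
# lands at linear position 0 + 3*1 + 1*4 + 0 = 7.
pos = flat_position(0, (0, 4, 1), x=3, y=1, z=0)
```

The zero z-stride reflects that z is fixed at 0 over the whole region, so the z term never contributes.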
For example 1, assume the first non-optimization operator is a Concat operator whose input tensors are A[2,4] and B[2,4] and whose output tensor is C[2,8]. The calculated first mapping information of the Concat operator may include Region0 and Region1: Region0 may represent the coordinate mapping relationship from input tensor A[2,4] to output tensor C[2,8], and Region1 the coordinate mapping relationship from input tensor B[2,4] to output tensor C[2,8]. In Region0, src.offset = 0, src.stride[3] = {0,4,1}, dst.offset = 0, dst.stride[3] = {0,8,1}, and size[3] = {1,2,4}; in Region1, src.offset = 0, src.stride[3] = {0,4,1}, dst.offset = 4, dst.stride[3] = {0,8,1}, and size[3] = {1,2,4}. size[3] = {1,2,4} indicates that z is fixed at 0, y ranges over 0 <= y < 2, and x ranges over 0 <= x < 4.
Based on Region0 and formula (1) (corresponding position = x × stride[2] + y × stride[1] + z × stride[0] + offset), it can be calculated that:

for the element at coordinates (x=0, y=0) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 0 × 1 + 0 × 4 + 0 + 0 = 0 (where the first factor 0 is x, the second factor 0 is y, the 1 is src.stride[2] of Region0, the 4 is src.stride[1] of Region0, the third 0 is the z × src.stride[0] term, and the fourth 0 is src.offset of Region0), and the corresponding position when output tensor C[2,8] is regarded as the target vector is 0 × 1 + 0 × 8 + 0 + 0 = 0 (where the 8 is dst.stride[1] of Region0 and the fourth 0 is dst.offset of Region0);

for the element at coordinates (x=1, y=0) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 1 × 1 + 0 × 4 + 0 + 0 = 1, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 1 × 1 + 0 × 8 + 0 + 0 = 1;

for the element at coordinates (x=2, y=0) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 2 × 1 + 0 × 4 + 0 + 0 = 2, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 2 × 1 + 0 × 8 + 0 + 0 = 2;

for the element at coordinates (x=3, y=0) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 3 × 1 + 0 × 4 + 0 + 0 = 3, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 3 × 1 + 0 × 8 + 0 + 0 = 3;

for the element at coordinates (x=0, y=1) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 0 × 1 + 1 × 4 + 0 + 0 = 4, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 0 × 1 + 1 × 8 + 0 + 0 = 8;

for the element at coordinates (x=1, y=1) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 1 × 1 + 1 × 4 + 0 + 0 = 5, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 1 × 1 + 1 × 8 + 0 + 0 = 9;

for the element at coordinates (x=2, y=1) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 2 × 1 + 1 × 4 + 0 + 0 = 6, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 2 × 1 + 1 × 8 + 0 + 0 = 10;

for the element at coordinates (x=3, y=1) in input tensor A[2,4], the corresponding position when input tensor A[2,4] is regarded as the target vector is 3 × 1 + 1 × 4 + 0 + 0 = 7, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 3 × 1 + 1 × 8 + 0 + 0 = 11.
Based on Region1 and formula (1), it can be calculated that:

for the element at coordinates (x=0, y=0) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 0 × 1 + 0 × 4 + 0 + 0 = 0 (where the 1 is src.stride[2] of Region1, the 4 is src.stride[1] of Region1, and the fourth 0 is src.offset of Region1), and the corresponding position when output tensor C[2,8] is regarded as the target vector is 0 × 1 + 0 × 8 + 0 + 4 = 4 (where the 8 is dst.stride[1] of Region1 and the 4 is dst.offset of Region1);

for the element at coordinates (x=1, y=0) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 1 × 1 + 0 × 4 + 0 + 0 = 1, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 1 × 1 + 0 × 8 + 0 + 4 = 5;

for the element at coordinates (x=2, y=0) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 2 × 1 + 0 × 4 + 0 + 0 = 2, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 2 × 1 + 0 × 8 + 0 + 4 = 6;

for the element at coordinates (x=3, y=0) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 3 × 1 + 0 × 4 + 0 + 0 = 3, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 3 × 1 + 0 × 8 + 0 + 4 = 7;

for the element at coordinates (x=0, y=1) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 0 × 1 + 1 × 4 + 0 + 0 = 4, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 0 × 1 + 1 × 8 + 0 + 4 = 12;

for the element at coordinates (x=1, y=1) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 1 × 1 + 1 × 4 + 0 + 0 = 5, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 1 × 1 + 1 × 8 + 0 + 4 = 13;

for the element at coordinates (x=2, y=1) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 2 × 1 + 1 × 4 + 0 + 0 = 6, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 2 × 1 + 1 × 8 + 0 + 4 = 14;

for the element at coordinates (x=3, y=1) in input tensor B[2,4], the corresponding position when input tensor B[2,4] is regarded as the target vector is 3 × 1 + 1 × 4 + 0 + 0 = 7, and the corresponding position when output tensor C[2,8] is regarded as the target vector is 3 × 1 + 1 × 8 + 0 + 4 = 15.
It can be seen that the coordinate mapping relationship represented by Region0 + Region1 is consistent with the coordinate mapping relationship that would result from actually executing the Concat operator with tensors A[2,4] and B[2,4] as its inputs and tensor C[2,8] as its output.
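The Region application of example 1 can be sketched in code as follows. This is a minimal illustration, not the patent's implementation; the `Region` class and the function names are hypothetical, and positions are computed per formula (1) as x × stride[2] + y × stride[1] + z × stride[0] + offset:

```python
from dataclasses import dataclass

@dataclass
class Region:
    src_offset: int
    src_stride: tuple   # (stride[0], stride[1], stride[2]) for (z, y, x)
    dst_offset: int
    dst_stride: tuple
    size: tuple         # (size[0], size[1], size[2]) = (z, y, x) ranges

def position(x, y, z, offset, stride):
    # formula (1): corresponding position in the target vector
    return x * stride[2] + y * stride[1] + z * stride[0] + offset

def apply_region(region, src, dst):
    # copy every element of the coordinate range from src to dst
    sz, sy, sx = region.size
    for z in range(sz):
        for y in range(sy):
            for x in range(sx):
                s = position(x, y, z, region.src_offset, region.src_stride)
                d = position(x, y, z, region.dst_offset, region.dst_stride)
                dst[d] = src[s]

# Region0 and Region1 from example 1 (Concat of A[2,4] and B[2,4] into C[2,8])
region0 = Region(0, (0, 4, 1), 0, (0, 8, 1), (1, 2, 4))
region1 = Region(0, (0, 4, 1), 4, (0, 8, 1), (1, 2, 4))

A = list(range(8))          # A[2,4] flattened row-major
B = list(range(10, 18))     # B[2,4] flattened row-major
C = [0] * 16                # C[2,8] flattened row-major
apply_region(region0, A, C)
apply_region(region1, B, C)
# C now equals the row-wise concatenation of A and B:
# [0,1,2,3,10,11,12,13, 4,5,6,7,14,15,16,17]
```

The two Regions together fill the output exactly as the Concat operator would, which is what allows the operator to be replaced by element copying driven purely by the mapping information.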
Optionally, when the computation graph includes a plurality of first non-optimization operators connected in series, after the first mapping information of the plurality of first non-optimization operators is calculated, the first mapping information of the plurality of first non-optimization operators may be merged to obtain merged first mapping information, where the merged first mapping information is used to characterize the coordinate mapping relationship between the input tensor of the first of the plurality of first non-optimization operators and the output tensor of the last of them. Taking the case where operator 2, operator 3, and operator 4 in fig. 3 are all first non-optimization operators as an example, operator 2, operator 3, and operator 4 are a plurality of first non-optimization operators connected in series; after the first mapping information of operator 2, operator 3, and operator 4 is merged, the resulting merged first mapping information characterizes the coordinate mapping relationship between the input tensor of operator 2 and the output tensor of operator 4.
Based on this, step 222 may specifically include: merging the first mapping information of the plurality of first non-optimization operators to obtain merged first mapping information; and converting the plurality of first non-optimization operators in the calculation graph according to the merged first mapping information to obtain a converted optimization graph. The first mapping information of the plurality of first non-optimization operators connected in series is merged to obtain the merged first mapping information, and the plurality of first non-optimization operators in the calculation graph are converted according to the merged first mapping information, so that the optimization graph can be simplified, and the optimization capability can be further improved.
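Merging serial mapping information amounts to composing the per-element position maps, so that positions of the first operator's input map directly to positions of the last operator's output. A hedged sketch, assuming for simplicity that each Region is first enumerated into an explicit position dictionary (a real implementation would merge the offset/stride/size descriptions symbolically); the function names are illustrative:

```python
def region_to_map(src_offset, src_stride, dst_offset, dst_stride, size):
    """Enumerate a Region as {source position: destination position},
    positions computed per formula (1)."""
    mapping = {}
    sz, sy, sx = size
    for z in range(sz):
        for y in range(sy):
            for x in range(sx):
                s = x*src_stride[2] + y*src_stride[1] + z*src_stride[0] + src_offset
                d = x*dst_stride[2] + y*dst_stride[1] + z*dst_stride[0] + dst_offset
                mapping[s] = d
    return mapping

def merge(m1, m2):
    """Compose two serial mappings: position in the first operator's
    input tensor -> position in the last operator's output tensor."""
    return {s: m2[d] for s, d in m1.items() if d in m2}

# transpose of a [2,4] tensor into [4,2], then a transpose back into [2,4]
m1 = region_to_map(0, (0, 4, 1), 0, (0, 1, 2), (1, 2, 4))
m2 = region_to_map(0, (0, 2, 1), 0, (0, 1, 4), (1, 4, 2))
merged = merge(m1, m2)
# merged is the identity mapping {0: 0, 1: 1, ..., 7: 7}: the two
# transposes cancel, so a single copy per the merged mapping replaces
# both operators
```

This mirrors the benefit stated above: one preset operator driven by the merged mapping information replaces the whole serial chain.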
In another embodiment, when the computation graph includes the second non-optimization operator, the converting the non-optimization operator in the computation graph to obtain the converted optimization graph may specifically include the following steps 223 to 225.
Step 223, disassembling the second non-optimized operator in the calculation graph into a combination of the first non-optimized operator and the basic operator; wherein the second non-optimization operator is a non-optimization operator for changing element coordinates and element values, the first non-optimization operator is a non-optimization operator for changing element coordinates while element values remain unchanged, and the base operator includes a core operator for changing element values while element coordinates remain unchanged;
step 224, calculating first mapping information of the first non-optimized operator, where the first mapping information is used to characterize a coordinate mapping relationship between an input tensor and an output tensor of the first non-optimized operator;
step 225, according to the first mapping information, converting the first non-optimization operator in the calculation graph to obtain a converted optimization graph.
In the embodiment of the present application, the optimization code is written on the basis of the basic operators; that is, the basic operators are the basis for writing optimization code. The basic operators include core operators, considering that any operator that changes both element coordinates and element values can be decomposed into a first non-optimization operator and a core operator that changes element values while element coordinates remain unchanged. A core operator may be, for example, a data type conversion (Cast) operator or a summation (Add) operator.
Optionally, the basic operators may further include common operators that change both element values and element coordinates. Common operators may be chosen flexibly according to requirements; a common operator may be, for example, a matrix multiplication (MatMul) operator, a pooling (Pool) operator, or a resize (Resize) operator. Because the basic operators also include common operators, common operators need not be decomposed, which helps reduce the number of operators that have to be decomposed.
Taking as an example that the second non-optimization operator is a convolution (Conv) operator and the basic operators include the common operator MatMul, the convolution operator may be decomposed, based on the Im2Col algorithm, into a combination of an IndexMapping operator and a MatMul operator, where the IndexMapping operator can be understood as a first non-optimization operator.
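The Im2Col decomposition can be sketched as follows for a toy single-channel case (the function names and the 2×2 kernel are illustrative assumptions, not the patent's code): the `im2col` step only moves elements to new coordinates (the IndexMapping / first non-optimization part), and the matrix product only changes values (the MatMul / basic-operator part):

```python
def im2col(x, kh, kw):
    """IndexMapping part: copy each kh-by-kw input patch into a row.
    Element values are unchanged; only their coordinates change."""
    h, w = len(x), len(x[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([x[i + di][j + dj] for di in range(kh) for dj in range(kw)])
    return rows

def matmul_vec(rows, k):
    """MatMul part: dot each patch row with the flattened kernel.
    Element coordinates are unchanged; only values change."""
    return [sum(a * b for a, b in zip(r, k)) for r in rows]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
kernel = [1, 0, 0, 1]   # 2x2 kernel [[1,0],[0,1]] flattened
out = matmul_vec(im2col(x, 2, 2), kernel)
# out corresponds to the 2x2 convolution outputs [1+5, 2+6, 4+8, 5+9]
```

The convolution result is thus obtained by a pure coordinate-remapping stage followed by a pure value-computing stage, exactly the split the decomposition relies on.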
It should be noted that the specific manner of calculating the first mapping information in step 224 is similar to that in step 221 and is not described again here. Steps 223 and 224 decompose the second non-optimization operator in the computation graph and represent its coordinate mapping part by mapping information. It can be understood that after the second non-optimization operator in the computation graph is decomposed into a combination of the first non-optimization operator and the basic operator, the second non-optimization operator in the computation graph is replaced with that combination.
In an embodiment, corresponding optimization code may be written for the basic operators; in this case, the optimization operators in the embodiment of the present application may include the basic operators. Based on this, converting the first non-optimization operator in the computation graph according to the first mapping information may specifically include: converting the first non-optimization operator in the computation graph into a preset operator based on the first mapping information, where the preset operator is used to copy the elements in the input tensor of the first non-optimization operator to the output tensor according to the first mapping information. In this case, the optimization operators further include the preset operator.
By converting the first non-optimization operator in the computation graph into a preset operator based on the first mapping information, the preset operator can copy the elements in the input tensor of the first non-optimization operator to the output tensor based on the first mapping information.
For example, on the basis of the foregoing example 1, the Concat operator may be converted into a preset operator, so that the preset operator can, according to Region0 and formula (1): for the element at coordinates (x=0, y=0) in input tensor A[2,4], compute the corresponding position 0 when input tensor A[2,4] is regarded as the target vector and the corresponding position 0 when output tensor C[2,8] is regarded as the target vector, and copy the element at position 0 of input tensor A[2,4] regarded as the target vector to position 0 of output tensor C[2,8] regarded as the target vector; and so on; and, according to Region1 and formula (1): for the element at coordinates (x=0, y=0) in input tensor B[2,4], compute the corresponding position 0 when input tensor B[2,4] is regarded as the target vector and the corresponding position 4 when output tensor C[2,8] is regarded as the target vector, and copy the element at position 0 of input tensor B[2,4] regarded as the target vector to position 4 of output tensor C[2,8] regarded as the target vector; and so on.
Assuming that operator 1 in the computation graph shown in fig. 3 is a data type conversion (Cast) operator, operators 2 and 3 are Transpose operators, operator 4 is a segment extraction (Slice) operator, operator 5 is a pooling (Pool) operator, and tensor a is specifically input tensor A1, the computation graph shown in fig. 4A can be obtained. In one embodiment, the processing of the computation graph of fig. 4A may be as shown in fig. 4B. In fig. 4A and 4B, the two transpose operators and the segment extraction operator are first non-optimization operators, the data type conversion operator is a core operator, the pooling operator is a common operator, and A1, B1, C1, D1, E1, and F1 are tensors.
Referring to fig. 4B, in the first stage, the first mapping information of the transpose operator with input tensor B1 and output tensor C1 (i.e., first mapping information B1 → C1) is calculated, the first mapping information of the transpose operator with input tensor C1 and output tensor D1 (i.e., first mapping information C1 → D1) is calculated, and the first mapping information of the segment extraction operator with input tensor D1 and output tensor E1 (i.e., first mapping information D1 → E1) is calculated. It should be noted that, since the data type conversion operator is a core operator whose mapping information need not be calculated, the data type conversion operator in the computation graph may be left unprocessed in the first stage; and since the pooling operator is a common operator, the pooling operator may likewise be left unprocessed in the first stage.
Then, in the second stage, the calculated first mapping information B1 → C1, the first mapping information C1 → D1 and the first mapping information D1 → E1 may be merged to obtain the merged first mapping information (i.e., the first mapping information B1 → E1).
Finally, in the third stage, based on the merged first mapping information, the two transpose operators and the segment extraction operator in the computation graph are converted into a preset operator, with B1 as the input tensor of the preset operator and E1 as the output tensor of the preset operator.
Thus, the computation graph shown in fig. 4A can obtain the optimization graph shown in fig. 4C after going through the processing procedure shown in fig. 4B. The data type conversion operator, the preset operator and the pooling operator in fig. 4C are optimization operators.
It should be noted that the processing shown in fig. 4B is only an example. It can be understood that, in the case that the computation graph includes the second non-optimal operator, the second non-optimal operator may also be decomposed into a combination of the first non-optimal operator and the base operator in the first stage, and the first mapping information of the decomposed first non-optimal operator may be calculated. In the case where the calculation map does not include the plurality of first non-optimization operators connected in series, the second stage of processing may not be performed.
In another embodiment, the corresponding optimization code may be written for rewrite operators of the basic operators; in this case, the optimization operators in this embodiment may include the rewrite operators of the basic operators. The rewrite operator of a basic operator fetches data from the input tensor and stores data into the output tensor according to the mapping information. In this way, the coordinate mapping required by the mapping information and the original operation of the basic operator are fused into one step, the element copying process is omitted, the memory consumption caused by multiple element copying passes is saved, the amount of memory movement is reduced, and performance is improved.
Take a basic operator, the exponential (exp) operator, as an example:

input tensor: A(w, h); output tensor: B(w, h)

for y in range(0, h):
    for x in range(0, w):
        B[y*w + x] = exp(A[y*w + x])
The rewrite operator for the exponential operator may be as follows:
input tensor: A (arbitrary size); output tensor: B (arbitrary size)
(The pseudo code of the rewrite operator is given as an image in the original document; it iterates over the coordinate range size[3] and executes B[bf] = exp(A[af]) for each coordinate.)
Here, size[3] is the range sub-information in the mapping information, used to represent the coordinate range of the input tensor A; View_A is the first position sub-information in the mapping information, used to represent the corresponding position of an element at any coordinate in the coordinate range when the input tensor A is regarded as the target vector; View_B is the second position sub-information in the mapping information, used to represent the corresponding position of an element at any coordinate in the coordinate range when the output tensor B is regarded as the target vector; af is the corresponding position, calculated based on View_A and formula (1), of the element at coordinate (x, y) of input tensor A when input tensor A is regarded as the target vector; and bf is the corresponding position, calculated based on View_B and formula (1), of that element when output tensor B is regarded as the target vector.
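The rewrite operator of the exponential operator described above can be sketched as follows. This is a hedged reconstruction, not the patent's code: `view_a`/`view_b` are illustrative (offset, stride[3]) pairs standing in for View_A/View_B, and af/bf are computed per formula (1):

```python
import math

def exp_rewrite(A, B, size, view_a, view_b):
    """Rewrite operator of exp: reads from the input tensor and writes to
    the output tensor directly through the mapping information, fusing the
    coordinate mapping and the element-wise exp into one pass (no
    intermediate element copy)."""
    sz, sy, sx = size
    for z in range(sz):
        for y in range(sy):
            for x in range(sx):
                af = x*view_a[1][2] + y*view_a[1][1] + z*view_a[1][0] + view_a[0]
                bf = x*view_b[1][2] + y*view_b[1][1] + z*view_b[1][0] + view_b[0]
                B[bf] = math.exp(A[af])

A = [0.0, 1.0, 0.0, 1.0]   # a [2,2] tensor, flattened
B = [0.0] * 4
# identity mapping over the [2,2] coordinate range (z fixed, y < 2, x < 2)
exp_rewrite(A, B, (1, 2, 2), (0, (0, 2, 1)), (0, (0, 2, 1)))
```

With a non-identity View_B the same loop would also realize a transpose or slice "for free" while computing exp, which is the fusion benefit described above.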
Based on this, converting the first non-optimization operator in the computation graph according to the first mapping information may specifically include: determining a first basic operator in the computation graph, the first basic operator being a basic operator whose input tensor in the computation graph is the output tensor of the first non-optimization operator; and converting the first non-optimization operator and the first basic operator in the computation graph together into a rewrite operator of the first basic operator based on the first mapping information, where the rewrite operator fetches data from the input tensor and stores data into the output tensor according to the first mapping information. Thus, the combination of the first non-optimization operator and the first basic operator can be converted into a rewrite operator for which pre-written optimization code exists.
Thereby, an optimization graph consisting of second basic operators and rewrite operators of basic operators can be obtained, where a second basic operator is a basic operator in the computation graph other than the first basic operator. Optionally, to reduce as much as possible the number of operators for which optimization code must be written, the second basic operator may also be converted into a rewrite operator of the second basic operator, so that optimization code need be written only for the rewrite operators of the basic operators.
Based on this, converting the non-optimization operator in the computation graph to obtain the converted optimization graph may further include: converting a second basic operator in the computation graph into a rewrite operator of the second basic operator based on second mapping information, where the rewrite operator fetches data from the input tensor and stores data into the output tensor according to the second mapping information; the second basic operator is a basic operator in the computation graph other than the first basic operator, and the second mapping information is used to represent the coordinate mapping relationship of the input tensor of the second basic operator onto itself (an identity mapping). Assuming the input tensor of the second basic operator is A[2,4], the second mapping information may be, for example, Region2, where src.offset = 0, src.stride[3] = {0,4,1}, dst.offset = 0, dst.stride[3] = {0,4,1}, and size[3] = {1,2,4}.
Optionally, in order to reduce redundant computation and improve performance, when computing the mapping information of an operator, the element usage of the operator's successors may be considered. Taking a computation graph that includes a summation operator and a segment extraction operator as an example, assume that the input tensors of the summation operator are tensor A3 and tensor B3, the output tensor is tensor C3, the sizes of the three tensors are 512 × 512, the input tensor of the segment extraction operator is tensor C3, the output tensor is tensor D3 with size 256 × 512, and the segment extraction operator is used to extract the elements whose x coordinate is 256 to 511 from the input tensor. Since the segment extraction operator does not need the elements with x coordinates 0 to 255 in input tensor C3, the mapping information for the summation operator may consider only the elements with x coordinates 256 to 511 and ignore the elements with x coordinates 0 to 255.
Assuming that operator 1 in the computation graph shown in fig. 3 is a data type conversion (Cast) operator, operators 2 and 3 are Transpose operators, operator 4 is a segment extraction (Slice) operator, operator 5 is a pooling (Pool) operator, and tensor a is specifically input tensor A2, the computation graph shown in fig. 5A can be obtained; in another embodiment, the processing procedure of the computation graph shown in fig. 5A can be as shown in fig. 5B. In fig. 5A and 5B, the two transpose operators and the segment extraction operator are first non-optimization operators, the data type conversion operator is a core operator, the pooling operator is a common operator, and A2, B2, C2, D2, E2, and F2 are tensors.
Referring to fig. 5B, in the first stage, the first mapping information of the transpose operator with input tensor B2 and output tensor C2 (i.e., first mapping information B2 → C2) is calculated, the first mapping information of the transpose operator with input tensor C2 and output tensor D2 (i.e., first mapping information C2 → D2) is calculated, and the first mapping information of the segment extraction operator with input tensor D2 and output tensor E2 (i.e., first mapping information D2 → E2) is calculated. It should be noted that, since the data type conversion operator is a core operator whose mapping information need not be calculated, the data type conversion operator in the computation graph may be left unprocessed in the first stage; and since the pooling operator is a common operator, the pooling operator may likewise be left unprocessed in the first stage.
Then, in the second stage, the calculated first mapping information B2 → C2, the first mapping information C2 → D2 and the first mapping information D2 → E2 may be merged to obtain the merged first mapping information (i.e., the first mapping information B2 → E2).
Finally, in the third stage, the data type conversion operator may be converted into a rewrite operator of the data type conversion operator (denoted the data type conversion' operator) according to the second mapping information A2 → A2, and the pooling operator may be converted into a rewrite operator of the pooling operator (denoted the pooling' operator) according to the merged first mapping information B2 → E2. It can be understood that the data type conversion operator in fig. 5B is the aforementioned second basic operator, and the pooling operator is the aforementioned first basic operator.
Thus, the computation graph shown in fig. 5A can obtain the optimization graph shown in fig. 5C after going through the processing procedure shown in fig. 5B. The data type conversion 'operator and pooling' operator in fig. 5C are optimization operators.
It should be noted that the processing shown in fig. 5B is only an example. It can be understood that, in the case that the computation graph includes the second non-optimal operator, the second non-optimal operator may also be decomposed into a combination of the first non-optimal operator and the base operator in the first stage, and the first mapping information of the decomposed first non-optimal operator may be calculated. In the case where the calculation map does not include the plurality of first non-optimization operators connected in series, the second stage of processing may not be performed.
According to the processing method of the computation graph described above, a computation graph containing non-optimization operators is obtained, and the non-optimization operators in the computation graph are converted to obtain a converted optimization graph in which every operator is an optimization operator. This allows optimization operators that have optimization code to take over the functions of non-optimization operators that lack optimization code, broadening the range of computation-graph operators that can be optimized on the basis of pre-written optimization code, and thereby improving the optimization capability for the computation graph.
Fig. 6 is a flowchart illustrating a processing method of a computation graph according to another embodiment of the present application, where an execution subject of the present embodiment may be the computing device 12 in fig. 1. As shown in fig. 6, the method of this embodiment may include:
step 61, obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
step 62, converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator;
and 63, sequentially calling the optimization codes of the operators in the optimization graph corresponding to the current hardware platform.
It should be noted that step 62 is similar to step 22, and is not described herein again.
After the transformed optimization graph is obtained, the optimization graph may be executed, and specifically, the optimization codes of the current hardware platform corresponding to the operators in the optimization graph may be sequentially called.
For example, for the optimization graph shown in fig. 4C: first, taking tensor A1 as input, the optimization code of the data type conversion operator corresponding to the current hardware platform is called to obtain tensor B1; then, taking tensor B1 and the first mapping information B1 → E1 as input, the optimization code of the preset operator corresponding to the current hardware platform is called to obtain tensor E1; finally, taking tensor E1 as input, the optimization code of the pooling operator corresponding to the current hardware platform is called to obtain tensor F1.
As another example, for the optimization graph shown in fig. 5C: first, taking tensor A2 as input, the optimization code of the data type conversion' operator corresponding to the current hardware platform is called to obtain tensor B2; then, taking tensor B2 and the first mapping information B2 → E2 as input, the optimization code of the pooling' operator corresponding to the current hardware platform is called to obtain tensor F2.
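Step 63 can be sketched as a per-platform kernel registry plus a dispatch loop. The registry layout, names, and the toy stand-in kernels below are assumptions for illustration, not the patent's API:

```python
# Pre-written optimization codes, keyed by (operator name, hardware platform).
# Each kernel takes the input tensor and optional mapping information.
OPTIMIZED_KERNELS = {
    ("cast", "cpu"): lambda t, info=None: [float(v) for v in t],  # toy Cast
    ("preset", "cpu"): lambda t, info: [t[i] for i in info],      # copy by mapping
    ("pool", "cpu"): lambda t, info=None: [max(t)],               # toy global pool
}

def run_graph(ops, tensor, platform="cpu"):
    """Sequentially call the optimization code of each operator in the
    optimization graph for the current hardware platform."""
    for name, info in ops:
        tensor = OPTIMIZED_KERNELS[(name, platform)](tensor, info)
    return tensor

# optimization graph in the spirit of fig. 4C, with toy stand-in kernels:
# Cast, then the preset operator driven by mapping info, then pooling
ops = [("cast", None), ("preset", [1, 0]), ("pool", None)]
result = run_graph(ops, [3, 7])
```

Supporting a new hardware platform then only requires registering kernels under a new platform key; the dispatch loop is unchanged.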
In an embodiment, when a computation graph needs to be loaded, the non-optimization operator in the computation graph may be converted to obtain a converted optimization graph, and the optimization graph may be loaded after it is obtained. In other embodiments, the computing device may also be triggered by other conditions to convert the non-optimization operator in the computation graph.
According to the processing method of the computation graph described above, a computation graph containing non-optimization operators is obtained, and the non-optimization operators in the computation graph are converted to obtain an optimization graph whose operators are optimization operators. This allows optimization operators that have optimization code to take over the functions of non-optimization operators that lack optimization code, and the computation graph is executed by calling the optimization code of the optimization operators, which improves the execution efficiency of the computation graph.
Fig. 7 is a schematic structural diagram of a processing device for calculating a graph according to an embodiment of the present application; referring to fig. 7, the present embodiment provides a processing apparatus for a computation graph, which may perform the processing method for a computation graph according to the embodiment shown in fig. 2, and specifically, the processing apparatus for a computation graph may include:
an obtaining module 71, configured to obtain a computation graph, where the computation graph includes a non-optimization operator;
a conversion module 72, configured to convert the non-optimization operator in the computation graph to obtain a converted optimization graph, where the operators in the optimization graph are optimization operators.
Optionally, the conversion module 72 is specifically configured to:
calculating first mapping information of a first non-optimization operator in the computation graph if the first non-optimization operator is included in the computation graph; wherein the first non-optimization operator is a non-optimization operator for changing element coordinates while element values remain unchanged, and the first mapping information is used to characterize the coordinate mapping relationship between the input tensor and the output tensor of the first non-optimization operator;
and converting the first non-optimization operator in the calculation graph according to the first mapping information to obtain a converted optimization graph.
Optionally, the computation graph includes a plurality of the first non-optimization operators connected in series;
the conversion module 72 is configured to convert the first non-optimization operator in the computation graph according to the first mapping information to obtain a converted optimization graph, and specifically includes:
merging the first mapping information of the plurality of first non-optimization operators to obtain merged first mapping information, where the merged first mapping information is used to characterize the coordinate mapping relationship between the input tensor of the first of the plurality of first non-optimization operators and the output tensor of the last of them; and converting the plurality of first non-optimization operators in the computation graph according to the merged first mapping information to obtain a converted optimization graph.
Optionally, the conversion module 72 is specifically configured to:
in the case that a second non-optimization operator is included in the calculation graph, decomposing the second non-optimization operator in the calculation graph into a combination of the first non-optimization operator and a base operator; wherein the second non-optimization operator is a non-optimization operator that changes both element coordinates and element values, the first non-optimization operator is a non-optimization operator that changes element coordinates while element values remain unchanged, and the base operator includes a core operator that changes element values while element coordinates remain unchanged; calculating first mapping information of the first non-optimization operator, wherein the first mapping information is used to characterize a coordinate mapping relationship between the input tensor and the output tensor of the first non-optimization operator; and converting the first non-optimization operator in the calculation graph according to the first mapping information to obtain a converted optimization graph.
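For intuition only (the concrete choice of convolution and the helper names are ours, not taken from the application): a small 2-D convolution changes both element coordinates and element values, and can be decomposed into im2col, a pure coordinate rearrangement, followed by a matrix product, a core operator that only changes values:

```python
import numpy as np

def im2col(x, k):
    """Coordinate-only rearrangement: each k x k window becomes one row."""
    h, w = x.shape
    return np.stack([x[i:i + k, j:j + k].ravel()
                     for i in range(h - k + 1)
                     for j in range(w - k + 1)])

def conv2d(x, kernel):
    """Value-changing core step: a single matrix-vector product."""
    k = kernel.shape[0]
    out = im2col(x, k) @ kernel.ravel()
    return out.reshape(x.shape[0] - k + 1, x.shape[1] - k + 1)
```

After such a decomposition, only the coordinate-rearranging part needs the mapping-information treatment described above; the value-changing part stays a base operator.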
Optionally, the first mapping information includes range sub-information, first position sub-information, and second position sub-information;
wherein the range sub-information is used to characterize the coordinate range of the input tensor of the first non-optimization operator; the first position sub-information is used to characterize the position of the element at any coordinate in the coordinate range when the input tensor of the first non-optimization operator is regarded as a target vector; and the second position sub-information is used to characterize the position of the element at any coordinate in the coordinate range when the output tensor of the first non-optimization operator is regarded as the target vector.
Optionally, when the memory is accessed in units of rows, the target vector is a row vector; when the memory is accessed in units of columns, the target vector is a column vector.
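A minimal sketch of the three kinds of sub-information, assuming row-major memory so that the target vector is a row vector; representing the mapping as explicit (input position, output position) pairs is our simplification, not the application's encoding:

```python
import numpy as np

def offset(coord, shape):
    """Position of `coord` when the tensor is viewed as a flat row vector
    (row-major memory order)."""
    off = 0
    for c, s in zip(coord, shape):
        off = off * s + c
    return off

in_shape, out_shape = (2, 3), (3, 2)     # a 2x3 transpose
# Range sub-information: every coordinate (i, j) of the input tensor.
# First/second position sub-information: the element's position in the
# flattened input/output tensor, respectively.
pairs = [(offset((i, j), in_shape), offset((j, i), out_shape))
         for i in range(2) for j in range(3)]
```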
Optionally, the conversion module 72 is configured to convert the first non-optimization operator in the computation graph according to the first mapping information, and specifically includes:
based on the first mapping information, converting the first non-optimization operator in the computation graph into a preset operator, wherein the preset operator is used to copy elements in the input tensor of the first non-optimization operator to the output tensor according to the first mapping information; wherein the optimization operator comprises the preset operator and a base operator, and the base operator comprises a core operator that changes element values while element coordinates remain unchanged.
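A hedged sketch of such a preset operator (the name `raster_copy` is ours): it simply replays the mapping information as element copies from the flattened input to the flattened output:

```python
import numpy as np

def raster_copy(src, pairs, out_len):
    """Copy each flattened-input element to the flattened-output position
    given by the (input position, output position) mapping pairs."""
    dst = np.empty(out_len, dtype=src.dtype)
    for s, d in pairs:
        dst[d] = src[s]
    return dst

x = np.arange(6).reshape(2, 3)
pairs = [(i * 3 + j, j * 2 + i) for i in range(2) for j in range(3)]  # transpose
y = raster_copy(x.ravel(), pairs, 6).reshape(3, 2)
```

Because any coordinate-only operator reduces to the same copy loop, one optimized implementation of the preset operator covers all of them on a given platform.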
Optionally, the base operator further includes: common operators for changing element values and element coordinates.
Optionally, the conversion module 72 is configured to convert the first non-optimization operator in the computation graph according to the first mapping information, and specifically includes:
determining a first base operator in the computation graph, the first base operator being a base operator whose input tensor in the computation graph is the output tensor of the first non-optimization operator; converting the first non-optimization operator and the first base operator in the computation graph into a rewrite operator of the first base operator based on the first mapping information, wherein the rewrite operator can take data from the input tensor and store the data in the output tensor according to the first mapping information; wherein the optimization operator is a rewrite operator of a base operator, the base operator including a core operator that changes element values while element coordinates remain unchanged.
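A hedged sketch of such a rewrite operator (we pick ReLU as the core operator; the fusion shown is our illustration): the core computation fetches each input element through the mapping information, so the preceding coordinate-only operator needs no separate copy pass or intermediate tensor:

```python
import numpy as np

def rewritten_relu(src, pairs, out_len):
    """ReLU rewritten to read its input through the mapping information."""
    dst = np.empty(out_len, dtype=src.dtype)
    for s, d in pairs:
        dst[d] = max(src[s], 0)          # gather + value change in one pass
    return dst

x = np.array([-1, 2, -3, 4, -5, 6]).reshape(2, 3)
pairs = [(i * 3 + j, j * 2 + i) for i in range(2) for j in range(3)]  # transpose
y = rewritten_relu(x.ravel(), pairs, 6).reshape(3, 2)
```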
Optionally, the base operator further includes: common operators for changing element values and element coordinates.
Optionally, the conversion module 72 is further configured to:
converting a second base operator in the calculation graph into a rewrite operator of the second base operator based on second mapping information, wherein the rewrite operator can take data from the input tensor and store the data in the output tensor according to the second mapping information; wherein the second base operator is a base operator in the computation graph other than the first base operator, and the second mapping information is used to characterize a coordinate mapping relationship between the input tensor and the output tensor of the second base operator.
The apparatus shown in fig. 7 can perform the method of the embodiment shown in fig. 2, and reference may be made to the related description of the embodiment shown in fig. 2 for a part of this embodiment that is not described in detail. The implementation process and technical effect of the technical solution refer to the description in the embodiment shown in fig. 2, and are not described herein again.
In one possible implementation, the structure of the processing means of the computation graph shown in fig. 7 may be implemented as a computing device. As shown in fig. 8, the computing device may include: a processor 81 and a memory 82. Wherein the memory 82 is used for storing a program for supporting a computing device to execute the processing method of the computation graph provided in the embodiment shown in fig. 2, and the processor 81 is configured to execute the program stored in the memory 82.
The program comprises one or more computer instructions which, when executed by the processor 81, are capable of performing the steps of:
obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
and converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator.
Optionally, the processor 81 is further configured to perform all or part of the steps in the foregoing embodiment shown in fig. 2.
The computing device may also include a communication interface 83 for the computing device to communicate with other devices or a communication network.
Fig. 9 is a schematic structural diagram of a processing apparatus for a computation graph according to another embodiment of the present application; referring to fig. 9, the present embodiment provides a processing apparatus for a computation graph, which may perform the processing method for a computation graph according to the embodiment shown in fig. 6. Specifically, the processing apparatus for a computation graph may include:
an obtaining module 91, configured to obtain a computation graph, where the computation graph includes a non-optimization operator;
a conversion module 92, configured to convert the non-optimization operator in the computation graph to obtain a converted optimization graph, where an operator in the optimization graph is an optimization operator;
and the calling module 93 is configured to sequentially call the optimization code, corresponding to the current hardware platform, of each operator in the optimization graph.
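A minimal sketch of how such a calling module might dispatch (the kernel registry and operator names are our assumptions, not the application's): a table keyed by (platform, operator) holds the optimized code, and the module walks the optimization graph in execution order:

```python
# Hypothetical registry of per-platform optimized kernels.
KERNELS = {
    ("cpu", "relu"):   lambda xs: [max(v, 0) for v in xs],
    ("cpu", "scale2"): lambda xs: [2 * v for v in xs],
}

def run_graph(ops, data, platform="cpu"):
    """Invoke each operator's kernel for the current platform, in order."""
    for op in ops:
        data = KERNELS[(platform, op)](data)
    return data
```

Since every operator in the optimization graph is an optimization operator, only this small kernel set needs a tuned implementation per hardware platform.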
The apparatus shown in fig. 9 can execute the method of the embodiment shown in fig. 6, and reference may be made to the related description of the embodiment shown in fig. 6 for a part of this embodiment that is not described in detail. The implementation process and technical effect of the technical solution refer to the description in the embodiment shown in fig. 6, and are not described herein again.
In one possible implementation, the structure of the processing apparatus of the computation graph shown in fig. 9 may be implemented as a computing device. As shown in fig. 10, the computing device may include: a processor 101 and a memory 102. Wherein the memory 102 is used for storing a program for supporting a computing device to execute the processing method of the computation graph provided in the embodiment shown in fig. 6, and the processor 101 is configured to execute the program stored in the memory 102.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the processor 101, are capable of performing the steps of:
obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator;
and sequentially calling the optimization code, corresponding to the current hardware platform, of each operator in the optimization graph.
Optionally, the processor 101 is further configured to perform all or part of the steps in the foregoing embodiment shown in fig. 6.
The computing device may further include a communication interface 103 for the computing device to communicate with other devices or a communication network.
In addition, the present application provides a computer storage medium for storing computer software instructions for a computing device, which includes a program for executing the method embodiment shown in fig. 2.
Embodiments of the present application provide a computer storage medium for storing computer software instructions for a computing device, which includes a program for executing the method embodiments shown in fig. 6.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented with the addition of a necessary general hardware platform, and of course can also be implemented by a combination of hardware and software. Based on this understanding, the above-described aspects may be embodied in the form of a computer program product stored on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (16)

1. A method of processing a computation graph, comprising:
obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
and converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator.
2. The method of claim 1, wherein converting the non-optimization operator in the computation graph to obtain a converted optimization graph comprises:
calculating first mapping information of a first non-optimization operator in the computation graph if the first non-optimization operator is included in the computation graph; wherein the first non-optimization operator is a non-optimization operator that changes element coordinates while element values remain unchanged, and the first mapping information is used to characterize a coordinate mapping relationship between the input tensor and the output tensor of the first non-optimization operator;
and converting the first non-optimization operator in the calculation graph according to the first mapping information to obtain a converted optimization graph.
3. The method of claim 2, wherein the computation graph comprises a plurality of the first non-optimization operators connected in series;
the converting the first non-optimization operator in the computation graph according to the first mapping information to obtain a converted optimization graph includes:
merging the first mapping information of the plurality of first non-optimization operators to obtain merged first mapping information, wherein the merged first mapping information is used to characterize a coordinate mapping relationship between the input tensor of the first of the plurality of first non-optimization operators and the output tensor of the last of them;
and converting the plurality of first non-optimization operators in the calculation graph according to the merged first mapping information to obtain a converted optimization graph.
4. The method of claim 1, wherein converting the non-optimization operator in the computation graph to obtain a converted optimization graph comprises:
in the case that a second non-optimization operator is included in the calculation graph, decomposing the second non-optimization operator in the calculation graph into a combination of the first non-optimization operator and a base operator; wherein the second non-optimization operator is a non-optimization operator that changes both element coordinates and element values, the first non-optimization operator is a non-optimization operator that changes element coordinates while element values remain unchanged, and the base operator includes a core operator that changes element values while element coordinates remain unchanged;
calculating first mapping information of the first non-optimization operator, wherein the first mapping information is used to characterize a coordinate mapping relationship between the input tensor and the output tensor of the first non-optimization operator;
and converting the first non-optimization operator in the calculation graph according to the first mapping information to obtain a converted optimization graph.
5. The method according to any of claims 2-4, wherein the first mapping information comprises range sub-information, first position sub-information, and second position sub-information;
wherein the range sub-information is used to characterize the coordinate range of the input tensor of the first non-optimization operator; the first position sub-information is used to characterize the position of the element at any coordinate in the coordinate range when the input tensor of the first non-optimization operator is regarded as a target vector; and the second position sub-information is used to characterize the position of the element at any coordinate in the coordinate range when the output tensor of the first non-optimization operator is regarded as the target vector.
6. The method of claim 5, wherein when the memory is accessed in units of rows, the target vector is a row vector; and when the memory is accessed in units of columns, the target vector is a column vector.
7. The method according to claim 2 or 4, wherein the converting the first non-optimization operator in the computation graph according to the first mapping information comprises:
based on the first mapping information, converting the first non-optimization operator in the computation graph into a preset operator, wherein the preset operator is used to copy elements in the input tensor of the first non-optimization operator to the output tensor according to the first mapping information;
wherein the optimization operator comprises the preset operator and a base operator, and the base operator comprises a core operator that changes element values while element coordinates remain unchanged.
8. The method of claim 7, wherein the base operator further comprises: common operators for changing element values and element coordinates.
9. The method according to claim 2 or 4, wherein the converting the first non-optimization operator in the computation graph according to the first mapping information comprises:
determining a first base operator in the computation graph, the first base operator being a base operator whose input tensor in the computation graph is the output tensor of the first non-optimization operator;
converting the first non-optimization operator and the first base operator in the computation graph into a rewrite operator of the first base operator based on the first mapping information, wherein the rewrite operator can take data from the input tensor and store the data in the output tensor according to the first mapping information;
wherein the optimization operator is a rewrite operator of a base operator, the base operator including a core operator for changing an element value while an element coordinate remains unchanged.
10. The method of claim 9, wherein the base operator further comprises: common operators for changing element values and element coordinates.
11. The method of claim 9, wherein converting the non-optimization operator in the computation graph to obtain a converted optimization graph further comprises:
converting a second base operator in the calculation graph into a rewrite operator of the second base operator based on second mapping information, wherein the rewrite operator can acquire data from the input tensor and store the data in the output tensor according to the second mapping information; wherein the second base operator is a base operator in the computation graph other than the first base operator, and the second mapping information is used to characterize a coordinate mapping relationship between the input tensor and the output tensor of the second base operator.
12. A method of processing a computation graph, comprising:
obtaining a calculation graph, wherein the calculation graph comprises a non-optimization operator;
converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator;
and sequentially calling the optimization code, corresponding to the current hardware platform, of each operator in the optimization graph.
13. A processing apparatus for a computation graph, comprising:
the acquisition module is used for acquiring a calculation graph, and the calculation graph comprises a non-optimization operator;
and the conversion module is used for converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator.
14. A processing apparatus for a computation graph, comprising:
the acquisition module is used for acquiring a calculation graph, and the calculation graph comprises a non-optimization operator;
the conversion module is used for converting the non-optimization operator in the calculation graph to obtain a converted optimization graph, wherein the operator in the optimization graph is an optimization operator;
and the calling module is used for sequentially calling the optimization code, corresponding to the current hardware platform, of each operator in the optimization graph.
15. A computing device, comprising: a memory, a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the method of any of claims 1 to 11.
16. A computing device, comprising: a memory, a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the method of claim 12.
CN202011282533.5A 2020-11-16 2020-11-16 Processing method, device and equipment of calculation graph Pending CN113296780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282533.5A CN113296780A (en) 2020-11-16 2020-11-16 Processing method, device and equipment of calculation graph


Publications (1)

Publication Number Publication Date
CN113296780A true CN113296780A (en) 2021-08-24

Family

ID=77318439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282533.5A Pending CN113296780A (en) 2020-11-16 2020-11-16 Processing method, device and equipment of calculation graph

Country Status (1)

Country Link
CN (1) CN113296780A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003306A (en) * 2021-10-27 2022-02-01 上海商汤科技开发有限公司 Video memory optimization method, device, equipment and storage medium
CN114003306B (en) * 2021-10-27 2024-03-15 上海商汤科技开发有限公司 Video memory optimization method, device, equipment and storage medium
CN116258178A (en) * 2023-03-24 2023-06-13 美的集团(上海)有限公司 Model conversion method, device, electronic equipment and readable storage medium
CN116258178B (en) * 2023-03-24 2023-09-22 美的集团(上海)有限公司 Model conversion method, device, electronic equipment and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210824)