WO2021052391A1 - Method for constructing an intermediate representation, compiler, and server - Google Patents

Method for constructing an intermediate representation, compiler, and server

Info

Publication number
WO2021052391A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage location
tensor
data
information
migration
Prior art date
Application number
PCT/CN2020/115759
Other languages
English (en)
French (fr)
Inventor
耿臻
狄鹏
淡孝强
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201911271859.5A (published as CN112527305A)
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP20865946.6A (published as EP4024202A4)
Publication of WO2021052391A1
Priority to US17/697,305 (published as US11789709B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/44: Encoding
    • G06F8/443: Optimisation
    • G06F8/4441: Reducing the execution time required by the program code

Definitions

  • This application relates to the field of electronic technology, and in particular to a method for constructing intermediate representations, a compiler, and a server.
  • DSA: domain-specific architecture.
  • An embodiment of this application provides a method for constructing an intermediate representation, including:
  • obtaining a first IR, where the first IR includes a calculation statement;
  • the calculation statement includes a tensor and an operator, where the operation represented by the operator is executed on a calculation unit, the calculation unit is configured to perform operations on data migrated sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator;
  • generating a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, and the first data migration information indicates that a first migration path of the tensor runs from the first storage location to the second storage location;
  • multiple data transmission paths may exist between the first storage location and the second storage location: some data transmission paths may contain no other storage location, for example the directly connected data transmission path between the first storage location and the second storage location described below, while other data paths may contain further storage locations, for example the third storage location mentioned below;
  • the data segmentation information indicates the size of the tensor when the tensor is migrated to the second storage location.
  • the calculation unit is further configured to perform operations on data that passes through a second migration path, where the second migration path runs from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
  • the at least one third storage location indicates a storage location on the on-chip cache.
  • the method further includes:
  • generating, based on the calculation statement, first data flow information corresponding to the calculation statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location;
  • generating second data flow information, where the second data flow information includes the first data migration information.
  • The existence of a directly connected data transmission path between the first storage location and the second storage location can be understood as follows: a data transmission path exists between the first storage location and the second storage location that does not pass through any other storage location.
  • the tensor is used to migrate to the computing unit through a first storage location, at least one third storage location, and a second storage location in sequence.
  • the tensor is used to migrate to the computing unit through at least one fourth storage location, a first storage location, and a second storage location in sequence.
  • the tensor is used to sequentially migrate to the computing unit through at least one fourth storage location, a first storage location, at least one third storage location, and a second storage location.
  • the tensor is used to sequentially migrate to the computing unit through at least one fourth storage location, a first storage location, at least one third storage location, a second storage location, and at least one fifth storage location.
  • the first storage location indicates a storage location on the on-chip cache
  • the second storage location indicates a storage location on the on-chip cache
  • the first storage location indicates a memory
  • the second storage location indicates a storage location on the on-chip cache.
  • the tensor includes multiple dimensions, each dimension corresponds to an axis variable, and the axis variable is used to represent the size of the tensor in the corresponding dimension;
  • the data segmentation information includes multiple axis variables and a split axis variable corresponding to each axis variable, where the split axis variable is used to represent the tensor size of the corresponding dimension when the tensor is migrated.
  • The second IR further includes at least one target variable and a value range of each target variable, where the axis variable is linearly related to the at least one target variable;
  • the axis variable represents the tensor size of the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
  • The second IR is an IR with a tree structure, the tree structure includes a root node and child nodes, the root node corresponds to the calculation statement, and the child node corresponds to the second storage location, where the second IR includes the information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location includes the first data migration information and the data segmentation information.
  • the second IR further includes data movement information, where the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
  • the method further includes:
  • the first storage location and the second storage location are determined according to the type of the calculation unit.
  • this application provides a compiler, including:
  • the acquiring unit is configured to acquire a first intermediate representation (IR), where the first IR includes a calculation statement, the calculation statement includes a tensor and an operator, the operation represented by the operator is executed on a calculation unit, the calculation unit is configured to perform operations on data migrated sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator;
  • the processing unit is configured to generate a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that a first migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor when the tensor is migrated to the second storage location.
  • The calculation unit is further configured to perform operations on data that passes through a second migration path, where the second migration path runs from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
  • the at least one third storage location indicates a storage location on the on-chip cache.
  • the processing module is further configured to: generate, based on the calculation statement, first data flow information corresponding to the calculation statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location;
  • second data flow information is generated, where the second data flow information includes the first data migration information.
  • the first storage location indicates a storage location on the on-chip cache
  • the second storage location indicates a storage location on the on-chip cache
  • the first storage location indicates a memory
  • the second storage location indicates a storage location on the on-chip cache.
  • the tensor includes multiple dimensions, each dimension corresponds to an axis variable, and the axis variable is used to represent the size of the tensor in the corresponding dimension;
  • the data segmentation information includes multiple axis variables and a split axis variable corresponding to each axis variable, where the split axis variable is used to represent the tensor size of the corresponding dimension when the tensor is migrated.
  • The second IR further includes at least one target variable and a value range of each target variable;
  • the axis variable is linearly related to the at least one target variable;
  • the axis variable represents the tensor size of the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
  • The second IR is an IR with a tree structure, the tree structure includes a root node and child nodes, the root node corresponds to the calculation statement, and the child node corresponds to the second storage location, where the second IR includes the information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location includes the first data migration information and the data segmentation information.
  • the second IR further includes data movement information, where the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
  • processing unit is further configured to:
  • the first storage location and the second storage location are determined according to the type of the calculation unit.
  • this application provides a computer system, including: a processor and a memory;
  • the processor and the memory are electrically connected;
  • the processor is configured to invoke the code in the memory to execute the method according to any one of the first aspects above.
  • The present application provides a computer storage medium, where the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to execute the method according to any one of the first aspects above.
  • An embodiment of the present application provides a method for constructing an intermediate representation, including:
  • obtaining a first IR, where the first IR includes a calculation statement;
  • the calculation statement includes a tensor and an operator, where the operation represented by the operator is executed on a calculation unit, the tensor is migrated to the calculation unit sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator;
  • generating a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that the migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor that is migrated to the second storage location each time.
  • the first storage location may be a starting storage location on the tensor migration path.
  • the first storage location and the second storage location may be adjacent storage locations.
  • the tensor is used to migrate to the computing unit through a first storage location, at least one third storage location, and a second storage location in sequence.
  • the tensor is used to migrate to the computing unit through at least one fourth storage location, a first storage location, and a second storage location in sequence.
  • the tensor is used to sequentially migrate to the computing unit through at least one fourth storage location, a first storage location, at least one third storage location, and a second storage location.
  • the tensor is used to sequentially migrate to the computing unit through at least one fourth storage location, a first storage location, at least one third storage location, a second storage location, and at least one fifth storage location.
  • the generating of the second IR based on the calculation statement includes: generating the second IR based on the calculation statement, the storage size corresponding to the first storage location, and the storage size corresponding to the second storage location, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that the migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor that is migrated to the second storage location each time.
  • the first storage location indicates a storage location on the on-chip cache
  • the second storage location indicates a storage location on the on-chip cache
  • the first storage location is a memory
  • the second storage location indicates a storage location on the on-chip cache.
  • the tensor corresponds to a target variable in the calculation statement;
  • the second IR further includes a value range of the target variable and a mapping from the target variable to the tensor at the first storage location;
  • the data segmentation information includes the value range of the target variable corresponding to each migration of the tensor to the second storage location.
  • the tensor includes a target axis variable;
  • the split tensor includes a split axis variable of the target axis variable;
  • the data segmentation information includes the size relationship between the split axis variable and the target axis variable for each migration of the tensor to the second storage location.
  • The second IR includes node information corresponding to the second cache, and the node information includes the first data migration information and the data segmentation information.
  • the second IR further includes read-write information, where the read-write information indicates that the second storage location reads the tensor from the first storage location.
  • the method further includes:
  • generating first data flow information corresponding to the calculation statement, where the first data flow information includes first data migration information associated with the tensor, and the first data migration information indicates that the tensor is migrated from the first storage location to the second storage location.
  • The generating of the first data flow information corresponding to the calculation statement includes:
  • where the first data flow information includes the second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location;
  • the first data flow information is generated, where the first data flow information includes the data migration information associated with the tensor.
  • the third migration location is an on-chip cache.
  • the method further includes:
  • the first storage location and the second storage location are determined according to the type of the calculation unit.
  • this application provides a compiler, including:
  • the obtaining unit is configured to obtain a first intermediate representation (IR), where the first IR includes a calculation statement, the calculation statement includes a tensor and an operator, the operation represented by the operator is executed on a calculation unit, the tensor is migrated to the calculation unit sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator;
  • the processing unit is configured to generate a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that the migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor that is migrated to the second storage location each time.
  • the first storage location indicates a storage location on the on-chip cache
  • the second storage location indicates a storage location on the on-chip cache
  • the first storage location is a memory
  • the second storage location indicates a storage location on the on-chip cache.
  • the tensor corresponds to a target variable in the calculation statement;
  • the second IR further includes a value range of the target variable and a mapping from the target variable to the tensor at the first storage location;
  • the data segmentation information includes the value range of the target variable corresponding to each migration of the tensor to the second storage location.
  • the tensor includes a target axis variable;
  • the split tensor includes a split axis variable of the target axis variable;
  • the data segmentation information includes the size relationship between the split axis variable and the target axis variable for each migration of the tensor to the second storage location.
  • The second IR includes node information corresponding to the second cache, and the node information includes the first data migration information and the data segmentation information.
  • the second IR further includes read-write information, where the read-write information indicates that the second storage location reads the tensor from the first storage location.
  • processing unit is further configured to:
  • generate first data flow information corresponding to the calculation statement, where the first data flow information includes first data migration information associated with the tensor, and the first data migration information indicates that the tensor is migrated from the first storage location to the second storage location.
  • The processing unit is specifically configured to:
  • where the first data flow information includes the second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location;
  • generate the first data flow information, where the first data flow information includes the data migration information associated with the tensor.
  • the third migration location is an on-chip cache.
  • processing unit is further configured to:
  • the first storage location and the second storage location are determined according to the type of the calculation unit.
  • the present application provides a computer system, including: a processor and a memory;
  • the processor and the memory are electrically connected;
  • the processor is configured to invoke the code in the memory to execute the method according to any one of the fifth aspects above.
  • The present application provides a computer storage medium, where the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to execute the method according to any one of the fifth aspects above.
  • An embodiment of the present application provides a method for constructing an intermediate representation, including: obtaining a first intermediate representation (IR), where the first IR includes a calculation statement, the calculation statement includes a tensor and an operator, the operation represented by the operator is executed on a calculation unit, the calculation unit is configured to perform operations on data migrated sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator; and generating a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that a first migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor when the tensor is migrated to the second storage location.
  • In this way, the compiler can construct an intermediate representation that expresses the migration of the tensor between different storage locations (on-chip caches or memory), including the direction and the size of each migration, which can be applied to the construction of IRs for DSA-based AI chips.
  • FIG. 1 is a schematic diagram of the application architecture of an embodiment of this application;
  • FIG. 2 is a schematic flowchart of a method for constructing an intermediate representation provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of the structure of an AI core in an AI chip provided by an embodiment of this application;
  • FIG. 4 is an abstract schematic diagram of data flow information provided by an embodiment of this application;
  • FIG. 5 is a schematic diagram of an embodiment of a method for constructing an intermediate representation provided by an embodiment of this application;
  • FIG. 6 is an abstract schematic diagram of data flow information provided by an embodiment of this application;
  • FIG. 7 is an abstract schematic diagram of data flow information provided by an embodiment of this application;
  • FIG. 8 is an abstract schematic diagram of data flow information provided by an embodiment of this application;
  • FIG. 9 is a schematic structural diagram of a compiler provided by an embodiment of this application.
  • The embodiments of this application provide a method, a compiler, and a server for constructing intermediate representations, capable of constructing an intermediate representation that expresses the migration of tensors between different storage locations (on-chip caches or memory), including the direction and the size of each migration.
  • FIG. 1 is a schematic diagram of the application architecture of an embodiment of this application.
  • This application can be applied to a server, where the server can include an AI training and inference framework as its software part, and the AI training and inference framework may include a compiler.
  • The compiler may obtain source code from the memory, and compile the source code into an intermediate representation and into machine language that can be recognized and executed by the AI chip.
  • FIG. 2 is a schematic flowchart of a method for constructing an intermediate representation provided by an embodiment of this application. As shown in FIG. 2, the method for constructing an intermediate representation provided by this application includes the following steps.
  • 201. The compiler obtains a first intermediate representation (IR), where the first IR includes a calculation statement, the calculation statement includes a tensor and an operator, the operation represented by the operator is executed on a calculation unit, the calculation unit is configured to perform operations on data migrated sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator.
  • The compiler may obtain a first intermediate representation (IR).
  • The first IR can be generated by parsing and compiling computer source code.
  • Alternatively, the first IR can be generated by decompiling an existing computer program, or obtained from an external source.
  • The source code used to generate the first IR can be written in a high-level programming language through an application editing interface.
  • the high-level programming language may be a domain specific language (domain specific language, DSL).
  • the first IR may be stored in a memory (external storage or internal storage of the server).
  • the compiler may read the first IR from the memory (external storage or internal storage of the server).
  • The first IR may be described in a DSL.
  • The DSL may be Halide, GraphIt, Spatial, or another customized domain-specific language, where Halide is suitable for vector and tensor operations, GraphIt is suitable for the field of graph computing, Spatial is suitable for the field of programmable hardware, and customized domain-specific languages are suitable for their corresponding customized fields.
  • the compiler may sequentially traverse each calculation statement in the first IR obtained, and analyze the calculation unit that will be used in each calculation statement.
  • The calculation units may at least include a scalar calculation unit, a vector calculation unit, and a cube calculation unit.
  • A vector calculation unit can support addition (add), subtraction (sub), multiplication (mul), reciprocal (rec), exponential (exp), logarithm (log), quantization, and other operations.
  • The cube calculation unit can support convolution operations.
  • In some AI chip architectures, for example AI chips based on a domain-specific architecture (DSA) such as the versatile tensor accelerator (VTA), many dedicated on-chip caches are used to shorten the distance of data movement and reduce the overhead caused by data movement. The number of types of on-chip caches can be 5 or more, which is not limited in this application.
  • FIG. 3 is a schematic diagram of the structure of an AI core in an AI chip provided by an embodiment of the application.
  • The AI core includes multiple on-chip buffers (L1 buffer, L0A buffer, L0B buffer, L0C buffer, unified buffer), multiple calculation units (cube calculation unit, vector calculation unit), a data transfer processing unit, and bus interface components.
  • the bus interface component can obtain the tensor in the memory of the AI chip, and migrate the tensor to the corresponding calculation unit through the above-mentioned on-chip cache, so as to realize the corresponding operation.
  • The data (tensor) migration route can be determined based on the specific architecture of the AI core.
  • The migration route may not be unique, that is, the data can be migrated to the calculation unit through multiple migration routes passing through different on-chip caches.
  • the "migration" in this application can mean the reading of data.
  • the migration of a tensor from buffer1 to buffer2 can mean that buffer2 reads the tensor in buffer1.
  • the AI core may further include a data control unit, which can control the migration direction of the tensor in the on-chip cache.
  • the on-chip cache in FIG. 3 and the memory in the server in FIG. 1 can be understood as different storage media (the on-chip cache is the storage medium in the AI chip).
  • The calculation statement of the first IR may include at least one tensor and at least one operator related to the tensor, where the tensor is data and can be understood, for example, as a multi-dimensional vector.
  • An operator can represent a certain operation rule; for example, if an operator is a multiplication operation, the operator needs to be executed on the vector calculation unit, and migrating the tensor from the memory to the vector calculation unit requires passing sequentially through the memory, the bus interface components, and the unified buffer. That is, the vector calculation unit obtains the tensor migrated sequentially through the memory and the unified buffer, and then implements the operation of the operator.
  • The first IR may refer to the following IR illustration (a reconstructed sketch is given after the element descriptions below):
  • input_1 (i1, i2) represents the tensor A
  • input_2 (i1, i2) represents the tensor B
  • result (i1, i2) is the result.
  • for(i1,0,64) represents a layer of for loop
  • the traversal rule is that the target variable i1 starts from 0, each time it is accumulated by 1, and it is accumulated for 64 times.
  • result(i1,i2) is a two-dimensional tensor, which is the product of tensor A and tensor B, and the size of each dimension is 64.
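  • The listing itself is not preserved in this text; the following is a hedged reconstruction inferred only from the element descriptions above (two 64-iteration for loops over the target variables i1 and i2, and an element-wise reading of "the product of tensor A and tensor B"), not the patent's verbatim IR:

        for (i1, 0, 64) {
          for (i2, 0, 64) {
            // result is the element-wise product of tensor A (input_1) and tensor B (input_2)
            result(i1, i2) = input_1(i1, i2) * input_2(i1, i2)
          }
        }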
  • The target variable may be a loop variable of a calculation statement, and some of the loop variables may be target variables of the tensors included in the statement.
  • The compiler can obtain the calculation statement in the above first IR, where the calculation statement includes tensor A, tensor B, a tensor (result), and an operator (multiplication); further, the compiler can determine that the calculation unit corresponding to the operator (multiplication) is the vector calculation unit, and the vector calculation unit obtains the tensor migrated sequentially through the memory and the unified buffer, and then implements the operation of the operator (multiplication).
  • The compiler may generate first data flow information corresponding to the calculation statement, where the first data flow information includes first data migration information associated with the tensor, and the first data migration information indicates that the tensor is migrated from the first storage location to the second storage location.
  • the first storage location indicates a storage location on the on-chip cache
  • the second storage location indicates a storage location on the on-chip cache
  • the first storage location indicates a storage location on the memory
  • the second storage location indicates a storage location on the on-chip cache.
  • The data flow information in the embodiments of the present application may indicate the migration of the tensor in the AI core.
  • For example, it may be a data stack structure that indicates the migration path of the tensor in the AI core; this application does not limit the specific implementation of the data flow information.
  • FIG. 4 is an abstract diagram of data flow information provided by an embodiment of this application.
  • The compiler can generate the first data flow information corresponding to the calculation statement; the first data flow information can indicate that tensor A is migrated from the memory to the unified buffer, tensor B is migrated from the memory to the unified buffer, tensor A and tensor B are migrated to the vector calculation unit, the vector calculation unit performs a product operation on tensor A and tensor B to obtain the operation result, and the operation result is then migrated back to the memory.
  • In FIG. 4, the data migration information related to tensor A in the first data flow information is "Memory(A) - UB(A)"; it should be noted that UB in FIG. 4 is the unified buffer mentioned above.
  • The data migration information related to tensor B is "Memory(B) - UB(B)".
  • The data migration information related to the tensor result is "UB(result) - Memory(result)".
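  • Assembled from the migration entries quoted above, the first data flow information of FIG. 4 can be abstracted as the following chain (a sketch, not the verbatim figure):

        Memory(A) -> UB(A) --+
                             +--> Vector(result) -> UB(result) -> Memory(result)
        Memory(B) -> UB(B) --+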
  • 202. The compiler generates a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that the first migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor when the tensor is migrated to the second storage location.
  • The compiler may generate the second IR, where the second IR and the first IR may be intermediate representations described in different languages.
  • The second IR may be implemented based on polyhedral compilation technology.
  • The second IR may be implemented by a schedule tree based on polyhedral technology, which may include a root node and child nodes, where the root node is a domain node that includes the calculation statements of the operators and the value ranges of the variables therein.
  • A band node within a specific range is marked as child or schedule in the second IR.
  • the second IR may include data migration information, and the data migration information indicates that the tensor is migrated from the first storage location to the second storage location.
  • The first part represents a statement of the first IR and the value ranges of its related target variables (i1 and i2).
  • Mark is a mark node, where "realize_UB" means that the on-chip buffer type is the unified buffer; that is, Mark: "realize_UB" represents the node information corresponding to the second cache (UB).
  • The node information corresponding to the second cache (UB) includes the data migration information and the data segmentation information corresponding to the second cache (UB).
  • The second part represents the data migration information and data segmentation information of the tensor, that is, the migration mapping relationship from the memory to the on-chip cache UB, where i1 and i2 represent the target variables related to the statement, arg0 and arg1 represent the axis variables of the tensor, and arg0' and arg1' represent the split axis variables of the first tensor.
  • The tensor needs to be segmented, so that each subsequent migration of the tensor is performed based on the segmented tensor.
  • [i1,i2]->L1read[[[i1,i2]->A[arg0,arg1]]->A_local_L1[arg0',arg1']] represents the mapping relationship by which the first tensor is migrated from the memory to the on-chip cache, where -> indicates one level of mapping relationship.
  • [[i1,i2]->A[arg0,arg1]] represents the mapping from the target variables i1 and i2 to the tensor at the first storage location (memory);
  • [[i1,i2]->A[arg0,arg1]]->A_local_L1[arg0',arg1'] represents the mapping from the tensor at the first storage location (memory) to the tensor at the second storage location;
  • A_local_L1[arg0',arg1'] represents that the tensor needs to undergo data segmentation each time it is migrated to the second storage location.
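  • Assembled from the fragments above, a second IR in schedule-tree form might look like the following sketch (hedged: the exact node syntax of the patent's listing is not preserved here, and the domain bounds are taken from the 64-iteration loops of the earlier example):

        Domain: { S0[i1, i2] : 0 <= i1 <= 63 and 0 <= i2 <= 63 }
          Mark: "realize_UB"
            Child:
              // migration mapping and segmentation of tensor A
              [i1,i2] -> L1read[[[i1,i2] -> A[arg0,arg1]] -> A_local_L1[arg0',arg1']]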
  • The tensor includes multiple dimensions, and each dimension corresponds to an axis variable (for example, arg0 and arg1 above); the axis variable is used to represent the size of the tensor in the corresponding dimension, the data segmentation information includes multiple axis variables and a split axis variable corresponding to each axis variable, and the split axis variable is used to represent the tensor size of the corresponding dimension when the tensor is migrated.
  • The second IR further includes at least one target variable and a value range of each target variable, where the axis variable is linearly related to the at least one target variable, and the axis variable represents the size of the tensor in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
  • The tensor size of the dimension corresponding to an axis variable can be expressed by a linear combination of the target variables and their value ranges, for example:
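  • The example that followed here is not preserved; a hedged illustration consistent with the surrounding definitions: if the axis variable satisfies arg0 = i1 with 0 <= i1 <= 63, then the tensor size in the dimension of arg0 is 64; and if the split axis variable satisfies arg0' = i1 with 0 <= i1 <= 31, then each migration carries a tensor of size 32 in that dimension.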
  • the second IR may further include: the magnitude relationship between the split axis variable and the axis variable.
  • the second IR may include the size relationship between the split axis variable arg0' and the axis variable arg0, and the size relationship between the split axis variable arg1' and the axis variable arg1 in each migration process.
  • one axis variable in the embodiment of the present application corresponds to one dimension of the tensor, and multiple axis variables may represent the size of the tensor.
  • The split axis variable is a part of the axis variable obtained after the axis variable is split (that is, divided).
  • Splitting may refer to dividing an axis variable into multiple sub-axis variables, for example dividing a 64*64 axis variable into two 32*64 split axis variables.
  • For example, the axis variable arg1 is migrated half at a time (the size of the axis variable depends on the value range of the corresponding target variable); in this case, two migrations are needed to complete the migration of the axis variable arg1.
  • In this way, the compiler can construct an intermediate representation that can express the migration (including the migration direction and the size of the migration) of the tensor between different storage locations.
  • The second IR may also include operation statements related to the operators in the calculation statements; since this application focuses only on the migration process of tensors between on-chip caches, the compilation of operators is not described in detail in this application.
  • The compiler may obtain one item of data migration information in the first data flow information (for example, Memory(A) to UB(A) in FIG. 4), determine, according to the currently acquired data flow information, the insertion position of that data flow information in the second IR (as described in the above embodiment, the insertion position corresponding to the data flow information here is "realize_UB"), and determine the segmentation size of the tensor based on the size of the tensor, which can be the splitting of the axis variables corresponding to the tensor.
  • The compiler obtains, under the current segmentation space, the direct movement information of the tensor between different on-chip caches.
  • The movement information can be as follows:
  • i0 and i1 represent the target variables related to the statement;
  • A[arg0,arg1] represents the tensor A and its axis variables;
  • A_local_L1[arg0',arg1'] represents the tensor A_local_L1 on the on-chip cache L1 and its axis variables.
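  • The listing itself is not preserved here; based on the element descriptions above and the wrapped-mapping notation quoted earlier, the movement information plausibly had a form like the following (a hedged sketch, not the patent's verbatim text):

        { [i0,i1] -> [A[arg0,arg1] -> A_local_L1[arg0',arg1']] }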
  • The compiler can perform a Presburger identity operation based on the movement information between the on-chip caches obtained above.
  • A mapping of the movement information to itself can thereby be obtained:
  • from this, the data migration information and data segmentation information can be obtained:
  • L1read represents the reading of data.
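  • The resulting listing is likewise not preserved; judging from the L1read mapping quoted earlier in this description, the obtained data migration information and data segmentation information plausibly correspond to a node of the following form (hedged sketch; note the original text switches between i0,i1 and i1,i2 for the target variables):

        [i1,i2] -> L1read[[[i1,i2] -> A[arg0,arg1]] -> A_local_L1[arg0',arg1']]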
  • the compiler can insert the above-mentioned data migration information and data segmentation information under the child node of "realize UB" to obtain the second IR.
  • An embodiment of the present application provides a method for constructing an intermediate representation, including: obtaining a first intermediate representation (IR), where the first IR includes a calculation statement, the calculation statement includes a tensor and an operator, the operation represented by the operator is executed on a calculation unit, the calculation unit is configured to perform operations on data migrated sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator; and generating a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that a first migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor when the tensor is migrated to the second storage location.
  • In this way, the compiler can construct an intermediate representation that expresses the migration of the tensor between different storage locations (on-chip caches or memory), including the direction and the size of each migration, which can be applied to the construction of IRs for DSA-based AI chips.
  • FIG. 5 is a schematic diagram of an embodiment of a method for constructing an intermediate representation provided by an embodiment of this application. As shown in FIG. 5, the method for constructing an intermediate representation includes the following steps.
  • 501. The compiler obtains the first intermediate representation (IR).
  • For a specific description of step 501, refer to step 401; details are not repeated here.
  • The compiler generates, based on the calculation statement, first data flow information corresponding to the calculation statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location.
  • The compiler generates second data flow information based on a directly connected data transmission path between the first storage location and the second storage location, where the second data flow information includes the first data migration information.
  • Specifically, the compiler may generate the first data flow information corresponding to the calculation statement, where the first data flow information includes the second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location; based on the existence of a directly connected data transmission path between the first storage location and the second storage location (a directly connected data transmission path can be understood as follows: no other storage location lies between the first storage location and the second storage location, and the tensor can be migrated directly from the first storage location to the second storage location without passing through any other storage location), the second data flow information is generated, where the second data flow information includes the first data migration information.
  • The third migration location indicates a storage location on the on-chip cache.
  • The compiler may generate initialized second data flow information corresponding to the calculation statement.
  • The initialized second data flow information includes many redundant migration processes.
  • The dotted lines indicate optional migration paths that can be optimized, for example the migration route from UB(F(A)) through DDR(F(A)) to L1(F(A)), where UB corresponds to the first storage location, DDR corresponds to the at least one third migration location, and L1 corresponds to the second storage location. At this time, referring to the figure,
  • the first data flow information includes first data migration information associated with the tensor F(A), and the first data migration information indicates that the tensor F(A) is migrated from the first storage location (UB buffer) to the second storage location (L1 buffer).
  • A certain weight can be assigned to each migration.
  • The data flow weight table can refer to Table 1 below for illustration:
  • The weight of the DDR quantifies the performance cost of obtaining a tensor from the memory.
  • The weight of L1/UB quantifies the performance cost of obtaining a tensor from the L1 buffer or the UB buffer.
  • The weight of L0A/L0B/L0C quantifies the performance cost of obtaining a tensor from the L0A buffer, the L0B buffer, or the L0C buffer.
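  • Table 1 itself is not preserved in this text. The following is a hedged reconstruction of its structure from the three descriptions above; the concrete weight values are illustrative placeholders only (chosen so that storage farther from the calculation unit costs more), not values taken from the patent:

        Table 1 (illustrative)
        Storage location   | Weight (performance cost of obtaining a tensor)
        DDR (memory)       | e.g. 3
        L1 / UB            | e.g. 2
        L0A / L0B / L0C    | e.g. 1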
  • The compiler also needs to mark whether the attribute of each edge in the data flow information is data movement or data calculation; only edges that move data can be optimized in this processing flow, while edges that perform data calculation cannot be optimized during optimization.
  • The compiler can traverse all nodes in the data flow graph, and compute the set of all start nodes with zero in-degree (for example, DDR(A), DDR(B), and DDR(D) in FIG. 7) and the set of end nodes with zero out-degree (for example, DDR(RES) in FIG. 7); all feasible path tables of the data flow graph are then obtained, where a path table can indicate the possible paths from the start node set to the end node set.
  • FIG. 8 shows the optimized first data flow diagram.
  • A first intermediate representation (IR) is obtained, where the first IR includes a calculation statement and the calculation statement includes a tensor; first data flow information corresponding to the calculation statement is generated based on the calculation statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location; and second data flow information is generated based on a directly connected data transmission path between the first storage location and the second storage location, where the second data flow information includes the first data migration information.
  • FIG. 9 is a schematic structural diagram of a compiler provided by an embodiment of this application. As shown in FIG. 9, the compiler includes:
  • the obtaining unit 901, configured to obtain a first intermediate representation (IR), where the first IR includes a calculation statement, the calculation statement includes a tensor and an operator, the operation represented by the operator is executed on a calculation unit, the calculation unit is configured to perform operations on data migrated sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator;
  • the processing unit 902, configured to generate a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that a first migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor when the tensor is migrated to the second storage location.
  • The calculation unit is further configured to perform operations on data that passes through a second migration path, where the second migration path runs from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
  • the at least one third storage location indicates a storage location on the on-chip cache.
  • the processing module is further configured to: generate, based on the calculation statement, first data flow information corresponding to the calculation statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, sequentially through at least one third migration location, to the second storage location;
  • second data flow information is generated, where the second data flow information includes the first data migration information.
  • the first storage location indicates a storage location on the on-chip cache
  • the second storage location indicates a storage location on the on-chip cache
  • the first storage location indicates a memory
  • the second storage location indicates a storage location on the on-chip cache.
  • the tensor includes multiple dimensions, each dimension corresponds to an axis variable, the axis variable is used to indicate the size of the tensor in the corresponding dimension, the data segmentation information includes multiple axis variables and a split axis variable corresponding to each axis variable, and the split axis variable is used to represent the size of the tensor in the corresponding dimension when the tensor is migrated.
  • The second IR further includes at least one target variable and a value range of each target variable, where the axis variable is linearly related to the at least one target variable, and the axis variable represents the size of the tensor in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
  • the second IR is an IR with a tree structure, the tree structure includes a root node and child nodes, the root node corresponds to the calculation statement, and the child node corresponds to the second storage location, where the second IR includes the information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location includes the first data migration information and the data segmentation information.
  • the second IR further includes: data movement information, where the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
  • processing unit is further configured to:
  • the first storage location and the second storage location are determined according to the type of the calculation unit.
  • The device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • A physical unit can be located in one place or distributed across multiple network units; some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
  • This application can be implemented by software plus the necessary general-purpose hardware.
  • It can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and so on.
  • Generally, all functions completed by computer programs can easily be implemented with corresponding hardware.
  • Moreover, the specific hardware structures used to achieve the same function can be diverse, such as analog circuits, digital circuits, or special-purpose circuits.
  • However, a software program implementation is the better implementation in most cases.
  • The technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to make a computer device (which can be a personal computer, a training device, or a network device, etc.) execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center through wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

A method for constructing an intermediate representation, the method including: obtaining a first intermediate representation (IR), where the first IR includes a calculation statement, the calculation statement includes a tensor and an operator, the operation represented by the operator is executed on a calculation unit, the calculation unit is configured to perform operations on data migrated sequentially through a first storage location and a second storage location, and the tensor is the data used to execute the operation represented by the operator (201); and generating a second IR based on the calculation statement, where the second IR includes first data migration information and data segmentation information, the first data migration information indicates that a first migration path of the tensor runs from the first storage location to the second storage location, and the data segmentation information indicates the size of the tensor when the tensor is migrated to the second storage location (202). The method can construct an intermediate representation capable of expressing the migration of a tensor between different storage locations.

Description

一种构建中间表达的方法、编译器和服务器
本申请要求于2019年09月18日提交中国专利局、申请号为201910896548.1、发明名称为“一种构建中间表达的方法、编译器和服务器”,以及于2019年12月11日提交中国专利局、申请号为201911271859.5、发明名称为“一种构建中间表达的方法、编译器和服务器”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及电子技术领域,尤其涉及一种构建中间表达的方法、编译器和服务器。
背景技术
随着人工智能(artificial intelligence,AI)技术的发展,基于领域特定架构(domain specific architecture,DSA)设计的AI芯片不断涌现。不同于现代微处理器,基于DSA的AI芯片,使用了多个存储位置来缩短数据搬移的距离,以减少数据搬移带来的开销。随着DSA的演进,计算单元的增加以及算子复杂度的提升,存储位置的数量成倍增加,此时需要通过多个存储位置将张量搬移至对应的计算单元,来实现算子的执行。
然而现有技术中,并没有一种可以针对于需要通过多个存储位置,将数据迁移至计算单元的中间表达(intermediate representation,IR)的构建方法。
Summary
An embodiment of this application provides a method for constructing an intermediate representation, including:
obtaining a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator; and
generating a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, and the first data migration information indicates that a first migration path of the tensor includes a path from the first storage location to the second storage location. Multiple data transmission channels may exist between the first storage location and the second storage location; some of them may pass through no other storage location, for example, the directly connected data transmission channel between the first storage location and the second storage location described below, while others may pass through other storage locations, for example, the third storage location mentioned below. The data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location.
Optionally, in an optional design of the first aspect, the compute unit is further configured to perform the operation on data that passes through a second migration path, where the second migration path goes from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
Optionally, in an optional design of the first aspect, the at least one third storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the first aspect, the method further includes:
generating, based on the compute statement, first data flow information corresponding to the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and
generating second data flow information based on the existence of a directly connected data transmission channel between the first storage location and the second storage location, where the second data flow information includes the first data migration information.
In this embodiment, the existence of a directly connected data transmission channel between the first storage location and the second storage location can be understood as follows: there is a data transmission channel between the first storage location and the second storage location that does not pass through any other storage location.
For example, the tensor is migrated to the compute unit successively through the first storage location, at least one third storage location, and the second storage location. As another example, the tensor is migrated to the compute unit successively through at least one fourth storage location, the first storage location, and the second storage location. As another example, the tensor is migrated to the compute unit successively through at least one fourth storage location, the first storage location, at least one third storage location, and the second storage location. As another example, the tensor is migrated to the compute unit successively through at least one fourth storage location, the first storage location, at least one third storage location, the second storage location, and at least one fifth storage location.
Optionally, in an optional design of the first aspect, the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the first aspect, the first storage location indicates a memory, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the first aspect, the tensor includes multiple dimensions, each dimension corresponds to an axis variable, the axis variable represents the tensor size in the corresponding dimension, the data tiling information includes multiple axis variables and a tiled axis variable corresponding to each axis variable, and the tiled axis variable represents the tensor size in the corresponding dimension when the tensor is migrated.
Optionally, in an optional design of the first aspect, the second IR further includes at least one target variable and a value range of each target variable, the axis variable is linearly related to the at least one target variable, and the axis variable represents the tensor size in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
Optionally, in an optional design of the first aspect, the second IR is a tree-structured IR, the tree structure includes a root node and child nodes, the root node corresponds to the compute statement, and a child node corresponds to the second storage location, where the second IR includes information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location includes the first data migration information and the data tiling information.
Optionally, in an optional design of the first aspect, the second IR further includes data movement information, and the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
Optionally, in an optional design of the first aspect, the method further includes:
determining the first storage location and the second storage location according to the type of the compute unit.
According to a second aspect, this application provides a compiler, including:
an obtaining unit, configured to obtain a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator; and
a processing unit, configured to generate a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a first migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location.
Optionally, in an optional design of the second aspect, the compute unit is further configured to perform the operation on data that passes through a second migration path, where the second migration path goes from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
Optionally, in an optional design of the second aspect, the at least one third storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the second aspect, the processing unit is further configured to: generate, based on the compute statement, first data flow information corresponding to the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and
generate second data flow information based on the existence of a directly connected data transmission channel between the first storage location and the second storage location, where the second data flow information includes the first data migration information.
Optionally, in an optional design of the second aspect, the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the second aspect, the first storage location indicates a memory, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the second aspect, the tensor includes multiple dimensions, each dimension corresponds to an axis variable, the axis variable represents the tensor size in the corresponding dimension, the data tiling information includes multiple axis variables and a tiled axis variable corresponding to each axis variable, and the tiled axis variable represents the tensor size in the corresponding dimension when the tensor is migrated.
Optionally, in an optional design of the second aspect, the second IR further includes at least one target variable and a value range of each target variable, the axis variable is linearly related to the at least one target variable, and the axis variable represents the tensor size in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
Optionally, in an optional design of the second aspect, the second IR is a tree-structured IR, the tree structure includes a root node and child nodes, the root node corresponds to the compute statement, and a child node corresponds to the second storage location, where the second IR includes information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location includes the first data migration information and the data tiling information.
Optionally, in an optional design of the second aspect, the second IR further includes data movement information, and the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
Optionally, in an optional design of the second aspect, the processing unit is further configured to:
determine the first storage location and the second storage location according to the type of the compute unit.
According to a third aspect, this application provides a computer system, including a processor and a memory, where
the processor is electrically connected to the memory; and
the processor is configured to invoke code in the memory to perform the method according to any one of the first aspect.
According to a fourth aspect, this application provides a computer storage medium, where the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to perform the method according to any one of the first aspect.
According to a fifth aspect, an embodiment of this application provides a method for constructing an intermediate representation, including:
obtaining a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the tensor is migrated to the compute unit successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator; and
generating a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor migrated to the second storage location each time.
Optionally, the first storage location may be the starting storage location on the migration path of the tensor.
Optionally, the first storage location and the second storage location may be adjacent storage locations.
Optionally, other storage locations may also lie between the first storage location and the second storage location.
Optionally, other storage locations may also lie between the second storage location and the compute unit.
For example, the tensor is migrated to the compute unit successively through the first storage location, at least one third storage location, and the second storage location. As another example, the tensor is migrated to the compute unit successively through at least one fourth storage location, the first storage location, and the second storage location. As another example, the tensor is migrated to the compute unit successively through at least one fourth storage location, the first storage location, at least one third storage location, and the second storage location. As another example, the tensor is migrated to the compute unit successively through at least one fourth storage location, the first storage location, at least one third storage location, the second storage location, and at least one fifth storage location.
Optionally, in an optional design of the fifth aspect, the generating a second IR based on the compute statement includes: generating the second IR based on the compute statement, a storage size corresponding to the first storage location, and a storage size corresponding to the second storage location, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor migrated to the second storage location each time.
Optionally, in an optional design of the fifth aspect, the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the fifth aspect, the first storage location is a memory, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the fifth aspect, the tensor corresponds to a target variable in the compute statement, and the second IR further includes: a value range of the target variable, and a mapping from the target variable to the tensor at the first storage location.
Optionally, in an optional design of the fifth aspect, the data tiling information includes the value range of the target variable corresponding to each migration of the tensor to the second storage location.
Optionally, in an optional design of the fifth aspect, the tensor includes a target axis variable, the tiled tensor includes a tiled axis variable of the target axis variable, and the data tiling information includes the size relationship between the tiled axis variable and the target axis variable for each migration of the tensor to the second storage location.
Optionally, in an optional design of the fifth aspect, the second IR includes node information corresponding to the second buffer, and the node information includes the first data migration information and the data tiling information.
Optionally, in an optional design of the fifth aspect, the second IR further includes read/write information, and the read/write information indicates that the second storage location reads the tensor from the first storage location.
Optionally, in an optional design of the fifth aspect, the method further includes:
generating first data flow information corresponding to the compute statement, where the first data flow information includes first data migration information associated with the tensor, and the first data migration information indicates that the tensor is migrated from the first storage location to the second storage location.
Optionally, in an optional design of the fifth aspect, the generating first data flow information corresponding to the compute statement includes:
generating first data flow information corresponding to the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and
generating the first data flow information based on the existence of a data transmission channel between the first storage location and the second storage location, where the first data flow information includes the data migration information associated with the tensor.
Optionally, in an optional design of the fifth aspect, the third storage location is an on-chip buffer.
Optionally, in an optional design of the fifth aspect, the method further includes:
determining the first storage location and the second storage location according to the type of the compute unit.
According to a sixth aspect, this application provides a compiler, including:
an obtaining unit, configured to obtain a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the tensor is migrated to the compute unit successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator; and
a processing unit, configured to generate a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor migrated to the second storage location each time.
Optionally, in an optional design of the sixth aspect, the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the sixth aspect, the first storage location is a memory, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, in an optional design of the sixth aspect, the tensor corresponds to a target variable in the compute statement, and the second IR further includes: a value range of the target variable, and a mapping from the target variable to the tensor at the first storage location.
Optionally, in an optional design of the sixth aspect, the data tiling information includes the value range of the target variable corresponding to each migration of the tensor to the second storage location.
Optionally, in an optional design of the sixth aspect, the tensor includes a target axis variable, the tiled tensor includes a tiled axis variable of the target axis variable, and the data tiling information includes the size relationship between the tiled axis variable and the target axis variable for each migration of the tensor to the second storage location.
Optionally, in an optional design of the sixth aspect, the second IR includes node information corresponding to the second buffer, and the node information includes the first data migration information and the data tiling information.
Optionally, in an optional design of the sixth aspect, the second IR further includes read/write information, and the read/write information indicates that the second storage location reads the tensor from the first storage location.
Optionally, in an optional design of the sixth aspect, the processing unit is further configured to:
generate first data flow information corresponding to the compute statement, where the first data flow information includes first data migration information associated with the tensor, and the first data migration information indicates that the tensor is migrated from the first storage location to the second storage location.
Optionally, in an optional design of the sixth aspect, the processing unit is specifically configured to:
generate first data flow information corresponding to the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and
generate the first data flow information based on the existence of a data transmission channel between the first storage location and the second storage location, where the first data flow information includes the data migration information associated with the tensor.
Optionally, in an optional design of the sixth aspect, the third storage location is an on-chip buffer.
Optionally, in an optional design of the sixth aspect, the processing unit is further configured to:
determine the first storage location and the second storage location according to the type of the compute unit.
According to a seventh aspect, this application provides a computer system, including a processor and a memory, where
the processor is electrically connected to the memory; and
the processor is configured to invoke code in the memory to perform the method according to any one of the fifth aspect.
According to an eighth aspect, this application provides a computer storage medium, where the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to perform the method according to any one of the fifth aspect.
An embodiment of this application provides a method for constructing an intermediate representation, including: obtaining a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator; and generating a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a first migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location. In this way, the compiler can construct an intermediate representation capable of expressing the migration of a tensor between different storage locations (on-chip buffers or memory), including the direction and the size of the migration, which can be applied to IR construction for DSA-based AI chips.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application architecture according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a method for constructing an intermediate representation according to an embodiment of this application;
FIG. 3 is a schematic diagram of the structure of an AI core in an AI chip according to an embodiment of this application;
FIG. 4 is an abstract illustration of data flow information according to an embodiment of this application;
FIG. 5 is a schematic diagram of an embodiment of a method for constructing an intermediate representation according to an embodiment of this application;
FIG. 6 is an abstract illustration of data flow information according to an embodiment of this application;
FIG. 7 is an abstract illustration of data flow information according to an embodiment of this application;
FIG. 8 is an abstract illustration of data flow information according to an embodiment of this application;
FIG. 9 is a schematic diagram of the structure of a compiler according to an embodiment of this application.
Detailed Description of Embodiments
Embodiments of this application provide a method for constructing an intermediate representation, a compiler, and a server, which can construct an intermediate representation capable of expressing the migration of a tensor between different storage locations (on-chip buffers or memory), including the direction and the size of the migration.
The following describes the embodiments of this application with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner used in the embodiments of this application to distinguish between objects with the same properties when they are described. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
Refer to FIG. 1, which is a schematic diagram of an application architecture according to an embodiment of this application. As shown in FIG. 1, this application can be applied to a server, where the server may include, as a software part, an AI training and inference framework, and the AI training and inference framework may include a compiler. In this embodiment of this application, the compiler can obtain source code from a memory and compile the source code into an intermediate representation, as well as into machine language that an AI chip can recognize and execute.
Refer to FIG. 2, which is a schematic flowchart of a method for constructing an intermediate representation according to an embodiment of this application. As shown in FIG. 2, the method provided in this application includes the following steps.
201. The compiler obtains a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator.
Optionally, in this embodiment of this application, the compiler may obtain the first intermediate representation (IR).
It should be understood that the first IR may be generated by parsing and compiling computer source code, by decompiling an existing computer program, or obtained from an external source. The source code used to generate the first IR may be written using the application programming interface of a high-level programming language, and the high-level programming language may be a domain specific language (DSL).
The first IR may be stored in a memory (external or internal storage of the server), and in this embodiment of this application the compiler may read the first IR from the memory (external or internal storage of the server). Specifically, the first IR may be described in a DSL. In this embodiment of this application, the DSL may be Halide, GraphIt, Spatial, or another customized domain-specific language, where Halide is suitable for vector and tensor operations, GraphIt is suitable for the graph computing domain, Spatial is suitable for the programmable hardware domain, and a customized domain-specific language is suitable for its corresponding customized domain.
In this embodiment of this application, the compiler may traverse each compute statement in the obtained first IR in order and analyze the compute unit that each compute statement will use.
In one embodiment, the compute units may include at least: a scalar compute unit, a vector compute unit, and a tensor (cube) compute unit.
In this embodiment of this application, different operator types use different compute units. For example, the vector compute unit may support addition (add), subtraction (sub), multiplication (mul), reciprocal (rec), exponentiation (exp), logarithm (log), quantization, and other operations, while the cube compute unit may support convolution operations.
In some AI chip architectures (for example, AI chips based on a domain specific architecture (DSA)), many dedicated on-chip buffers are used to shorten the distance over which data is moved, thereby reducing the overhead caused by data movement. As the DSA architecture evolves, with more compute units and more complex operators, the number of on-chip buffer types multiplies. For example, the versatile tensor accelerator (VTA) has 3 types of on-chip buffers, and in other processors there may be 5 or even more types of on-chip buffers; this application imposes no limitation in this regard.
Refer to FIG. 3, which is a schematic diagram of the structure of an AI core in an AI chip according to an embodiment of this application. As shown in FIG. 3, the AI core includes multiple on-chip buffers (L1 buffer, L0A buffer, L0B buffer, L0C buffer, and Unified Buffer), multiple compute units (a cube compute unit and a vector compute unit), a data relay processing unit, and a bus interface unit. The bus interface unit can fetch a tensor from the memory of the AI chip, and the tensor is migrated to the corresponding compute unit through the above on-chip buffers to implement the corresponding operation.
It should be noted that, after the compute unit is determined, the migration route of the data (tensor) can be determined based on the specific architecture of the AI core. The migration route is not necessarily unique; that is, the data can be migrated to the compute unit along multiple migration routes through different on-chip buffers.
It should be noted that "migration" in this application may denote the reading of data. For example, the migration of a tensor from buffer1 to buffer2 may mean that buffer2 reads the tensor in buffer1.
It should be noted that, although not shown in FIG. 3, the AI core may further include a data control unit, which can control the migration direction of a tensor across the on-chip buffers.
It should be noted that the on-chip buffers in FIG. 3 and the memory in the server in FIG. 1 can be understood as different storage media (the on-chip buffers are storage media in the AI chip).
In this embodiment of this application, a compute statement of the first IR may include at least one tensor and at least one operator related to the tensor, where a tensor is data and can be understood, for example, as a multi-dimensional vector.
In this embodiment of this application, an operator may represent an operation rule. For example, if an operator is a multiplication operation, the operator needs to be executed by the vector compute unit, and the migration of a tensor from the memory to the vector compute unit must pass successively through the memory, the bus interface unit, and the Unified Buffer before the tensor can reach the vector compute unit. That is, the vector compute unit obtains the tensor through migration successively through the memory and the Unified Buffer, thereby implementing the operation of the operator.
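Because the migration route is determined by the topology of the AI core, it can help to picture the storage locations and compute units as a small directed graph. The following is a minimal sketch, in Python, of such a route lookup; the connectivity table is a hypothetical reading of FIG. 3 rather than the chip's documented datapath, and all names are illustrative.
# Hypothetical connectivity of the AI core in FIG. 3: each storage location
# maps to the locations it can forward a tensor to. This is an assumption
# for illustration only.
DATAPATH = {
    "DDR": ["BIU"],                 # memory -> bus interface unit
    "BIU": ["L1", "UB"],            # bus interface unit fans out to buffers
    "L1": ["L0A", "L0B", "UB"],
    "L0A": ["cube"], "L0B": ["cube"], "L0C": ["UB"],
    "UB": ["vector", "L1"],
}

def find_routes(src, dst, path=()):
    """Enumerate all migration routes from src to dst (depth-first)."""
    path = path + (src,)
    if src == dst:
        return [path]
    routes = []
    for nxt in DATAPATH.get(src, []):
        if nxt not in path:          # avoid cycles
            routes.extend(find_routes(nxt, dst, path))
    return routes

# e.g. routes a multiplication operand could take to the vector compute unit
print(find_routes("DDR", "vector"))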
Taking a Halide IR as an example of the first IR, the first IR may be illustrated as follows:
# First IR (illustrative)
for(i1,0,64){
   for(i2,0,64){
      result(i1,i2)=input_1(i1,i2)*input_2(i1,i2)}}
Here, input_1(i1,i2) denotes tensor A, input_2(i1,i2) denotes tensor B, and result(i1,i2) is the result. for(i1,0,64) denotes one level of for loop, whose traversal rule is that the target variable i1 starts from 0 and is incremented by 1 for 64 iterations. result(i1,i2) is a two-dimensional tensor, the product of tensor A and tensor B, and the size of each of its dimensions is 64.
It should be noted that a target variable may be a loop variable of the compute statement, and some of the loop variables may be the target variables of the tensors the statement includes.
In this embodiment of this application, the compiler can obtain the compute statement in the above first IR, which includes tensor A, tensor B, the tensor result, and an operator (multiplication). The compiler can then determine that the compute unit corresponding to the operator (multiplication) is the vector compute unit, and the vector compute unit obtains the tensors through migration successively through the memory and the Unified Buffer, thereby implementing the operation of the operator (multiplication).
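A minimal sketch of this analysis step follows, assuming a simple lookup table from operator kind to compute unit; the Statement type and the table contents are illustrative assumptions, not structures prescribed by this embodiment.
# Sketch: assign a compute unit to each compute statement by operator kind.
from dataclasses import dataclass

OP_TO_UNIT = {
    "add": "vector", "sub": "vector", "mul": "vector",
    "rec": "vector", "exp": "vector", "log": "vector",
    "conv": "cube",
}

@dataclass
class Statement:
    tensors: list      # e.g. ["input_1", "input_2", "result"]
    op: str            # e.g. "mul"

def assign_compute_unit(stmt: Statement) -> str:
    """Pick the compute unit that will execute the statement's operator."""
    return OP_TO_UNIT[stmt.op]

stmt = Statement(tensors=["input_1", "input_2", "result"], op="mul")
assert assign_compute_unit(stmt) == "vector"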
Optionally, in this embodiment of this application, the compiler may generate first data flow information corresponding to the compute statement, where the first data flow information includes first data migration information associated with the tensor, and the first data migration information indicates that the tensor is migrated from a first storage location to a second storage location.
Optionally, the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, the first storage location indicates a storage location in memory, and the second storage location indicates a storage location in an on-chip buffer.
It should be noted that the data flow information in this embodiment of this application may represent the migration of a tensor within the AI core. For example, it may be a data stack structure that represents the migration path of the tensor in the AI core; this application does not limit the specific implementation of the data flow information.
Taking the above as an example, refer to FIG. 4, which is an abstract illustration of data flow information according to an embodiment of this application. As shown in FIG. 4, the compiler may generate the first data flow information corresponding to the compute statement. The first data flow information may indicate that tensor A is migrated from the memory to the Unified Buffer, tensor B is migrated from the memory to the Unified Buffer, tensor A and tensor B are migrated to the vector compute unit, the vector compute unit multiplies tensor A and tensor B to obtain the operation result result, and the operation result result is then migrated to the memory.
In the first data flow information, the data migration information related to tensor A is "memory(A)—UB(A)" (UB in FIG. 4 denotes the Unified Buffer mentioned above), the data migration information related to tensor B is "memory(B)—UB(B)", and the data migration information related to the tensor result is "UB(result)—memory(result)".
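The first data flow information of FIG. 4 could, for example, be encoded as an ordered list of migration edges; the sketch below is one illustrative encoding, since the embodiment deliberately leaves the concrete data structure open.
# Sketch: data flow information as ordered (tensor, source, destination) edges.
dataflow = [
    ("A",      "memory", "UB"),
    ("B",      "memory", "UB"),
    ("A",      "UB",     "vector"),
    ("B",      "UB",     "vector"),
    ("result", "vector", "UB"),      # mul is computed on the vector unit
    ("result", "UB",     "memory"),
]

def migrations_of(tensor, edges):
    """Collect the migration path of one tensor from the data flow info."""
    return [(src, dst) for name, src, dst in edges if name == tensor]

print(migrations_of("A", dataflow))   # [('memory', 'UB'), ('UB', 'vector')]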
202. The compiler generates a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a first migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location.
In this embodiment of this application, the compiler may generate the second IR, where the second IR and the first IR may be intermediate representations described in different languages. Optionally, the second IR may be implemented based on polyhedral compilation technology.
In this embodiment of this application, the second IR may be implemented as a schedule tree based on polyhedral technology, which may include a root node and child nodes. The root node is a domain node that contains the compute statement of the operator and the variation ranges of the variables in it. A band node describes a particular range and is marked child or schedule in the second IR.
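As a rough picture of this tree structure, the sketch below builds a domain root, a band node, and a mark node in plain Python; the node classes are illustrative assumptions, not the schedule-tree API of an actual polyhedral library.
# Sketch: a tiny tree-structured second IR (domain root, band, mark).
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                  # "domain", "band", or "mark"
    payload: str               # statement, schedule, or mark label
    children: list = field(default_factory=list)

root = Node("domain", "S0[i1,i2] : 0 <= i1 < 64 and 0 <= i2 < 64")
band = Node("band", "schedule [i1, i2]")
mark = Node("mark", "realize_UB")   # node information for the UB buffer
root.children.append(band)
band.children.append(mark)
# data migration and tiling information is later inserted under `mark`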
In this embodiment of this application, the second IR may include data migration information, and the data migration information indicates that the tensor is migrated from the first storage location to the second storage location.
Refer to the following illustration of the second IR:
(The second IR listing is reproduced in the original publication as the image Figure PCTCN2020115759-appb-000001.)
Here, the first part represents a statement of the first IR and the variation ranges of its related target variables (i1 and i2). Mark is a mark node, where "realize_UB" indicates that the type of on-chip buffer is the Unified Buffer; that is, Mark: "realize_UB" represents the node information corresponding to the second buffer (UB). Correspondingly, the node information corresponding to the second buffer (UB) contains the data migration information and data tiling information corresponding to the second buffer (UB).
The second part represents the data migration information and data tiling information of the tensor, that is, the movement mapping from the memory to the on-chip buffer UB, where i1 and i2 denote the target variables related to the statement, arg0 and arg1 denote the axis variables of the tensor, and arg0' and arg1' denote the tiled axis variables of the first tensor.
It should be noted that, because the storage capacities of the on-chip buffers may differ, the tensor needs to be tiled, so that each subsequent migration of the tensor is performed on the tiled tensor.
Specifically, [i1,i2]->L1read[[[i1,i2]->A[arg0,arg1]]->A_local_L1[arg0',arg1']] represents the mapping for migrating the first tensor from the memory to the on-chip buffer, where -> denotes one level of mapping. [[i1,i2]->A[arg0,arg1]] denotes the mapping from the target variables i1, i2 to the tensor at the first storage location (memory), and [[i1,i2]->A[arg0,arg1]]->A_local_L1[arg0',arg1'] denotes the mapping from the tensor at the first storage location (memory) to the tensor at the second storage location, where A_local_L1[arg0',arg1'] indicates that the tensor needs to be tiled each time it is migrated to the second storage location.
Optionally, the tensor includes multiple dimensions, each dimension corresponds to an axis variable (for example, arg0 and arg1 above), the axis variable represents the tensor size in the corresponding dimension, the data tiling information includes multiple axis variables and a tiled axis variable corresponding to each axis variable, and the tiled axis variable represents the tensor size in the corresponding dimension when the tensor is migrated.
Optionally, the second IR further includes at least one target variable and a value range of each target variable, the axis variable is linearly related to the at least one target variable, and the axis variable represents the tensor size in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
In this embodiment of this application, the tensor size in the dimension corresponding to an axis variable can be expressed by a linear combination of target variables and their value ranges. For example:
for(i0,0,265){
    for(i1,0,512){
    }}
arg0 >= 64*i0 and 0 <= arg0 <= 255 and arg0 <= 63 + 64*i0 and arg1 >= 512*i1 and 0 <= arg1 <= 511 and arg1 <= 511 + 512*i1. The above illustrates one way of expressing the tensor sizes in the dimensions corresponding to the axis variables arg0 and arg1, where the tensor size in the dimension corresponding to arg0 is expressed through i0, and the tensor size in the dimension corresponding to arg1 is expressed through i1.
It should be noted that the above expression of the axis variables is merely illustrative and is not limiting.
Optionally, the second IR may further include the size relationship between the tiled axis variable and the axis variable. For example, the second IR may include, for each migration, the size relationship between the tiled axis variable arg0' and the axis variable arg0, and between the tiled axis variable arg1' and the axis variable arg1.
It should be noted that in this embodiment of this application one axis variable corresponds to one dimension of the tensor, and multiple axis variables can represent the size of the tensor. A tiled axis variable is a portion of an axis variable obtained by tiling (or partitioning) it, where tiling may mean partitioning an axis variable into multiple sub-axis variables, for example partitioning a 64*64 axis variable into two 32*64 tiled axis variables.
For example, if 2*arg1' = arg1 is specified, then during migration the axis variable arg1 is migrated in chunks of half the size of arg1 each time (the size of an axis variable depends on the value range of the corresponding target variable); in this case, two transfers are needed to complete the migration of the axis variable arg1.
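A minimal sketch of this tiling arithmetic follows, assuming a tiling factor k with k*arg' = arg and a hypothetical destination-buffer capacity check (the capacity value is an assumption for illustration).
# Sketch: per-transfer tile size and number of transfers for one axis.
import math

def tile_axis(axis_size: int, factor: int, buffer_elems: int) -> tuple:
    """Return (tile_size, num_transfers) for one axis under factor*arg' = arg."""
    tile = math.ceil(axis_size / factor)       # size moved per migration
    assert tile <= buffer_elems, "tile must fit in the destination buffer"
    return tile, factor

# 2*arg1' = arg1 with arg1 spanning 512 elements: two transfers of 256 each
print(tile_axis(512, 2, buffer_elems=4096))    # -> (256, 2)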
In this way, the compiler can construct an intermediate representation capable of expressing the migration of a tensor between different storage locations, including the direction and the size of the migration.
It should be noted that, although not shown, the second IR may further include operation statements related to the operator in the compute statement. Since this application focuses only on the migration of tensors between on-chip buffers, the compilation of operators is not described further here.
Optionally, in one embodiment, the compiler may take the frontmost data flow information in the first data flow information (for example, memory(A) to UB(A) in FIG. 4), determine, according to the currently obtained data flow information, its insertion position in the second IR (as described in the above embodiment, the insertion position corresponding to the data flow information here is "realize_UB"), and determine the tiling size of the tensor based on the size of the tensor, which may be a tiling of the axis variables corresponding to the tensor.
The compiler obtains, under the current tiling space, the movement information of the tensor between the different on-chip buffers under the current tiling.
For example, the movement information may be as follows:
[[i0,i1]->A[arg0,arg1]]->A_local_L1[arg0',arg1'];
2arg0'=arg0, 2arg1'=arg1
Here, i0 and i1 denote the target variables related to the statement; A[arg0,arg1] denotes tensor A and its axis variables; A_local_L1[arg0',arg1'] denotes the tensor A_local_L1 in the on-chip buffer L1 and its axis variables; and 2arg0'=arg0, 2arg1'=arg1 denote the relationship between the axis variables of tensor A_local_L1 and tensor A.
Based on the movement information between the on-chip buffers obtained above, the compiler may perform a Presburger identity operation (Presburger operation identity). The Presburger identity operation mainly establishes a mapping from each element to itself: I = {i -> i : i ∈ S}.
For example, the following mapping onto itself can be obtained:
[[[i0,i1]->A[arg0,arg1]]->A_local_L1[arg0',arg1']] ->
[[[i0,i1]->A[arg0,arg1]]->A_local_L1[arg0',arg1']];
Based on the obtained self-mapping information, the compiler may perform a Presburger domain-product operation (Presburger operation DFDP), where DFDP is a multiplicative mapping transformation on the domain: DFDP = {i -> ([i -> j] -> k) : ([i -> j] -> k) ∈ S}.
For example, the following product mapping result can be obtained:
[i0,i1] -> [[[i0,i1]->A[arg0,arg1]]->A_local_L1[arg0',arg1']].
After the product mapping result is obtained, the read/write information of the range domain can be set.
For example, the following data migration information and data tiling information can be obtained:
[i0,i1] -> L1read[[[i0,i1]->A[arg0,arg1]]->A_local_L1[arg0',arg1']],
where L1read denotes the reading of data.
The compiler can insert the above data migration information and data tiling information under the child node of "realize_UB" to obtain the second IR.
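A minimal sketch of these construction steps follows, at the level of plain mapping strings; a real implementation would rely on a Presburger library such as isl, and the helper functions here are illustrative assumptions rather than that library's API.
# Sketch: identity self-mapping, domain product, then read-tagging as strings.
def identity(move: str) -> str:
    """Presburger identity operation: I = {i -> i : i in S}."""
    return f"[{move}] -> [{move}]"

def domain_product(domain: str, move: str) -> str:
    """DFDP sketch: map the statement domain onto the wrapped movement."""
    return f"[{domain}] -> [{move}]"

def tag_read(buffer: str, domain: str, move: str) -> str:
    """Set the read information of the range domain."""
    return f"[{domain}] -> {buffer}read[{move}]"

move = "[[i0,i1]->A[arg0,arg1]]->A_local_L1[arg0',arg1']"
print(identity(move))                   # step 1: self-mapping
print(domain_product("i0,i1", move))    # step 2: domain product
print(tag_read("L1", "i0,i1", move))    # step 3: read information
# [i0,i1] -> L1read[[[i0,i1]->A[arg0,arg1]]->A_local_L1[arg0',arg1']]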
An embodiment of this application provides a method for constructing an intermediate representation, including: obtaining a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator; and generating a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a first migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location. In this way, the compiler can construct an intermediate representation capable of expressing the migration of a tensor between different storage locations (on-chip buffers or memory), including the direction and the size of the migration, which can be applied to IR construction for DSA-based AI chips.
Refer to FIG. 5, which is a schematic diagram of an embodiment of a method for constructing an intermediate representation according to an embodiment of this application. As shown in FIG. 5, the method for constructing an intermediate representation includes the following steps.
501. The compiler obtains a first intermediate representation (IR).
For a detailed description of step 501, refer to step 201 above; details are not repeated here.
502. The compiler generates, based on the compute statement, first data flow information corresponding to the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location.
503. The compiler generates second data flow information based on the existence of a directly connected data transmission channel between the first storage location and the second storage location, where the second data flow information includes the first data migration information.
Optionally, the compiler may generate the first data flow information corresponding to the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and, based on the existence of a directly connected data transmission channel between the first storage location and the second storage location (a directly connected data transmission channel can be understood as follows: there is no other storage location between the first storage location and the second storage location, so the tensor can be transmitted directly from the first storage location to the second storage location without passing through any other storage location), the compiler generates the second data flow information, where the second data flow information includes the first data migration information. Optionally, the third storage location indicates a storage location in an on-chip buffer.
In this embodiment of this application, in a scenario where the compute statement includes multiple operators, the compiler may generate the first data flow information corresponding to the compute statement. As shown in FIG. 6, the initialized first data flow information includes many redundant migration processes. As shown in FIG. 7, the dashed lines indicate optional migration paths that can be optimized, for example, the migration route from UB(F(A)), through DDR(F(A)), to L1(F(A)) (UB corresponds to the first storage location, DDR corresponds to the at least one third storage location, and L1 corresponds to the second storage location). Here, referring to FIG. 3, because a data transmission channel exists between the UB buffer and the L1 buffer, the UB buffer can transfer tensor F(A) directly to the L1 buffer. Correspondingly, as shown in FIG. 8, the second data flow information then includes the first data migration information associated with the tensor F(A), and the first data migration information indicates that the tensor is migrated from the first storage location (UB buffer) to the second storage location (L1 buffer).
Optionally, each migration may be assigned a weight; the higher the weight, the greater the cost of the migration in terms of performance. The data flow weight table may be illustrated as Table 1 below:
Table 1
(Table 1 is reproduced in the original publication as the image Figure PCTCN2020115759-appb-000002; it lists one weight each for fetching a tensor from DDR, from L1/UB, and from L0A/L0B/L0C.)
Here, the DDR weight quantifies the performance cost of fetching a tensor from the memory; the L1/UB weight quantifies the performance cost of fetching a tensor from the L1 buffer or the UB buffer; and the L0A/L0B/L0C weight quantifies the performance cost of fetching a tensor from the L0A buffer, the L0B buffer, or the L0C buffer.
It should be noted that the compiler also needs to mark, for each edge in the data flow information, whether its attribute is data movement or data computation. Only data-movement edges can be optimized away in this processing flow; data-computation edges must not be optimized away.
For example, in the route from DDR(A) to UB(F(A)) shown in FIG. 6, because it actually involves performing data computation on tensor A (obtaining F(A)), the migration route from DDR(A) to UB(F(A)) cannot be deleted.
The compiler can traverse all nodes in the data flow graph and compute the set of start nodes with zero in-degree (for example, DDR(A), DDR(B), and DDR(D) in FIG. 7) and the set of end nodes with zero out-degree (for example, DDR(RES) in FIG. 7) to obtain a table of all feasible paths of the data flow graph, which represents the possible paths from the start node set to the end node set.
The weight of each path, which is the sum of the weights of all migration processes on the path, is computed, and the data flow information corresponding to the path with the smallest weight sum is determined as the second data flow information. For example, FIG. 8 shows the optimized data flow graph.
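A minimal sketch of this path selection follows: enumerate every path from a zero-in-degree start node to a zero-out-degree end node, sum the per-edge migration weights, and keep the cheapest path. The node names and weights below are illustrative assumptions and do not reproduce Table 1, whose values are only available as an image.
# Sketch: pick the minimum-weight path through the data flow graph.
# edges: (src, dst) -> (weight, kind); "compute" edges must never be removed
EDGES = {
    ("DDR(A)", "UB(F(A))"): (10, "compute"),
    ("UB(F(A))", "DDR(F(A))"): (10, "move"),
    ("DDR(F(A))", "L1(F(A))"): (10, "move"),
    ("UB(F(A))", "L1(F(A))"): (2, "move"),   # direct UB -> L1 channel
    ("L1(F(A))", "DDR(RES)"): (5, "compute"),
}

def paths(src, dst, path=()):
    """Enumerate all acyclic paths from src to dst."""
    path = path + (src,)
    if src == dst:
        yield path
        return
    for (a, b) in EDGES:
        if a == src and b not in path:
            yield from paths(b, dst, path)

def weight(path):
    """Sum the migration weights along one path."""
    return sum(EDGES[(a, b)][0] for a, b in zip(path, path[1:]))

best = min(paths("DDR(A)", "DDR(RES)"), key=weight)
print(best, weight(best))
# ('DDR(A)', 'UB(F(A))', 'L1(F(A))', 'DDR(RES)') 17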
In this embodiment of this application, a first intermediate representation (IR) is obtained, where the first IR includes a compute statement and the compute statement includes a tensor; first data flow information corresponding to the compute statement is generated based on the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and second data flow information is generated based on the existence of a directly connected data transmission channel between the first storage location and the second storage location, where the second data flow information includes the first data migration information. In this way, redundant migration paths in the second data migration information are deleted, which, while still ensuring that the tensor can be migrated to the second storage location, reduces the cost of moving the tensor and the overhead of the system.
Refer to FIG. 9, which is a schematic diagram of the structure of a compiler according to an embodiment of this application. As shown in FIG. 9, the compiler includes:
an obtaining unit 901, configured to obtain a first intermediate representation (IR), where the first IR includes a compute statement, the compute statement includes a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is the data used to perform the operation represented by the operator; and
a processing unit 902, configured to generate a second IR based on the compute statement, where the second IR includes first data migration information and data tiling information, the first data migration information indicates that a first migration path of the tensor includes a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location.
Optionally, the compute unit is further configured to perform the operation on data that passes through a second migration path, where the second migration path goes from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
Optionally, the at least one third storage location indicates a storage location in an on-chip buffer.
Optionally, the processing unit is further configured to: generate, based on the compute statement, first data flow information corresponding to the compute statement, where the first data flow information includes second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and
generate second data flow information based on the existence of a directly connected data transmission channel between the first storage location and the second storage location, where the second data flow information includes the first data migration information.
Optionally, the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, the first storage location indicates a memory, and the second storage location indicates a storage location in an on-chip buffer.
Optionally, the tensor includes multiple dimensions, each dimension corresponds to an axis variable, the axis variable represents the tensor size in the corresponding dimension, the data tiling information includes multiple axis variables and a tiled axis variable corresponding to each axis variable, and the tiled axis variable represents the tensor size in the corresponding dimension when the tensor is migrated.
Optionally, the second IR further includes at least one target variable and a value range of each target variable, the axis variable is linearly related to the at least one target variable, and the axis variable represents the tensor size in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
Optionally, the second IR is a tree-structured IR, the tree structure includes a root node and child nodes, the root node corresponds to the compute statement, and a child node corresponds to the second storage location, where the second IR includes information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location includes the first data migration information and the data tiling information.
Optionally, the second IR further includes data movement information, and the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
Optionally, the processing unit is further configured to:
determine the first storage location and the second storage location according to the type of the compute unit.
It should also be noted that the described apparatus embodiments are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, the connection relationships between modules indicate that they have communication connections with each other, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the above implementations, a person skilled in the art can clearly understand that this application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can also be diverse, for example analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is the better implementation in most cases. Based on such an understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented fully or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are fully or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, via infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Claims (24)

  1. A method for constructing an intermediate representation, comprising:
    obtaining a first intermediate representation (IR), wherein the first IR comprises a compute statement, the compute statement comprises a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is data used to perform the operation represented by the operator; and
    generating a second IR based on the compute statement, wherein the second IR comprises first data migration information and data tiling information, the first data migration information indicates that a first migration path of the tensor comprises a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location.
  2. The method according to claim 1, wherein the compute unit is further configured to perform the operation on data that passes through a second migration path, the second migration path goes from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
  3. The method according to claim 2, wherein the at least one third storage location indicates a storage location in an on-chip buffer.
  4. The method according to claim 2 or 3, further comprising:
    generating, based on the compute statement, first data flow information corresponding to the compute statement, wherein the first data flow information comprises second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and
    generating second data flow information based on the existence of a directly connected data transmission channel between the first storage location and the second storage location, wherein the second data flow information comprises the first data migration information.
  5. The method according to any one of claims 1 to 4, wherein the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
  6. The method according to any one of claims 1 to 5, wherein the first storage location indicates a memory, and the second storage location indicates a storage location in an on-chip buffer.
  7. The method according to any one of claims 1 to 6, wherein the tensor comprises multiple dimensions, each dimension corresponds to an axis variable, the axis variable represents the tensor size in the corresponding dimension, the data tiling information comprises multiple axis variables and a tiled axis variable corresponding to each axis variable, and the tiled axis variable represents the tensor size in the corresponding dimension when the tensor is migrated.
  8. The method according to claim 7, wherein the second IR further comprises at least one target variable and a value range of each target variable, the axis variable is linearly related to the at least one target variable, and the axis variable represents the tensor size in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
  9. The method according to any one of claims 1 to 8, wherein the second IR is a tree-structured IR, the tree structure comprises a root node and child nodes, the root node corresponds to the compute statement, and a child node corresponds to the second storage location, wherein the second IR comprises information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location comprises the first data migration information and the data tiling information.
  10. The method according to any one of claims 1 to 9, wherein the second IR further comprises data movement information, and the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
  11. The method according to any one of claims 1 to 10, further comprising:
    determining the first storage location and the second storage location according to the type of the compute unit.
  12. A compiler, comprising:
    an obtaining unit, configured to obtain a first intermediate representation (IR), wherein the first IR comprises a compute statement, the compute statement comprises a tensor and an operator, the operation represented by the operator is performed by a compute unit, the compute unit is configured to perform the operation on data migrated successively through a first storage location and a second storage location, and the tensor is data used to perform the operation represented by the operator; and
    a processing unit, configured to generate a second IR based on the compute statement, wherein the second IR comprises first data migration information and data tiling information, the first data migration information indicates that a first migration path of the tensor comprises a path from the first storage location to the second storage location, and the data tiling information indicates the size of the tensor when the tensor is migrated to the second storage location.
  13. The compiler according to claim 12, wherein the compute unit is further configured to perform the operation on data that passes through a second migration path, the second migration path goes from the first storage location, through at least one third storage location, to the second storage location, and the first migration path and the second migration path are different data migration paths.
  14. The compiler according to claim 13, wherein the at least one third storage location indicates a storage location in an on-chip buffer.
  15. The compiler according to claim 12 or 13, wherein the processing unit is further configured to: generate, based on the compute statement, first data flow information corresponding to the compute statement, wherein the first data flow information comprises second data migration information of the tensor, and the second data migration information indicates that the tensor is migrated from the first storage location, successively through at least one third storage location, to the second storage location; and
    generate second data flow information based on the existence of a directly connected data transmission channel between the first storage location and the second storage location, wherein the second data flow information comprises the first data migration information.
  16. The compiler according to any one of claims 12 to 15, wherein the first storage location indicates a storage location in an on-chip buffer, and the second storage location indicates a storage location in an on-chip buffer.
  17. The compiler according to any one of claims 12 to 16, wherein the first storage location indicates a memory, and the second storage location indicates a storage location in an on-chip buffer.
  18. The compiler according to any one of claims 12 to 17, wherein the tensor comprises multiple dimensions, each dimension corresponds to an axis variable, the axis variable represents the tensor size in the corresponding dimension, the data tiling information comprises multiple axis variables and a tiled axis variable corresponding to each axis variable, and the tiled axis variable represents the tensor size in the corresponding dimension when the tensor is migrated.
  19. The compiler according to claim 18, wherein the second IR further comprises at least one target variable and a value range of each target variable, the axis variable is linearly related to the at least one target variable, and the axis variable represents the tensor size in the corresponding dimension through the at least one target variable and the value range of the at least one target variable.
  20. The compiler according to any one of claims 12 to 19, wherein the second IR is a tree-structured IR, the tree structure comprises a root node and child nodes, the root node corresponds to the compute statement, and a child node corresponds to the second storage location, wherein the second IR comprises information of the child node corresponding to the second storage location, and the information of the child node corresponding to the second storage location comprises the first data migration information and the data tiling information.
  21. The compiler according to any one of claims 12 to 20, wherein the second IR further comprises data movement information, and the data movement information indicates that the tensor is moved from the first storage location to the second storage location.
  22. The compiler according to any one of claims 12 to 21, wherein the processing unit is further configured to:
    determine the first storage location and the second storage location according to the type of the compute unit.
  23. A computer system, comprising a processor and a memory, wherein
    the processor is electrically connected to the memory; and
    the processor is configured to invoke code in the memory to perform the method according to any one of claims 1 to 11.
  24. A computer storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 11.
PCT/CN2020/115759 2019-09-18 2020-09-17 Method for constructing intermediate representation, compiler, and server WO2021052391A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20865946.6A EP4024202A4 (en) 2019-09-18 2020-09-17 METHOD FOR CONSTRUCTING AN INTERMEDIATE REPRESENTATION, COMPILER AND SERVER
US17/697,305 US11789709B2 (en) 2019-09-18 2022-03-17 Intermediate representation construction method, compiler, and server

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910896548.1 2019-09-18
CN201910896548 2019-09-18
CN201911271859.5 2019-12-11
CN201911271859.5A CN112527305A (zh) 2019-09-18 2019-12-11 Method for constructing intermediate representation, compiler, and server

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/697,305 Continuation US11789709B2 (en) 2019-09-18 2022-03-17 Intermediate representation construction method, compiler, and server

Publications (1)

Publication Number Publication Date
WO2021052391A1 true WO2021052391A1 (zh) 2021-03-25

Family

ID=74884329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/115759 WO2021052391A1 (zh) 2019-09-18 2020-09-17 一种构建中间表达的方法、编译器和服务器

Country Status (3)

Country Link
US (1) US11789709B2 (zh)
EP (1) EP4024202A4 (zh)
WO (1) WO2021052391A1 (zh)


Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8468432B2 (en) * 2009-07-01 2013-06-18 Silicon Motion, Inc. Coder-decoder and method for encoding and decoding an error correction code
US11036392B2 (en) * 2013-02-26 2021-06-15 Pure Storage, Inc. Determining when to use convergent encryption
US11728964B2 (en) * 2014-07-31 2023-08-15 Pure Storage, Inc. Performance aided data migration in a distributed storage network
EP3241310B1 (en) * 2015-01-02 2019-07-31 Systech Corporation Control infrastructure
US9715502B1 (en) 2015-03-25 2017-07-25 Amazon Technologies, Inc. Distributed data migration using chunking
US9817643B2 (en) 2015-07-17 2017-11-14 Microsoft Technology Licensing, Llc Incremental interprocedural dataflow analysis during compilation
US9715373B2 (en) * 2015-12-18 2017-07-25 International Business Machines Corporation Dynamic recompilation techniques for machine learning programs
US10592213B2 (en) 2016-10-19 2020-03-17 Intel Corporation Preprocessing tensor operations for optimal compilation
WO2018094087A1 (en) 2016-11-17 2018-05-24 The Mathworks, Inc. Systems and methods for generating code for parallel processing units
US20190130270A1 (en) 2017-10-27 2019-05-02 Wave Computing, Inc. Tensor manipulation within a reconfigurable fabric using pointers
  • WO2019127538A1 (zh) 2017-12-29 2019-07-04 SZ DJI Technology Co., Ltd. Data processing method and device, DMA controller, and computer-readable storage medium
  • CN110515872B (zh) 2018-05-21 2020-07-31 Alibaba Group Holding Limited Direct memory access method and apparatus, dedicated computing chip, and heterogeneous computing system
  • CN109558565B (zh) 2018-11-30 2023-04-07 Shanghai Cambricon Information Technology Co., Ltd. Operation method and apparatus, and related products
  • CN109558348A (zh) 2018-12-19 2019-04-02 Shenzhen SonoScape Medical Corp. Data movement method, apparatus, and system
US11422785B2 (en) * 2019-07-23 2022-08-23 Paypal, Inc. Container orchestration framework

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN108292241A (zh) * 2015-10-28 2018-07-17 Google LLC Processing computational graphs
  • CN108292374A (zh) * 2015-11-09 2018-07-17 Google LLC Training neural networks represented as computational graphs
  • US20180136912A1 (en) * 2016-11-17 2018-05-17 The Mathworks, Inc. Systems and methods for automatically generating code for deep learning systems
  • CN108345937A (zh) * 2017-01-06 2018-07-31 Google LLC Loop and library fusion
WO2018217222A1 (en) * 2017-05-26 2018-11-29 The Charles Stark Draper Laboratory, Inc. Machine intelligence and learning for graphic chip accessibility and execution

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • KR102457152B1 * 2021-06-16 2022-10-20 Moreh Inc. Method and system for determining whether optimization can be applied to an intermediate representation of a program
  • KR102457153B1 * 2021-06-16 2022-10-20 Moreh Inc. Method and system for managing an intermediate representation of a program
  • WO2022265411A1 * 2021-06-16 2022-12-22 Moreh Inc. Method and system for determining whether optimization can be applied to an intermediate representation of a program
  • WO2022265410A1 * 2021-06-16 2022-12-22 Moreh Inc. Method and system for generating an intermediate representation
  • WO2022265412A1 * 2021-06-16 2022-12-22 Moreh Inc. Method and system for managing an intermediate representation of a program
  • WO2022265413A1 * 2021-06-16 2022-12-22 Moreh Inc. Method and system for generating an intermediate representation for a program executed on an accelerator

Also Published As

Publication number Publication date
EP4024202A4 (en) 2022-10-26
EP4024202A1 (en) 2022-07-06
US11789709B2 (en) 2023-10-17
US20220206765A1 (en) 2022-06-30

Similar Documents

Publication Publication Date Title
  • WO2021052391A1 (zh) 2021-03-25 Method for constructing intermediate representation, compiler, and server
US10310812B2 (en) Matrix ordering for cache efficiency in performing large sparse matrix operations
US9798648B2 (en) Transitive source code violation matching and attribution
US8479171B2 (en) Generating test sets using intelligent variable selection and test set compaction
US8869113B2 (en) Software architecture for validating C++ programs using symbolic execution
US9678730B2 (en) Decision tree ensemble compilation
US8943487B2 (en) Optimizing libraries for validating C++ programs using symbolic execution
US8381094B1 (en) Incremental visual comparison of web browser screens
EP3438813A1 (en) Component management platform
US10666774B2 (en) Message processing
WO2017181866A1 (en) Making graph pattern queries bounded in big graphs
Sagebaum et al. Expression templates for primal value taping in the reverse mode of algorithmic differentiation
US20120192162A1 (en) Optimizing Handlers for Application-Specific Operations for Validating C++ Programs Using Symbolic Execution
  • WO2023040372A1 (zh) 2023-03-23 AI modeling process orchestration method and system based on graph algorithms
US9658938B2 (en) Iterative test generation based on data source analysis
US11934927B2 (en) Handling system-characteristics drift in machine learning applications
US20200409670A1 (en) Automatic software generation for computer systems
US20160350090A1 (en) Information processing apparatus, method of compiling, and storage medium
  • JP5761200B2 (ja) 2015-08-12 Information processing apparatus
US10140202B1 (en) Source code annotation for a system on chip
  • JP5932707B2 (ja) 2016-06-08 Computer, program, and data generation method
  • CN112527305A (zh) 2021-03-19 Method for constructing intermediate representation, compiler, and server
US10255394B1 (en) Reduced overhead for massive parallel processing
US10891412B1 (en) Offline analysis of hierarchical electronic design automation derived data
US20230119724A1 (en) Derivation Graph Querying Using Deferred Join Processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20865946

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020865946

Country of ref document: EP

Effective date: 20220330