CN110399124B - Code generation method, device, equipment and readable storage medium - Google Patents

Code generation method, device, equipment and readable storage medium

Info

Publication number
CN110399124B
Authority
CN
China
Prior art keywords
code
many
core
target
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910655665.9A
Other languages
Chinese (zh)
Other versions
CN110399124A (en)
Inventor
朱效民
赵雅倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201910655665.9A
Publication of CN110399124A
Application granted
Publication of CN110399124B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/447 Target code generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses a code generation method, which comprises the following steps: acquiring a target code that implements a Stencil computation; performing parameter passing on a target array processed by the target code to generate a parameter passing code; allocating each target loop to a different core of a heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, where a target loop is a row or a column in the target array; converting a target operation code in the target code into a many-core subcode; and combining the parameter passing code, the task allocation code, and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor. The method converts complex Stencil computation code into many-core code that a heterogeneous many-core processor can run, thereby improving the efficiency and accuracy of many-core code generation. Accordingly, the code generation apparatus, the device and the readable storage medium disclosed in the present application also have the above technical effects.

Description

Code generation method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of software development technologies, and in particular, to a code generation method, apparatus, device, and readable storage medium.
Background
Stencil computation is a common computational pattern whose distinguishing characteristic is a large number of loop computations (e.g., for loops) of high density and complexity. It has important applications in areas such as ocean numerical simulation and climate and weather calculation. Today, processors with a many-core architecture can significantly improve computational efficiency, but because many-core processors are heterogeneous, they cannot directly run code written to implement a Stencil computation.
If a many-core processor is required to run code that implements a Stencil computation, technicians must manually convert the original code into many-core form, and the efficiency of manual conversion is low because of the many loops and the complexity of the Stencil computation. Moreover, manual conversion inevitably leads to omissions, so the accuracy of the resulting many-core code is difficult to guarantee. Here, many-core code is code capable of running on a heterogeneous many-core processor.
Therefore, how to improve the generation efficiency and accuracy of many-core codes is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, an object of the present application is to provide a code generation method, apparatus, device and readable storage medium, so as to improve generation efficiency and accuracy of many-core codes. The specific scheme is as follows:
In a first aspect, the present application provides a code generation method, including:
acquiring a target code that implements a Stencil computation;
performing parameter passing on a target array processed by the target code to generate a parameter passing code;
allocating each target loop to a different core of a heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, where a target loop is a row or a column in the target array;
converting a target operation code in the target code into a many-core subcode, where the many-core subcode includes at least a many-core computation code, a memory management code, and a data load and store-back code;
combining the parameter passing code, the task allocation code, and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor.
Preferably, performing parameter passing on the target array processed by the target code to generate the parameter passing code includes:
creating a static array, and adding the name, the dimensionality and the address of the target array, together with the start and stop index of each dimension, to the static array to generate the parameter passing code.
Preferably, converting the target operation code in the target code into the many-core subcode includes:
converting a calculation equation in the target operation code according to a character string conversion rule to generate the many-core computation code;
generating the memory management code and the data load and/or store-back code according to the arrays involved in the many-core computation code.
Preferably, generating the memory management code and the data load and/or store-back code according to the arrays involved in the many-core computation code includes:
analyzing the arrays involved in the many-core computation code, and inserting the name of each array and the column information corresponding to that name into a dynamic array;
generating the memory management code according to the dynamic array;
generating the data load and store-back code according to the calculation equation in the many-core computation code.
Preferably, generating the data load and store-back code according to the calculation equation in the many-core computation code includes:
determining attribute information of the arrays involved in the many-core computation code according to the calculation equation in the many-core computation code;
generating the data load and/or store-back code according to the attribute information.
Preferably, generating the data load and/or store-back code according to the attribute information includes:
determining a reusable array among the arrays involved in the many-core computation code, and adding a reusable mark to the reusable array;
generating the data load and/or store-back code according to the reusable mark and the attribute information.
Preferably, after the parameter passing code, the task allocation code and the many-core subcode are combined into the many-core code capable of running on the heterogeneous many-core processor, the method further includes:
controlling the many-core code to run on the heterogeneous many-core processor, and returning a running result.
In a second aspect, the present application provides a code generation apparatus, including:
an acquisition module, configured to acquire a target code that implements a Stencil computation;
a parameter passing module, configured to perform parameter passing on a target array processed by the target code and generate a parameter passing code;
a task allocation module, configured to allocate each target loop to a different core of a heterogeneous many-core processor according to the loop label of the target loop and generate a task allocation code, where a target loop is a row or a column in the target array;
a conversion module, configured to convert a target operation code in the target code into a many-core subcode, where the many-core subcode includes at least a many-core computation code, a memory management code, and a data load and store-back code;
a combination module, configured to combine the parameter passing code, the task allocation code and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor.
In a third aspect, the present application provides a code generation device, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the code generation method disclosed in the foregoing.
In a fourth aspect, the present application provides a readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the code generation method disclosed in the foregoing.
According to the above scheme, the application provides a code generation method, which comprises the following steps: acquiring a target code that implements a Stencil computation; performing parameter passing on a target array processed by the target code to generate a parameter passing code; allocating each target loop to a different core of a heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, where a target loop is a row or a column in the target array; converting a target operation code in the target code into a many-core subcode, where the many-core subcode includes at least a many-core computation code, a memory management code, and a data load and store-back code; and combining the parameter passing code, the task allocation code, and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor.
As can be seen, after obtaining the code that implements the Stencil computation, the present application first performs parameter passing on the target array processed by the target code to generate a parameter passing code, whose function is to determine the information about the arrays that the current Stencil computation needs to process. Each target loop is then allocated to a different core of the heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, whose function is to allocate the rows or columns of the target array to different cores for processing; for example, rows 1-10 are allocated to core 1. Next, the operation codes in the target code are converted into many-core subcodes that the heterogeneous many-core processor can recognize. The function of the many-core subcodes, that is, of the many-core computation code, the memory management code, and the data load and store-back code, is to tell each core of the heterogeneous many-core processor which operations to execute and to determine the memory and data those operations need. Finally, the generated codes are combined to obtain a many-core code that can run on the heterogeneous many-core processor and that implements the Stencil computation. The present application can thus analyze complex Stencil computation code, generate the corresponding code segments, and combine them into many-core code that a heterogeneous many-core processor can run, which improves the efficiency and accuracy of many-core code generation.
Accordingly, the code generation device, the apparatus and the readable storage medium provided by the present application also have the above technical effects.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the present application; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a first code generation method disclosed herein;
FIG. 2 is a flow chart of a second code generation method disclosed herein;
FIG. 3 is a schematic diagram of a code generation framework disclosed herein;
FIG. 4 is a schematic diagram of a code generation apparatus disclosed herein;
FIG. 5 is a schematic diagram of a code generation device disclosed herein.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, the prior art requires technicians to manually convert the original code into many-core form, and the efficiency of manual conversion is low because of the many loops and the complexity of the Stencil computation. Moreover, manual conversion inevitably leads to omissions, so the accuracy of the resulting many-core code is difficult to guarantee. The present application therefore provides a code generation scheme that can improve the efficiency and accuracy of many-core code generation.
Referring to fig. 1, an embodiment of the present application discloses a first code generation method, including:
S101, acquiring a target code that implements a Stencil computation;
S102, performing parameter passing on a target array processed by the target code to generate a parameter passing code;
In this embodiment, performing parameter passing on the target array processed by the target code to generate the parameter passing code includes: creating a static array, and adding the name, the dimensionality and the address of the target array, together with the start and stop index of each dimension, to the static array to generate the parameter passing code. The static array is mainly used for parameter passing; that is, it puts the target array processed by the target code into many-core form so that the heterogeneous many-core processor can identify the data it needs to process.
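As an illustration of what the static array might hold, the following minimal Python sketch records the name, dimensionality, address, and per-dimension start and stop indices of each target array. The class name `ArrayDescriptor`, the helper `build_static_array`, and the example addresses and bounds are assumptions made for this sketch; the patent does not specify a concrete data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ArrayDescriptor:
    """One entry of the static array: fixed facts about a target array."""
    name: str                             # array name used in the target code
    ndim: int                             # number of dimensions
    address: int                          # base address (illustrative value only)
    bounds: Tuple[Tuple[int, int], ...]   # (start, stop) index of each dimension


def build_static_array(arrays: List[ArrayDescriptor]) -> Dict[str, ArrayDescriptor]:
    """Index the descriptors by name so later steps can look them up."""
    table = {}
    for desc in arrays:
        if len(desc.bounds) != desc.ndim:
            raise ValueError(f"{desc.name}: expected {desc.ndim} bounds")
        table[desc.name] = desc
    return table


# Hypothetical example: two 100 x 64 arrays.
static_array = build_static_array([
    ArrayDescriptor("A", 2, 0x1000, ((0, 99), (0, 63))),
    ArrayDescriptor("C", 2, 0x9000, ((0, 99), (0, 63))),
])
print(static_array["A"].bounds)
```

Keying the table by array name mirrors the later remark that subsequent operations look entries up by name.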
S103, allocating each target loop to a different core of the heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, where a target loop is a row or a column in the target array;
The target array processed by the target code is generally a multidimensional array, and during processing each row or column can be regarded as one target loop. There are therefore multiple target loops, each with its own loop label. Each target loop can thus be allocated to a core based on its loop label, so that different cores of the heterogeneous many-core processor process different data.
It should be noted that columns have column labels and rows have row labels; these column labels and row labels are the loop labels of this embodiment. Task allocation in this embodiment can proceed as follows. Suppose the target array is a two-dimensional array with 100 rows of data, numbered 0 to 99, where 0, 1, 2, ..., 99 are the loop labels of the rows. If tasks are allocated by row, and the heterogeneous many-core processor has 10 cores with each core able to take 10 rows of data, the result of the task allocation can be: rows 0-9 are allocated to core 1, rows 10-19 are allocated to core 2, and so on, until the whole two-dimensional array has been allocated.
Every core processes its data in the same way; only the data processed differs. Because there is no ordering constraint between the cores' processing of the data, the cores can process the data in parallel, which improves processing efficiency.
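The row allocation in the example above can be sketched as a simple block partition. The function name `allocate_rows` and the handling of a row count that does not divide evenly (the first few cores take one extra row) are assumptions of this sketch, not something the text specifies.

```python
def allocate_rows(num_rows: int, num_cores: int) -> dict:
    """Map each core number (1-based, as in the text) to a contiguous
    range of row labels."""
    base, extra = divmod(num_rows, num_cores)
    allocation = {}
    start = 0
    for core in range(1, num_cores + 1):
        # Assumption: leftover rows are given to the lowest-numbered cores.
        count = base + (1 if core <= extra else 0)
        allocation[core] = range(start, start + count)
        start += count
    return allocation


allocation = allocate_rows(100, 10)
for core in (1, 2, 10):
    rows = allocation[core]
    print(f"core {core}: rows {rows[0]}-{rows[-1]}")
```

With 100 rows and 10 cores this reproduces the allocation described in the text: rows 0-9 on core 1, rows 10-19 on core 2, and rows 90-99 on core 10.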
S104, converting a target operation code in the target code into a many-core subcode, where the many-core subcode includes at least a many-core computation code, a memory management code, and a data load and store-back code;
Each calculation code in the target code must be converted into a character string that the heterogeneous many-core processor can recognize, and in addition the memory and the data that each calculation code needs in order to run must be provided. Therefore, when the target operation code in the target code is converted, the many-core computation code, the memory management code, and the data load and store-back code are generated together, so that the many-core computation code can run. The many-core computation code is computation code that the heterogeneous many-core processor can recognize, and the memory management code and the data load and store-back code correspond to the many-core computation code. The memory management code includes memory allocation code and memory release code. Memory management here means managing the memory of the heterogeneous many-core processor.
It should be noted that the target operation code in this embodiment refers to the statement of any one loop operation in the target code. If the target code implements a loop of additions (e.g., C = A + B, where A and B are both assignable parameters), the target operation code refers to the computation of any single addition, i.e., C_part = A_part + B_part.
S105, combining the parameter passing code, the task allocation code and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor.
In this way, each core of the heterogeneous many-core processor knows which operations to execute and which memory and data those operations need, so combining the parameter passing code, the task allocation code and the many-core subcode yields a many-core code that can run on the heterogeneous many-core processor and that implements the Stencil computation. An example of a heterogeneous many-core processor is the SW26010.
When these codes are combined, they must be ordered according to the order in which the data is processed. For example, the memory allocation code must come before the many-core computation code, and the memory release code must come after it.
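The ordering constraint can be sketched as a fixed segment order that the combining step enforces. The segment names in `SEGMENT_ORDER` and the placeholder segment bodies are illustrative labels chosen for this sketch; the text only requires that allocation precede computation and that release follow it.

```python
# Order in which the generated segments must appear. The names are
# illustrative labels, not identifiers taken from the patent.
SEGMENT_ORDER = [
    "param_passing", "task_allocation", "mem_alloc",
    "data_load", "compute", "data_store_back", "mem_free",
]


def combine_segments(segments: dict) -> str:
    """Join the generated code segments in data-processing order."""
    missing = [key for key in SEGMENT_ORDER if key not in segments]
    if missing:
        raise KeyError(f"missing code segments: {missing}")
    return "\n".join(segments[key] for key in SEGMENT_ORDER)


# Placeholder segment bodies, deliberately supplied in reverse order.
segments = {key: f"/* {key} code goes here */" for key in reversed(SEGMENT_ORDER)}
many_core_code = combine_segments(segments)
print(many_core_code)
```

Because the order lives in one list rather than in the caller, segments can be generated in any order and still be emitted correctly.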
It can be understood that the heterogeneous many-core processor is an accelerator that can only perform computation operations and cannot perform logic operations (such as data communication). Therefore, after the heterogeneous many-core processor runs the many-core code, the running result needs to be returned to the CPU so that the CPU can carry out subsequent operations. That is, after the parameter passing code, the task allocation code and the many-core subcode are combined into the many-core code capable of running on the heterogeneous many-core processor, the method further includes: controlling the many-core code to run on the heterogeneous many-core processor, and returning the running result.
As can be seen, after obtaining the code that implements the Stencil computation, the embodiment of the present application first performs parameter passing on the target array processed by the target code to generate a parameter passing code, whose function is to determine the information about the arrays that the current Stencil computation needs to process. Each target loop is then allocated to a different core of the heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, whose function is to allocate the rows or columns of the target array to different cores for processing; for example, rows 1-10 are allocated to core 1. Next, the operation codes in the target code are converted into many-core subcodes that the heterogeneous many-core processor can recognize. The function of the many-core subcodes, that is, of the many-core computation code, the memory management code, and the data load and store-back code, is to tell each core of the heterogeneous many-core processor which operations to execute and to determine the memory and data those operations need. Finally, the generated codes are combined to obtain a many-core code that can run on the heterogeneous many-core processor and that implements the Stencil computation. The embodiment can thus analyze complex Stencil computation code, generate the corresponding code segments, and combine them into many-core code that a heterogeneous many-core processor can run, which improves the efficiency and accuracy of many-core code generation.
Referring to fig. 2, an embodiment of the present application discloses a second code generation method, including:
S201, acquiring a target code that implements a Stencil computation;
S202, performing parameter passing on a target array processed by the target code to generate a parameter passing code;
S203, allocating each target loop to a different core of the heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, where a target loop is a row or a column in the target array;
S204, converting a calculation equation in the target operation code according to a character string conversion rule to generate a many-core computation code;
S205, generating a memory management code and a data load and/or store-back code according to the arrays involved in the many-core computation code;
S206, combining the parameter passing code, the task allocation code, the many-core computation code, the memory management code and the data load and/or store-back code into a many-core code capable of running on the heterogeneous many-core processor.
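The text does not spell out the character string conversion rule used in S204. As one hypothetical example of such a rule, the sketch below rewrites Fortran-style array references such as `A(i-1, j)` into C-style subscripts `A[i-1][j]` and appends a statement terminator. The rule itself, the regular expression, and the function name `convert_equation` are assumptions; the index order is left unchanged, and any name followed by parentheses is treated as an array, so function calls would need separate handling.

```python
import re

# A name immediately followed by a parenthesised, non-nested index list.
ARRAY_REF = re.compile(r"\b([A-Za-z_]\w*)\(([^()]*)\)")


def convert_equation(equation: str) -> str:
    """Rewrite NAME(i, j) references as NAME[i][j] and add a semicolon."""
    def to_brackets(match: re.Match) -> str:
        name = match.group(1)
        indices = [part.strip() for part in match.group(2).split(",")]
        return name + "".join(f"[{index}]" for index in indices)

    return ARRAY_REF.sub(to_brackets, equation).strip() + ";"


print(convert_equation("C(i, j) = A(i-1, j) + A(i+1, j) + 0.5 * B(i, j)"))
```

Constants, variables and operators pass through untouched; only the array references are rewritten.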
In this embodiment, generating the memory management code and the data load and/or store-back code according to the arrays involved in the many-core computation code includes: analyzing the arrays involved in the many-core computation code, and inserting the name of each array and the column information corresponding to that name into the dynamic array; generating the memory management code according to the dynamic array; and generating the data load and store-back code according to the calculation equation in the many-core computation code.
The dynamic array is the counterpart of the static array. The information in the static array is fixed: the array name, the dimensionality, and the start and stop coordinates of each dimension. The dynamic array, by contrast, is tied to the computation, and its information relates to each calculation statement. This dynamic information mainly includes: for the current calculation partition (i.e., the corresponding row number or column number), which rows or columns of the current array participate in the calculation (within the same computation target code, several adjacent rows or columns may participate); whether the rows or columns corresponding to the data need to be loaded or copied back, determined from the assignment information of the equation; and whether the corresponding adjacent rows or columns exist in the data, determined from the adjacent row or column numbers used in the calculation. Because this information is tied to the specific computation target code and changes dynamically, the structure is called a dynamic array.
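A minimal sketch of the per-statement information a dynamic array might carry is shown below: for each array in an equation, which neighbouring rows (relative to the current row `i`) take part in the calculation. The equation syntax, the assumption that the first index `i` is the partitioned one, and the names `build_dynamic_array` and `rows_touched` are all hypothetical.

```python
import re
from collections import defaultdict

# Matches NAME(i, ...), NAME(i-1, ...), NAME(i + 2, ...) and captures the offset.
ROW_REF = re.compile(r"\b([A-Za-z_]\w*)\(\s*i\s*([+-]\s*\d+)?\s*,")


def build_dynamic_array(equation: str) -> dict:
    """Return {array name: sorted row offsets used by this equation}."""
    dynamic = defaultdict(set)
    for name, offset in ROW_REF.findall(equation):
        dynamic[name].add(int(offset.replace(" ", "")) if offset else 0)
    return {name: sorted(offsets) for name, offsets in dynamic.items()}


def rows_touched(dynamic: dict, current_row: int) -> dict:
    """Resolve the offsets to concrete row numbers for one partition."""
    return {name: [current_row + o for o in offsets]
            for name, offsets in dynamic.items()}


dynamic = build_dynamic_array("C(i, j) = A(i-1, j) + A(i, j) + A(i+1, j) + B(i, j)")
print(dynamic)
print(rows_touched(dynamic, 5))
```

The number of offsets per array gives a natural input for memory management: in this example three row buffers would be needed for `A` and one each for `B` and `C`.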
In this embodiment, generating the data load and store-back code according to the calculation equation in the many-core computation code includes: determining attribute information of the arrays involved in the many-core computation code according to the calculation equation in the many-core computation code; and generating the data load and/or store-back code according to the attribute information.
Specifically, each array in the dynamic array has attribute information indicating whether that array needs to be loaded and whether it needs to be stored back. Loading means transferring the array from main memory on the CPU side, in the traditional sense, to the heterogeneous many-core processor. Storing back means transferring the array from the heterogeneous many-core processor to main memory on the CPU side for storage. The data load and/or store-back code can be generated from the attribute information, because an array on the right-hand side of an equation (an operand that is read) typically needs to be loaded, while the array on the left-hand side (the one assigned to) typically needs to be stored back.
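The attribute information can be sketched as two flags per array derived from the equation. This sketch assumes that arrays read on the right-hand side are the ones to load and that the array assigned on the left-hand side is the one to store back, which matches the document's A + B example in which A and B are loaded. The function name `array_attributes` and the flag names are hypothetical.

```python
import re

# An identifier followed by "(" is treated as an array reference (assumption).
NAME_BEFORE_PAREN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def array_attributes(equation: str) -> dict:
    """Derive load / store-back flags for every array in one equation."""
    lhs, rhs = equation.split("=", 1)
    written = set(NAME_BEFORE_PAREN.findall(lhs))
    read = set(NAME_BEFORE_PAREN.findall(rhs))
    return {
        name: {"load": name in read, "store_back": name in written}
        for name in sorted(read | written)
    }


attrs = array_attributes("C(i, j) = A(i-1, j) + B(i, j)")
for name, flags in attrs.items():
    print(name, flags)
```

An array that appears on both sides, as in an in-place update, ends up with both flags set.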
In this embodiment, generating the data load and/or store-back code according to the attribute information includes: determining a reusable array among the arrays involved in the many-core computation code, and adding a reusable mark to the reusable array; and generating the data load and/or store-back code according to the reusable mark and the attribute information, so as to reduce unnecessary data loads.
It should be noted that different many-core computation codes may involve the same array, and data reuse can be implemented on this basis. For example, the first many-core computation code computes A + B and the second computes B + C, so B is a reusable array. Computing A + B requires loading A and B from the CPU side, and computing B + C requires loading B and C from the CPU side; since both use B, B needs to be loaded only once. That is, A and B are loaded when A + B is computed; after that computation finishes, A is deleted and B is retained, so computing B + C only requires loading C, and B is reused. A, B and C are typically adjacent rows or columns of data.
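The reuse described above can be sketched as a small load planner that keeps an array resident when the next computation also needs it. The function name `plan_loads`, the one-step lookahead, and the plan's field names are assumptions of this sketch.

```python
def plan_loads(steps):
    """For each computation (a list of operand names), decide which arrays
    to load, which are reused from the previous step, and which to drop."""
    plan = []
    resident = set()  # arrays kept on the accelerator from the previous step
    for idx, operands in enumerate(steps):
        needed = set(operands)
        upcoming = set(steps[idx + 1]) if idx + 1 < len(steps) else set()
        plan.append({
            "load": sorted(needed - resident),
            "reuse": sorted(needed & resident),
            "drop": sorted(needed - upcoming),
        })
        # Keep only what the next computation will use again.
        resident = needed & upcoming
    return plan


plan = plan_loads([["A", "B"], ["B", "C"]])
for step in plan:
    print(step)
```

For the A + B then B + C example, the first step loads A and B and drops A, and the second step loads only C while reusing B.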
It should be noted that the other implementation steps in this embodiment are the same as or similar to those in the above embodiment, and are therefore not repeated here.
As can be seen from the above, after obtaining the code that implements the Stencil computation, the embodiment of the present application first performs parameter passing on the target array processed by the target code to generate a parameter passing code, whose function is to determine the information about the arrays that the current Stencil computation needs to process. Each target loop is then allocated to a different core of the heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code, whose function is to allocate the rows or columns of the target array to different cores for processing; for example, rows 1-10 are allocated to core 1. Next, the operation codes in the target code are converted into many-core subcodes that the heterogeneous many-core processor can recognize. The function of the many-core subcodes, that is, of the many-core computation code, the memory management code, and the data load and store-back code, is to tell each core of the heterogeneous many-core processor which operations to execute and to determine the memory and data those operations need. Finally, the generated codes are combined to obtain a many-core code that can run on the heterogeneous many-core processor and that implements the Stencil computation. The embodiment can thus analyze complex Stencil computation code, generate the corresponding code segments, and combine them into many-core code that a heterogeneous many-core processor can run, which improves the efficiency and accuracy of many-core code generation.
A code generation framework can be constructed from the core idea of the present application; see FIG. 3 for details. The code generation framework shown in FIG. 3 includes an input part and an output part. The input part includes an array dimension module, a loop subscript module and a calculation mode module. The output part includes a parameter passing module, a task division module (i.e., the task allocation mentioned above), a memory allocation and reclamation module, and a calculation module, where the calculation module includes a data loading unit, a data calculation unit and a data storage unit. See FIG. 3 for the mapping between the input part and the output part.
Specifically, the array dimension module describes the arrays involved in the code to be processed, the loop subscript module distinguishes the loop subscript of each target loop, and the calculation mode module describes the calculation mode of each target operation code. The parameter passing module puts the arrays involved in the code to be processed into many-core form, so that the heterogeneous many-core processor can identify the data it needs to process. The task division module divides the target loops among the different cores. The memory allocation and reclamation module allocates and reclaims memory for each many-core computation code. The calculation module converts the calculation code into many-core form and provides the memory and data it needs to run, producing the many-core computation code and the data load and/or store-back code. The data loading unit, the data calculation unit and the data storage unit of the calculation module generate the corresponding data load and/or store-back code for each many-core computation code. Of course, the memory allocation and reclamation module may also be built into the calculation module.
Based on this framework, it is only necessary to collect the information to be filled in (i.e., the information that depends on the input part) according to the mapping in FIG. 3 and then determine the fixed part of the code; the corresponding many-core code can then be obtained.
If the automatic generation of the code is realized according to the framework shown in fig. 3, the method specifically includes:
A static array is established according to the array dimension module of the input part. The static array contains the name, dimension, and address of each target array involved in the code to be processed, together with the start and stop index of each dimension, so that subsequent operations can look the information up by name. The static array may be stored in structured form for subsequent management and retrieval.
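As an illustration only, the static array described above could be modeled as a table of array descriptors keyed by name. The sketch below is written in Python for brevity; the class and field names, the symbolic addresses, and the bound names ImaxS, JminS, and JmaxS are assumptions introduced here and do not come from the application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayInfo:
    """Descriptor for one target array (field names are hypothetical)."""
    name: str
    ndim: int
    address: str    # symbolic address, e.g. the name of the host-side pointer
    bounds: tuple   # one (start, stop) index pair per dimension

def build_static_table(arrays):
    """Index the descriptors by name so later steps can look them up."""
    table = {}
    for info in arrays:
        if len(info.bounds) != info.ndim:
            raise ValueError(f"{info.name}: expected {info.ndim} bound pairs")
        table[info.name] = info
    return table

# Hypothetical sample entries; the array name "zeta" is invented for the example.
static_table = build_static_table([
    ArrayInfo("Drhs", 2, "Drhs_host", (("IminS", "ImaxS"), ("JminS", "JmaxS"))),
    ArrayInfo("zeta", 2, "zeta_host", (("IminS", "ImaxS"), ("JminS", "JmaxS"))),
])
print(static_table["Drhs"].bounds[0])
```

Keying the table by name mirrors the stated purpose of the static array, namely that later steps search it by array name.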
Task division requires no intermediate information: once the mapping between the loop subscript of each target loop and the task division is established, the fixed code in the framework completes the task division according to this mapping and generates the task division code. The iteration index within each task must also be filled into the framework. The iteration index within each task is the iteration index on which each core relies when it processes multiple target loops.
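The application does not state which partitioning scheme the fixed task division code uses. As a minimal sketch under the assumption of a contiguous block partition, the mapping from a loop subscript range to per-core iteration ranges could look as follows; the function name and the choice of giving the first cores one extra iteration are illustrative only.

```python
def split_loop(start, stop, num_cores):
    """Block-partition the inclusive range [start, stop] over num_cores cores.

    Returns one (first, last) pair per core; a core with no work gets None.
    """
    total = stop - start + 1
    base, extra = divmod(total, num_cores)
    ranges = []
    first = start
    for core in range(num_cores):
        # The first `extra` cores take one additional iteration each.
        count = base + (1 if core < extra else 0)
        if count == 0:
            ranges.append(None)
        else:
            ranges.append((first, first + count - 1))
            first += count
    return ranges

print(split_loop(1, 10, 4))  # [(1, 3), (4, 6), (7, 8), (9, 10)]
```

Each pair returned here plays the role of the per-task iteration index that the description says must be filled into the framework.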
Each target operation code in the target code is then converted into many-core subcode, specifically as follows:
(a) String processing is performed on each target operation code to distinguish the elements that take part in the calculation, mainly constants, variables, operators, and arrays (including subscript information). Constants, variables, and operators need no special handling; arrays are analyzed and processed as follows:
Information such as the name, dimension, and address of the array and the start and stop index of each dimension is added to the dynamic array. If the dynamic array already contains the name, the corresponding storage position is located and the remaining information is added; if it does not, an entry for the name is created and the remaining information is added. The column information that the current target operation code needs to access, i.e., the column information corresponding to the current dynamic array entry, is then marked. This is necessary because a calculation generally uses, in addition to elements that are adjacent in storage, elements that are spatially adjacent but not adjacent in storage. For example, if the code to be processed is Fortran code and C code runs on the heterogeneous many-core processor, the storage position of the same element changes, because a column in the Fortran code is a row in the C code.
(b) The calculation equation in the target operation code is converted into many-core calculation code. The conversion rule is as follows: the array name in the calculation equation is kept unchanged and the suffix "slave" is appended to indicate that the array resides on the many-core side; the column information of the array is expressed relative to J, so that J+1 is converted into JA1, where A stands for addition (add) and 1 is converted into the string "1". In this way, each array in the calculation equation is represented in the many-core calculation code column by column (taking Fortran as an example, where a column is stored contiguously).
In addition, the starting index of the dimension must be determined from the array name and subtracted in the C code. For example, "Drhs(I, j+1)" is converted to "Drhs_JA1_slave[I-IminS]", where "IminS" is the starting index (i.e., starting subscript) of the array in the I dimension (i.e., the row dimension).
(c) The attribute information of the arrays involved in the many-core calculation code is determined from the calculation equation in that code, i.e., it is determined whether the arrays on each side of the equation need to be loaded or stored back. For a calculation equation, an array on the right side generally needs to be loaded, unless it has already been marked as stored back (its value was changed in a previous calculation, so it does not need to be reloaded in this step), and the array on the left side needs to be stored back. Thus all arrays on the right side of the equation are marked as loaded and the array on the left side is marked as stored back.
After a target operation code has been processed according to steps (a), (b), and (c), the corresponding many-core calculation code is generated; it may be held in a temporary variable and merged with the other code in a later, unified step.
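Steps (b) and (c) can be sketched as a small string rewriter. The following Python sketch is an illustration under several assumptions: it handles only two-dimensional references whose second subscript is j with an optional constant offset, the tag letter "S" for a negative offset is invented here (the description defines only "A" for addition), and the array name "zeta" and the function names are hypothetical.

```python
import re

# Matches a 2-D reference such as "Drhs(I, j+1)":
# group 1 = array name, 2 = row expression, 3 = sign, 4 = column offset.
ARRAY_REF = re.compile(
    r"(\w+)\(\s*([^,()]+?)\s*,\s*[jJ]\s*(?:([+-])\s*(\d+))?\s*\)")

def column_tag(sign, offset):
    """Encode the column offset: j -> J, j+1 -> JA1 (A = add).

    'S' for subtraction is an assumption, not taken from the description.
    """
    if sign is None:
        return "J"
    return "J" + ("A" if sign == "+" else "S") + offset

def convert_equation(equation, row_start):
    """Step (b): rewrite every array reference into its slave-side form."""
    def repl(match):
        name, row, sign, offset = match.groups()
        tag = column_tag(sign, offset)
        return f"{name}_{tag}_slave[{row}-{row_start[name]}]"
    return ARRAY_REF.sub(repl, equation)

def mark_attributes(equation, already_stored=frozenset()):
    """Step (c): right-hand arrays are loaded, the left-hand array is stored back.

    Buffers already marked as stored back are not loaded again.
    """
    lhs, rhs = equation.split("=", 1)
    def refs(text):
        return {f"{m.group(1)}_{column_tag(m.group(3), m.group(4))}"
                for m in ARRAY_REF.finditer(text)}
    return refs(rhs) - set(already_stored), refs(lhs)

starts = {"Drhs": "IminS", "zeta": "IminS"}
eq = "zeta(I, j) = Drhs(I, j+1) + Drhs(I, j) * 0.5"
print(convert_equation(eq, starts))
load, store = mark_attributes(eq)
print(sorted(load), sorted(store))
```

A regular expression is enough for this restricted form; a production generator would more likely parse the source properly, since a pattern like this one would also match an ordinary function call that happens to take j as its second argument.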
Further, memory management code may be generated according to the dynamic array. The memory management code includes memory allocation code and memory release code; that is, the memory required to run each many-core calculation code is determined so that it can be allocated and released. Specifically, the column information corresponding to each name in the dynamic array from step (a) is determined, storage space is set up according to that column information, and memory is allocated, which yields the memory allocation code; the corresponding memory release code is generated for the point after the current many-core calculation code has run. Note that the target operation code and the many-core calculation code perform the same calculation and therefore involve the same arrays, so the memory management code can be generated from the corresponding dynamic array.
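A minimal sketch of this step, assuming the dynamic array is kept as a dictionary keyed by array name: the Python code below records the columns each array needs (find or create, as in step (a)) and then emits one allocation and one release statement per buffer. The emitted C statements use the standard malloc and free purely as placeholders; the allocator actually used on a given many-core processor, the element type double, and the bound name ImaxS are assumptions.

```python
def record_reference(dynamic, name, column_tag, row_bounds):
    """Find or create the entry for `name`, then add the column it needs."""
    entry = dynamic.setdefault(name, {"row_bounds": row_bounds, "columns": []})
    if column_tag not in entry["columns"]:
        entry["columns"].append(column_tag)
    return entry

def memory_code(dynamic, ctype="double"):
    """Emit allocation and release statements for every (array, column) buffer."""
    alloc, release = [], []
    for name, entry in dynamic.items():
        lo, hi = entry["row_bounds"]
        for tag in entry["columns"]:
            buf = f"{name}_{tag}_slave"
            alloc.append(
                f"{ctype} *{buf} = ({ctype} *)malloc(({hi}-{lo}+1) * sizeof({ctype}));")
            release.append(f"free({buf});")
    # Release in reverse order of allocation.
    return alloc, release[::-1]

dynamic = {}
record_reference(dynamic, "Drhs", "J", ("IminS", "ImaxS"))
record_reference(dynamic, "Drhs", "JA1", ("IminS", "ImaxS"))
record_reference(dynamic, "Drhs", "J", ("IminS", "ImaxS"))  # duplicate, ignored
alloc, release = memory_code(dynamic)
print("\n".join(alloc + release))
```

Generating both lists from the same traversal keeps every allocation paired with exactly one release, which is the property the description relies on when it derives both kinds of code from the dynamic array.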
Whether each array needs to be loaded or stored back is determined according to the attribute information from step (c), and the data loading and/or store-back code is generated accordingly; the resulting code is held in a temporary variable and merged in a later, unified step. Array reuse can be realized in the array loading step. For example, suppose the first many-core calculation code computes with columns 1 and 2 and the second computes with columns 2 and 3, where 1, 2, and 3 are column indexes. The column data corresponding to index 2 is then a reusable array. Within one calculation, the data corresponding to each non-maximum index can be loaded in advance, outside the loop, into the temporary variables of a preload statement, while the data corresponding to the maximum index is loaded inside the loop, on each iteration, into the temporary variables of an iterative load statement. Here, "inside the loop" refers to the inner iteration loop of the many-core calculation code, which generally processes one contiguous row or column of data (in C a row is contiguous; in Fortran a column is contiguous), and "outside the loop" refers to the code outside that calculation iteration loop.
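The reuse scheme can be illustrated with a sliding window over column indexes. The Python sketch below simulates the load pattern rather than generating code, and it assumes that the offsets used by the stencil are consecutive; the function names are hypothetical.

```python
def sweep_with_reuse(load_column, j_start, j_stop, offsets=(0, 1)):
    """Visit j = j_start..j_stop, where each step needs columns j+o for o in offsets.

    Columns for the non-maximum offsets are preloaded once before the loop;
    inside the loop only the column with the maximum offset is loaded.
    """
    offsets = sorted(offsets)
    if any(b - a != 1 for a, b in zip(offsets, offsets[1:])):
        raise ValueError("this sketch assumes consecutive offsets")
    # Preload statement: everything except the largest offset.
    window = [load_column(j_start + o) for o in offsets[:-1]]
    steps = []
    for j in range(j_start, j_stop + 1):
        # Iterative load statement: only the column with the maximum index.
        window.append(load_column(j + offsets[-1]))
        steps.append(tuple(window))
        window.pop(0)  # the oldest column is not needed by the next step
    return steps

loads = []
def load(col):
    loads.append(col)
    return f"col{col}"

steps = sweep_with_reuse(load, 1, 4)
print(loads)     # [1, 2, 3, 4, 5]
print(steps[1])  # ('col2', 'col3')
```

With offsets 0 and 1 over four iterations, five columns are loaded instead of eight, because the column loaded as j+1 in one iteration serves as column j in the next.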
At this point, the task division code, the many-core calculation code, the memory management code, and the data loading and/or store-back code are combined to obtain the many-core code that can run on the heterogeneous many-core processor.
Therefore, many-core code capable of running on the heterogeneous many-core processor can be generated automatically from the original code, which significantly reduces coding and debugging time and improves the efficiency and accuracy of many-core code generation.
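The combination step amounts to concatenating the generated fragments in a fixed order. The Python sketch below assumes one plausible ordering (parameter passing, task division, allocation, loading, computation, store-back, release); the application does not spell out the order, and the section names and sample C fragments are hypothetical.

```python
SECTION_ORDER = ("param_passing", "task_division", "mem_alloc",
                 "data_load", "compute", "store_back", "mem_free")

def assemble_many_core_code(fragments):
    """Join the generated fragments in SECTION_ORDER, one comment per section."""
    missing = [name for name in SECTION_ORDER if name not in fragments]
    if missing:
        raise KeyError(f"missing fragments: {missing}")
    lines = []
    for name in SECTION_ORDER:
        lines.append(f"/* {name} */")
        lines.extend(fragments[name])
    return "\n".join(lines)

# Placeholder fragments; a real generator would supply the code built earlier.
fragments = {name: [f"/* generated {name} code goes here */"]
             for name in SECTION_ORDER}
code = assemble_many_core_code(fragments)
print(code)
```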
To further illustrate the technical effects of the present application, the following tests were performed on the same batch of data.
Specifically, for 50 loops in a commonly used ocean numerical model, manual implementation and debugging take about 1.5 months, whereas the method provided by the invention generates the many-core code in a matter of minutes; together with the implementation of the other necessary parts of the overall computing framework, work that would otherwise take about 1 month can be completed within 1 day.
Second, the many-core code generated by the invention is efficient. As the comparison in Table 1 shows, the efficiency of the automatically generated code is essentially the same as that of code manually written and optimized by an advanced programmer, and better than that of code manually written by an ordinary programmer.
TABLE 1

| Version | Calculation time (s) | Speedup |
| --- | --- | --- |
| CPU version | 1154 | 1 |
| OpenACC version | 522 | 2.21 |
| Version without data reuse | 305 | 3.784 |
| Primary programmer's version | 304 | 3.796 |
| Advanced programmer's version | 299 | 3.85 |
| Version of the invention with data reuse | 300 | 3.847 |
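The speedup column in Table 1 appears to be the CPU calculation time divided by the calculation time of each version. The short Python check below recomputes that ratio from the listed times; the listed values agree with it to within about 0.01. The dictionary labels are shortened for the example.

```python
# Calculation times in seconds, copied from Table 1.
times = {
    "CPU": 1154,
    "OpenACC": 522,
    "No data reuse": 305,
    "Primary programmer": 304,
    "Advanced programmer": 299,
    "Invention with data reuse": 300,
}

def speedups(times, baseline="CPU"):
    """Return baseline time divided by each version's time."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}

for name, ratio in speedups(times).items():
    print(f"{name}: {ratio:.3f}")
```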
A code generation apparatus provided by an embodiment of the present application is introduced below; the code generation apparatus described below and the code generation method described above may be cross-referenced.
Referring to fig. 4, an embodiment of the present application discloses a code generation apparatus, including:
an obtaining module 401, configured to obtain a target code for implementing a Stencil calculation;
a parameter passing module 402, configured to perform parameter passing on a target array processed by the target code, and to generate a parameter passing code;
a task allocation module 403, configured to allocate each target loop to a different core of the heterogeneous many-core processor according to the loop label of the target loop, and to generate a task allocation code; the target loop is a row or a column in the target array;
a conversion module 404, configured to convert a target operation code in the target code into a many-core subcode, where the many-core subcode at least includes a many-core calculation code, a memory management code, and a data loading and/or store-back code;
a combining module 405, configured to combine the parameter passing code, the task allocation code, and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor.
In a specific embodiment, the parameter passing module is specifically configured to:
create a static array, add the name, dimension, and address of the target array and the start and stop index of each dimension to the static array, and generate a parameter passing code.
In one embodiment, the conversion module comprises:
a character string conversion unit, configured to convert the calculation equation in the target operation code according to a character string conversion rule to generate a many-core calculation code;
a data processing unit, configured to generate a memory management code and a data loading and/or store-back code according to the arrays involved in the many-core calculation code.
In one embodiment, the data processing unit comprises:
an analysis subunit, configured to analyze the arrays involved in the many-core calculation code and to insert the name of each array and the column information corresponding to the name into the dynamic array;
a first generation subunit, configured to generate the memory management code according to the dynamic array;
a second generation subunit, configured to generate the data loading and/or store-back code according to the calculation equation in the many-core calculation code.
In a specific embodiment, the second generation subunit is specifically configured to:
determine attribute information of the arrays involved in the many-core calculation code according to the calculation equation in the many-core calculation code;
generate the data loading and/or store-back code according to the attribute information.
In a specific embodiment, the second generation subunit is specifically configured to:
determine a reusable array among the arrays involved in the many-core calculation code, and add a reusable mark to the reusable array;
generate the data loading and/or store-back code according to the reusable mark and the attribute information.
In a specific embodiment, the apparatus further comprises:
a control module, configured to control the many-core code to run on the heterogeneous many-core processor and to return the running result.
For the more specific working processes of each module and unit in this embodiment, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not repeated here.
Therefore, this embodiment provides a code generation apparatus that can automatically generate, from the original code, many-core code capable of running on the heterogeneous many-core processor, which significantly reduces coding and debugging time and improves the efficiency and accuracy of many-core code generation.
A code generation device provided by an embodiment of the present application is introduced below; the code generation device described below and the code generation method and apparatus described above may be cross-referenced.
Referring to fig. 5, an embodiment of the present application discloses a code generation device, including:
a memory 501 for storing a computer program;
a processor 502 for executing the computer program to implement the method disclosed in any of the embodiments above.
A readable storage medium provided by the embodiments of the present application is introduced below, and a readable storage medium described below and a code generation method, apparatus, and device described above may be referred to each other.
A readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the code generation method disclosed in the foregoing embodiments. For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not described herein again.
References in this application to "first," "second," "third," "fourth," etc., if any, are intended to distinguish between similar elements and not necessarily to describe a particular order or sequence. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, or apparatus.
It should be noted that descriptions in this application referring to "first", "second", etc. are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by a person skilled in the art; when the technical solutions are contradictory or the combination cannot be realized, the combination should be considered not to exist and falls outside the protection scope of the present application.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be cross-referenced.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the art.
The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (8)

1. A code generation method, comprising:
acquiring a target code for realizing Stencil calculation;
performing parameter passing on the target array processed by the target code to generate a parameter passing code;
allocating each target loop to a different core of a heterogeneous many-core processor according to the loop label of the target loop to generate a task allocation code; the target loop is a row or a column in the target array;
converting a target operation code in the target code into a many-core subcode, wherein the many-core subcode at least comprises a many-core calculation code, a memory management code, and a data loading and/or store-back code;
combining the parameter passing code, the task allocation code, and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor;
wherein the performing parameter passing on the target array processed by the target code to generate a parameter passing code comprises:
creating a static array, adding the name, dimension, and address of the target array and the start and stop index of each dimension to the static array, and generating the parameter passing code;
wherein the converting a target operation code in the target code into a many-core subcode comprises:
converting a calculation equation in the target operation code according to a character string conversion rule to generate the many-core calculation code;
generating the memory management code and the data loading and/or store-back code according to the arrays involved in the many-core calculation code.
2. The code generation method of claim 1, wherein the generating the memory management code and the data loading and/or store-back code according to the arrays involved in the many-core calculation code comprises:
analyzing the arrays involved in the many-core calculation code, and inserting the name of each array and the column information corresponding to the name into a dynamic array;
generating the memory management code according to the dynamic array;
generating the data loading and/or store-back code according to a calculation equation in the many-core calculation code.
3. The code generation method of claim 2, wherein the generating the data loading and/or store-back code according to a calculation equation in the many-core calculation code comprises:
determining attribute information of the arrays involved in the many-core calculation code according to the calculation equation in the many-core calculation code;
generating the data loading and/or store-back code according to the attribute information.
4. The code generation method of claim 3, wherein the generating the data loading and/or store-back code according to the attribute information comprises:
determining a reusable array among the arrays involved in the many-core calculation code, and adding a reusable mark to the reusable array;
generating the data loading and/or store-back code according to the reusable mark and the attribute information.
5. The code generation method of any of claims 1-4, wherein after the combining the parameter passing code, the task allocation code, and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor, the method further comprises:
controlling the many-core code to run on the heterogeneous many-core processor, and returning a running result.
6. A code generation apparatus, comprising:
an obtaining module, configured to obtain a target code for implementing a Stencil calculation;
a parameter passing module, configured to perform parameter passing on the target array processed by the target code and to generate a parameter passing code;
a task allocation module, configured to allocate each target loop to a different core of a heterogeneous many-core processor according to the loop label of the target loop and to generate a task allocation code; the target loop is a row or a column in the target array;
a conversion module, configured to convert a target operation code in the target code into a many-core subcode, wherein the many-core subcode at least comprises a many-core calculation code, a memory management code, and a data loading and/or store-back code;
a combining module, configured to combine the parameter passing code, the task allocation code, and the many-core subcode into a many-core code capable of running on the heterogeneous many-core processor;
wherein the parameter passing module is specifically configured to:
create a static array, add the name, dimension, and address of the target array and the start and stop index of each dimension to the static array, and generate the parameter passing code;
wherein the conversion module comprises:
a character string conversion unit, configured to convert a calculation equation in the target operation code according to a character string conversion rule to generate the many-core calculation code;
a data processing unit, configured to generate the memory management code and the data loading and/or store-back code according to the arrays involved in the many-core calculation code.
7. A code generation device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the code generation method of any of claims 1 to 5.
8. A readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the code generation method of any one of claims 1 to 5.
CN201910655665.9A 2019-07-19 2019-07-19 Code generation method, device, equipment and readable storage medium Active CN110399124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910655665.9A CN110399124B (en) 2019-07-19 2019-07-19 Code generation method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910655665.9A CN110399124B (en) 2019-07-19 2019-07-19 Code generation method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110399124A CN110399124A (en) 2019-11-01
CN110399124B true CN110399124B (en) 2022-04-22

Family

ID=68324714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910655665.9A Active CN110399124B (en) 2019-07-19 2019-07-19 Code generation method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110399124B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324355B (en) * 2020-02-11 2022-05-31 苏州浪潮智能科技有限公司 Method and system for debugging many-core code
CN113869801B (en) * 2021-11-30 2022-06-14 阿里云计算有限公司 Maturity state evaluation method and device for enterprise digital middleboxes

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3221301A (en) * 1959-10-13 1965-11-30 Graphic Arts Res Foundation Apparatus for recognition and recording of text matter
CN102968388A (en) * 2012-10-26 2013-03-13 无锡江南计算技术研究所 Method and device for structuring data
CN105242909A (en) * 2015-11-24 2016-01-13 无锡江南计算技术研究所 Method for many-core circulation partitioning based on multi-version code generation
CN105242962A (en) * 2015-11-24 2016-01-13 无锡江南计算技术研究所 Quick lightweight thread triggering method based on heterogeneous many-core
CN107729118A (en) * 2017-09-25 2018-02-23 复旦大学 Towards the method for the modification Java Virtual Machine of many-core processor
CN108541321A (en) * 2016-02-26 2018-09-14 谷歌有限责任公司 Program code is mapped to the technique of compiling of the programmable graphics processing hardware platform of high-performance, high effect

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9292414B2 (en) * 2012-11-26 2016-03-22 Nvidia Corporation System, method, and computer program product for debugging graphics programs locally utilizing a system with a single GPU

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
面向局部性和并行优化的循环分块技术;刘松;《计算机研究与发展》;20150531;1160-1176 *

Also Published As

Publication number Publication date
CN110399124A (en) 2019-11-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant