WO2015032331A1 - OpenCL program compilation method and compiler - Google Patents

Info

Publication number
WO2015032331A1
Authority
WO
WIPO (PCT)
Prior art keywords
data transmission
mode
data
operation data
transmission mode
Prior art date
Application number
PCT/CN2014/085885
Other languages
English (en)
Chinese (zh)
Inventor
刘颖
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2015032331A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation

Definitions

  • the present application relates to the field of computer processing technologies, and more particularly to an OpenCL program compilation method and compiler.
  • OpenCL: Open Computing Language
  • the OpenCL program is mainly divided into two parts: the device program and the host program.
  • for example, when a heterogeneous system is composed of a CPU and a GPU, the program running on the CPU is the host program and the program running on the GPU is the device program.
  • the execution process of the OpenCL program mainly includes: the host program controls the transfer of data from the host end to the device end, the device end executes the device program to process the data, and the host program controls the transfer of the processed result data from the device end back to the host end.
  • the OpenCL program provides two data transfer modes, namely, a copy mode and a map mode.
  • the copy mode refers to copying data from the host memory to the device memory, or from the device memory to the host memory. Because the data must be copied and transferred within the system, the data transfer phase takes a long time in the copy mode; but when the device program is executed, the data is already in the device memory, so the device program execution phase takes a short time. The mapping mode means that during the data transfer phase only a mapping relationship between the device memory and the host memory is established, and the data remains in the host memory. The data transfer phase therefore takes a short time, but the device program must access the data in the host memory when it executes, so the device program execution phase takes a long time.
  • in the prior art, the user chooses a data transfer mode when writing the OpenCL program according to features such as the platform; this choice depends heavily on the user's subjective judgment and cannot effectively guarantee the execution efficiency of the OpenCL program.
  • the application provides an OpenCL program compilation method and a compiler to solve the technical problem that the efficiency of the OpenCL program cannot be effectively guaranteed in the prior art.
  • an open computing language OpenCL program compilation method including:
  • a compiled execution code file is generated in accordance with the compiled data transfer mode.
  • the calculating a program execution consumption time of the operation data in the first data transmission mode and the second data transmission mode respectively includes:
  • a second possible implementation manner of the first aspect is further provided.
  • the first data transmission mode is a replication mode
  • the second data transmission mode is a mapping mode
  • the verifying that the operation data is processed according to the second data transmission mode, and whether the operation data is secure comprises:
  • the verifying the operation data is processed according to the second data transmission mode, and whether the operation data is secure comprises:
  • a third possible implementation manner of the first aspect is further provided, where the first data transmission mode is a replication mode and the second data transmission mode is a mapping mode; or, the first data transmission mode is a mapping mode and the second data transmission mode is a replication mode;
  • the calculating the execution consumption time of the operation data in the first data transmission mode and the second data transmission mode respectively includes:
  • the sum of the data transmission time calculated in the mapping mode and the device program execution time is taken as the execution consumption time of the operation data in the mapping mode.
  • a fourth possible implementation manner of the first aspect is further provided, where the total amount of memory access to the operation data is calculated according to the number of work items of the device program defined in the source program file and the amount of memory access data per work item.
  • a fifth possible implementation manner of the first aspect is further provided, where the data transmission rate, the memory access rate of the access device end, or the memory access rate of the access host end is predetermined based on the hardware characteristics of the execution hardware platform of the current heterogeneous system.
  • a compiler comprising:
  • a mode determining module configured to acquire a source program file of the OpenCL program, and determine a first data transmission mode of the operation data defined in the source program file;
  • the execution consumption time includes a data transmission time of the operation data and a device program execution time
  • a mode selection module configured to select a data transmission mode that consumes less time as a compiled data transmission mode of the operation data when the source program file is compiled.
  • a compiling module is configured to generate a compiled execution code file according to the compiled data transfer mode.
  • the method further includes:
  • a verification module configured to verify whether the operation data is safe when the operation data is processed according to the second data transmission mode, and if so, trigger the calculation module.
  • a second possible implementation manner of the second aspect is further provided, where the verification module is specifically configured to: when the first data transmission mode is a replication mode and the second data transmission mode is a mapping mode, analyze whether the host end performs a write operation on the operation data during the execution of the program and, if not, determine that the operation data is safe; or, when the first data transmission mode is a mapping mode and the second data transmission mode is a replication mode, analyze whether the device end performs a write operation on the operation data during the execution of the program and, if not, determine that the operation data is safe.
  • the first data transmission mode is a replication mode
  • the second data transmission mode is a mapping mode
  • the first data transmission mode is a mapping mode
  • the second data transmission mode is a replication mode
  • the calculation module includes:
  • a first transmission time calculation module configured to calculate a data transmission time of the operation data in the replication mode according to the total data volume of the operation data and the data transmission rate;
  • a first execution time calculation module configured to calculate a device program execution time of the operation data in the copy mode according to a total amount of memory accesses to the operation data and a memory access rate of the access device end during execution of the device program;
  • a first consumption time calculation module configured to use a sum of a data transmission time calculated in the copy mode and a device program execution time as an execution consumption time of the operation data in the copy mode
  • a second transmission time calculation module configured to calculate a data transmission time of the operation data in the mapping mode according to the mapping relationship establishment and elimination times between the host end and the device end
  • a second execution time calculation module configured to calculate a device program execution time of the operation data in the mapping mode according to a total amount of memory accesses to the operation data and a memory access rate of the access host end during execution of the device program;
  • a second consumption time calculation module configured to use the sum of the data transmission time calculated in the mapping mode and the device program execution time as the execution consumption time of the operation data in the mapping mode
  • the embodiment of the present application provides an OpenCL program compiling method and a compiler. The compiler obtains a source program file of the OpenCL program and determines a first data transmission mode of the operation data defined in the source program file; calculates the execution consumption time of the operation data in the first data transmission mode and the second data transmission mode respectively; selects the data transmission mode with the smaller execution consumption time as the compiled data transmission mode of the operation data when the source program file is compiled; and generates the compiled execution code file according to the compiled data transmission mode.
  • the OpenCL program compiled according to the embodiment of the present application can reduce the program execution time, improve the program execution efficiency, and can effectively ensure the execution efficiency in different heterogeneous systems.
  • FIG. 1 is a flowchart of an embodiment of an OpenCL program compiling method according to an embodiment of the present application
  • FIG. 2 is a flowchart of another embodiment of an OpenCL program compiling method according to an embodiment of the present application
  • FIG. 3 is a flowchart of still another embodiment of an OpenCL program compiling method according to an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of an embodiment of a compiler according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of another embodiment of a compiler according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a computing module in a compiler according to an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of an embodiment of a computing device according to an embodiment of the present application.
  • the OpenCL program compiled according to the embodiment of the present application can reduce the program execution time, improve the program execution efficiency, and effectively ensure the execution efficiency in different heterogeneous systems.
  • FIG. 1 is a flowchart of an embodiment of an Open Computing Language (OpenCL) program compiling method according to an embodiment of the present application, which may include the following steps:
  • the OpenCL program is mainly divided into two parts: the host program and the device program (that is, the Kernel program).
  • the host program runs on the host side and the device program runs on the device side.
  • the host program controls the transfer of the operation data from the host to the device, and the transfer of the data from the device back to the host after the data has been processed.
  • the device program is executed by the device side to complete the processing of the operation data.
  • OpenCL program provides two data transfer modes, copy mode and map mode.
  • Copy mode refers to copying operational data from host-side memory to device-side memory, or from device-side memory to host-side memory.
  • the mapping mode means that only the mapping relationship between device memory and host memory is established during the data transmission phase, and the operation data still exists in the host side memory.
  • if the data transmission mode of the operation data is the copy mode, the data transmission phase takes a long time and the device program execution phase takes a short time; if the data transmission mode of the operation data is the mapping mode, the data transmission phase takes a short time and the device program execution phase takes a long time.
  • the replication mode is mainly applicable to application scenarios in which data is transmitted once and used multiple times.
  • the mapping mode is mainly applicable to application scenarios with large data transmission volume and small amount of access.
  • the source program file is written by the user; because the choice is highly subjective and places high demands on the user's experience, executing the program according to the data transfer mode defined for the operation data in the source program file cannot effectively guarantee execution efficiency.
  • OpenCL programs are highly portable and can be executed in different heterogeneous systems, but a program that executes efficiently in one heterogeneous system does not necessarily execute efficiently in another.
  • in the process of implementing the present invention, the inventor took a different approach. Since the OpenCL program must be compiled before it is executed, converting the source program file into binary code that the computer can recognize, the embodiment of the present application uses the compiler and improves the compilation process applied to the source program file.
  • the operation data defined in the source program file and the first data transmission mode of the operation data are determined by analyzing the source program file.
  • the first data transmission mode may be a copy mode or a mapping mode, and is known from the function used in the source program file. For example, when the operation function corresponding to the operation data is clWriteBuffer, the data transmission mode of the operation data is the copy mode; when the corresponding operation function is clEnqueueMapBuffer, the data transmission mode of the operation data is the mapping mode.
  • the source file of a section of OpenCL program includes:
  • the defined operation data is A
  • the data type is Double (double-precision floating-point type)
  • the data transmission mode is known as the copy mode according to the function clWriteBuffer.
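The mode determination described above can be sketched as a simple text scan. This is only an illustration under stated assumptions: the two function names are taken from the text, while the helper `detect_first_modes`, the lookup table, and the choice of the second call argument as the operation data name are hypothetical and not part of the described compiler.

```python
import re

# Function names come from the text; the table and the "second argument
# names the operation data" convention are illustrative assumptions.
MODE_BY_CALL = {"clWriteBuffer": "copy", "clEnqueueMapBuffer": "map"}

CALL_PATTERN = re.compile(r"\b(clWriteBuffer|clEnqueueMapBuffer)\s*\(([^()]*)\)")


def detect_first_modes(source: str) -> dict:
    """Map each operation data name to the transfer mode implied by its call."""
    modes = {}
    for call, args in CALL_PATTERN.findall(source):
        parts = [p.strip() for p in args.split(",")]
        if len(parts) >= 2:
            modes[parts[1]] = MODE_BY_CALL[call]
    return modes


# Hypothetical host-code fragment used only to exercise the scan.
host_code = """
clWriteBuffer(queue, A, CL_TRUE, 0, size, host_a, 0, NULL, NULL);
clEnqueueMapBuffer(queue, B, CL_TRUE, CL_MAP_READ, 0, size, 0, NULL, NULL, &err);
"""
first_modes = detect_first_modes(host_code)
print(first_modes)
```

A real compiler would work on the parsed program rather than on raw text; the regular expression here does not handle nested parentheses.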
  • the source program file may define multiple items of operation data; the processing operations of the embodiments of the present application are performed for each item of operation data.
  • the second data transmission mode is a data transmission mode different from the first data transmission mode.
  • when the first data transmission mode is the copy mode, the second data transmission mode is the mapping mode; when the first data transmission mode is the mapping mode, the second data transmission mode is the copy mode.
  • the execution consumption time of the operation data is calculated in the first data transmission mode and in the second data transmission mode, that is, in the copy mode and in the mapping mode respectively.
  • the execution consumption time includes a data transmission time of the operation data and a device program execution time.
  • the data transfer time is related to the total data amount of the operation data and the data transfer rate.
  • the total amount of data for this operational data can be known from the definition of operational data in the OpenCL source.
  • the data transmission rate can be predetermined in conjunction with the hardware characteristics of the execution platform of the current heterogeneous system.
  • in the copy mode, the data transfer time includes the time to transfer the operation data from the host-side memory to the device-side memory and the time to transfer it from the device-side memory back to the host-side memory; the two transfer times are approximately the same. Therefore, the data transfer time can be taken as twice the product of the total data amount of the operation data and the data transfer rate, where the rate is expressed as the transfer time per unit of data.
  • the device program execution time is related to the total amount of memory access to the operation data and the memory access rate to the device-side memory during the execution of the device program.
  • the total amount of memory access to the operational data can be calculated according to the number of work items of the device program and the amount of memory access data of the unit work item.
  • the number of work items and the amount of memory access data per work item can be obtained by analyzing the source program file.
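The copy-mode cost described above can be written as a short function. This is a minimal sketch, assuming all rates are expressed as clock cycles per byte; the function and parameter names are hypothetical.

```python
def copy_mode_cost(total_bytes, transfer_cycles_per_byte,
                   work_items, bytes_per_work_item, device_cycles_per_byte):
    """Execution consumption time in copy mode, in clock cycles."""
    # Host-to-device and device-to-host transfers are assumed to cost the same,
    # hence the factor of two.
    transfer_time = 2 * total_bytes * transfer_cycles_per_byte
    # Total memory access = number of work items * bytes accessed per work item.
    total_access = work_items * bytes_per_work_item
    # The device program reads the data from device-side memory.
    device_exec_time = total_access * device_cycles_per_byte
    return transfer_time + device_exec_time


# Illustrative numbers only (not from the text).
example_copy_cost = copy_mode_cost(1024, 4, 256, 8, 4)
print(example_copy_cost)
```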
  • in the mapping mode, the operation data is not actually transferred between the host and the device; instead, a mapping relationship is established.
  • the data transmission time in the mapping mode is determined by the time when the mapping relationship is established, which includes the mapping relationship establishment time and the mapping relationship elimination time.
  • the mapping relationship elimination time is the same as the mapping relationship establishment time, so the data transmission time in the mapping mode can be taken as twice the mapping relationship establishment (or elimination) time.
  • the execution time of the device program is related to the total amount of memory access to the operation data during the execution of the device program and the memory access rate of the host memory.
  • the total amount of memory access to the operational data can be calculated based on the number of work items of the device program and the amount of memory access data per unit of work.
  • the access rate of the device program to the device side memory and the memory access rate to the host side can be determined in advance in conjunction with the hardware characteristics of the current heterogeneous execution platform.
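The mapping-mode cost follows the same pattern, with the transfer term replaced by the time to establish and eliminate the mapping. Again a sketch only: the names are hypothetical and times are assumed to be in clock cycles.

```python
def map_mode_cost(map_setup_cycles, work_items, bytes_per_work_item,
                  host_cycles_per_byte):
    """Execution consumption time in mapping mode, in clock cycles."""
    # Establishing and eliminating the mapping are assumed to cost the same.
    transfer_time = 2 * map_setup_cycles
    # Total memory access = number of work items * bytes accessed per work item.
    total_access = work_items * bytes_per_work_item
    # The device program reads the data from host-side memory.
    device_exec_time = total_access * host_cycles_per_byte
    return transfer_time + device_exec_time


# Illustrative numbers only (not from the text).
example_map_cost = map_mode_cost(500, 256, 8, 16)
print(example_map_cost)
```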
  • after the execution consumption time in each mode is calculated, the data transmission mode with the smaller consumption time is selected as the compiled data transmission mode of the operation data when the source program file is compiled. The corresponding compiled execution code file is then generated according to the compiled data transmission mode, so that when the OpenCL program is executed the operation data is transmitted and processed according to the selected mode, which shortens the execution consumption time and improves execution efficiency.
  • in this embodiment, the source program file is acquired and the first data transmission mode of the operation data defined in the source program is determined; the execution consumption time of the operation data is then calculated in the first data transmission mode and in the second data transmission mode respectively, and the mode with the smaller consumption time is selected as the compiled data transmission mode. Processing the operation data according to the selected compiled data transmission mode shortens the execution time and effectively improves execution efficiency. When the program is ported to another heterogeneous system, the technical solution of the embodiment of the present application can likewise determine the data transmission mode of the operation data that suits that heterogeneous system, thereby effectively ensuring the execution efficiency of the program in different heterogeneous systems.
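The selection step summarized above reduces to a comparison of two costs. The sketch below assumes the costs have already been computed; the function name is hypothetical, and keeping the first mode on a tie is an assumption, since the text does not say how ties are resolved.

```python
def select_compiled_mode(first_mode, costs):
    """Pick the mode with the smaller execution consumption time.

    costs maps "copy" and "map" to their estimated times.
    On a tie the first (user-written) mode is kept; this is an assumption.
    """
    second_mode = "map" if first_mode == "copy" else "copy"
    if costs[second_mode] < costs[first_mode]:
        return second_mode
    return first_mode


print(select_compiled_mode("copy", {"copy": 100, "map": 60}))
```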
  • FIG. 2 is a flowchart of another embodiment of an open computing language OpenCL program compiling method according to an embodiment of the present application, which may include the following steps:
  • Step 202: verify whether the operation data is safe when processed according to the second data transmission mode; if yes, execute step 203; if no, end the process.
  • the second data transmission mode is different from the first data transmission mode.
  • when the first data transmission mode is the copy mode, the second data transmission mode is the mapping mode; when the first data transmission mode is the mapping mode, the second data transmission mode is the copy mode.
  • whether the operation data is safe can be determined by judging whether the program would run incorrectly if the operation data were processed according to the second data transmission mode, for example, whether the operation data remains consistent between the host side and the device side.
  • the second data transmission mode is a mapping mode
  • in the copy mode, the host side and the device side each hold a copy of the operation data; if the host side performs a write operation on the operation data, the host-side data becomes inconsistent with the device-side data.
  • if the operation data is instead processed according to the mapping mode, the operation data exists only on the host side, and during program execution the operation data processed by the device side is always consistent with the operation data on the host side. The processing result would then differ from that in the copy mode, making the operation data unsafe and causing a program execution error.
  • verifying whether the operation data is safe when processed according to the second data transmission mode may be:
  • in the mapping mode, the operation data exists only on the host side. If the operation data is processed according to the copy mode, it exists on both the host side and the device side. If the device side performs a write operation on the operation data, the device-side data changes but the host-side data does not change at the same time, so the data on the device and the host become inconsistent. Processing the operation data in the copy mode is then unsafe and causes a program execution error.
  • the operation flow of the embodiment continues only when it is determined that the operation data is safe when processed in accordance with the second data transmission mode.
  • data flow analysis techniques can be used to analyze the definition and use of the data in the OpenCL source program to determine whether the host end or the device end performs a write operation on the operation data.
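The safety rule above can be illustrated with a small check over a list of accesses. This is a sketch, not the data flow analysis itself: it assumes the analysis has already produced a list of `(side, operation, data name)` records covering program execution, and that record format and the function name are hypothetical.

```python
def is_switch_safe(data_name, first_mode, accesses):
    """Return True if switching data_name to the other mode is safe.

    accesses: iterable of (side, operation, name) tuples, where side is
    "host" or "device" and operation is "read" or "write" (assumed format).
    """
    # copy -> map is unsafe if the host writes the data during execution;
    # map -> copy is unsafe if the device writes the data during execution.
    forbidden_writer = "host" if first_mode == "copy" else "device"
    return not any(
        side == forbidden_writer and op == "write" and name == data_name
        for side, op, name in accesses
    )


# Hypothetical access records used only to exercise the rule.
accesses = [("device", "read", "A"), ("host", "write", "B")]
print(is_switch_safe("A", "copy", accesses), is_switch_safe("B", "copy", accesses))
```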
  • the first data transmission mode may be a replication mode, and the second data transmission mode may be a mapping mode; or the first data transmission mode may be a mapping mode, and the second data transmission mode may be a replication mode.
  • the calculating the execution consumption time of the operation data in the first data transmission mode and the second data transmission mode may include:
  • the data transmission time of the operation data in the copy mode is calculated according to the total data amount of the operation data and the data transmission rate.
  • the data transmission rate may be represented by the transmission time per unit of data, and the data transmission time may be equal to twice the product of the total data amount of the operation data and the unit data transmission consumption time.
  • the device program execution time of the operation data is calculated according to the total amount of memory accesses to the operation data and the memory access rate of the access device end during execution of the device program.
  • the memory access rate of the access device may be determined according to the hardware characteristics of the current heterogeneous system program execution platform.
  • the total amount of memory access to the operational data may be equal to the product of the number of work items of the device program and the amount of memory access data per work item.
  • the work item (work-item) is the smallest execution unit.
  • the number of work items indicates how many units the computation is divided into.
  • the amount of memory access data for each work item can be obtained by analyzing the definitions in the OpenCL source program with data flow analysis techniques.
  • the sum of the data transfer time calculated in the copy mode and the device program execution time is taken as the execution time of the operation data in the copy mode.
  • the data transmission time of the operation data in the mapping mode is calculated according to the mapping relationship establishment and elimination times between the host end and the device end.
  • mapping relationship establishment and elimination time can be predetermined based on the hardware characteristics of the heterogeneous system execution platform.
  • the device program execution time of the operation data is calculated according to the total amount of memory access of the operation data and the memory access rate of the access host during execution of the device program.
  • the memory access rate of the access host may be predetermined according to the hardware characteristics of the heterogeneous system execution platform.
  • the total amount of memory access to the operational data may be equal to the product of the number of work items of the device program and the amount of memory access data per unit of work item.
  • the sum of the data transmission time calculated in the mapping mode and the device program execution time is taken as the execution time of the operation data in the mapping mode.
  • the compiled data transmission mode may be the first of the operation data.
  • the OpenCL program can be run on the machine, reducing execution time and improving execution efficiency.
  • in this embodiment, the source program file is obtained and the first data transmission mode of the operation data defined in the source program is determined. If verification shows that the operation data is safe when processed according to the second data transmission mode, the execution consumption time of the operation data is calculated in the first data transmission mode and in the second data transmission mode respectively, and the data transmission mode with the smaller consumption time is selected as the compiled data transmission mode of the operation data at compile time. The compiled execution code file is generated accordingly, so that when the program runs the operation data is processed according to the selected compiled data transmission mode, which shortens the execution time and effectively improves execution efficiency. When the program is ported to another heterogeneous system, the technical solution of the embodiment of the present application can be used to determine the data transmission mode of the operation data that suits that heterogeneous system, thereby ensuring the execution efficiency of the program in different heterogeneous systems.
  • FIG. 3 is a flowchart of another embodiment of an open computing language OpenCL program compiling method according to an embodiment of the present application.
  • a fragment of the source program file of the following OpenCL program is used as an example:
  • the operation data defined in the source program file segment includes the operation data A and the operation data B, and the data transmission mode is the copy mode.
  • the following description mainly uses the operation data A as an example.
  • the processing procedure for the operation data B is similar to that for the operation data A and will not be described again.
  • the method can include the following steps:
  • Step 302: verify whether the operation data is safe when processed according to the mapping mode; if yes, execute step 303; if no, end the process.
  • the data transfer time of the operation data A in the copy mode is Ct1 = Vt * St * 2.
  • Vt is the total data amount of the operation data A. From the above program, the data type of the operation data A is double-precision floating point, which occupies 8 bytes, and the vector length of the operation data A is 65536; therefore, the total data amount of the operation data A is 65536 * 8 B (bytes).
  • St is the unit data transmission consumption time, used to indicate the data transmission rate. In this embodiment, it is assumed to be 4 cycles/B (4 clock cycles per byte).
  • the device program execution time of the operation data A in the copy mode is Ca1 = Va * Sab.
  • Va is the total amount of memory access of the device program to the operation data A
  • Nwi is 65536
  • Sab is the unit data consumption time when the device program accesses the device-side memory, used to indicate the memory access rate to the device-side memory; it is assumed to be 4 cycles/B.
  • the device program execution time in the mapping mode is Ca2 = Va * Sam.
  • Va = 128 KB.
  • Sam is the unit data consumption time when the device program accesses the host-side memory, used to indicate the memory access rate to the host-side memory; it is assumed to be 16 cycles/B.
  • the mapping mode is selected as the compiled data transmission mode of the operation data A, so that at compile time the data transmission mode of the operation data A is changed and a compiled execution code file corresponding to the mapping mode is generated.
  • the operation data A is then processed in accordance with the mapping mode, which reduces the execution consumption time of the operation data A and improves program execution efficiency.
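The figures quoted for operation data A can be checked with a few lines of arithmetic. This sketch uses only the values stated in the text (Vt = 65536 * 8 B, St = 4 cycles/B, Va = 128 KB, Sab = 4 cycles/B, Sam = 16 cycles/B) and assumes 1 KB = 1024 B. The mapping-mode transfer time Ct2 is not stated in this part of the text, so instead of assuming a value the sketch computes the largest Ct2 for which the mapping mode would still be the faster choice.

```python
V_T = 65536 * 8      # total data amount of operation data A, in bytes
S_T = 4              # transfer time per byte, in cycles
V_A = 128 * 1024     # total memory access amount; assumes 1 KB = 1024 B
S_AB = 4             # device-side memory access time per byte, in cycles
S_AM = 16            # host-side memory access time per byte, in cycles

ct1 = V_T * S_T * 2          # copy-mode data transfer time
ca1 = V_A * S_AB             # copy-mode device program execution time
copy_total = ct1 + ca1       # copy-mode execution consumption time

ca2 = V_A * S_AM             # mapping-mode device program execution time
# Mapping mode wins when Ct2 + Ca2 < Ct1 + Ca1, i.e. Ct2 < copy_total - ca2.
break_even_ct2 = copy_total - ca2

print(ct1, ca1, copy_total, ca2, break_even_ct2)
```

Under these numbers the copy mode costs 4718592 cycles in total, so the mapping mode is faster whenever establishing and eliminating the mapping together take fewer than 2621440 cycles.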
  • when the program is ported to another heterogeneous system, the solution of the embodiment of the present application can determine the data transmission mode of the operation data A that suits that heterogeneous system, ensuring the execution efficiency of the program there.
  • FIG. 4 is a schematic structural diagram of an embodiment of a compiler according to an embodiment of the present disclosure, where the compiler may include:
  • the mode determining module 401 is configured to acquire a source program file of the OpenCL program, and determine a first data transmission mode of the operation data defined in the source program file.
  • the first data transmission mode may be a copy mode or a mapping mode, as known by a function defined in the source program file.
  • the calculating module 402 is configured to calculate an execution consumption time of the operation data in the first data transmission mode and the second data transmission mode, respectively.
  • the second data transmission mode is different from the first data transmission mode.
  • when the first data transmission mode is the copy mode, the second data transmission mode is the mapping mode; when the first data transmission mode is the mapping mode, the second data transmission mode is the copy mode.
  • the execution consumption time of the operation data is calculated in the first data transmission mode and in the second data transmission mode, that is, in the copy mode and in the mapping mode respectively.
  • the execution consumption time includes a data transmission time of the operation data and a device program execution time.
  • the mode selection module 403 is configured to select a data transmission mode with a small consumption time as a compiled data transmission mode of the operation data when the OpenCL source program is compiled.
  • the compiling module 404 is configured to generate a compiled execution code file according to the compiled data transmission mode.
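The division of work among modules 401 to 404 can be pictured as a small pipeline. The class below is only a structural sketch: the module numbers come from the text, but the class, its field names, and the stub callables in the usage example are hypothetical, and real modules would operate on a parsed program rather than on strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Compiler:
    determine_mode: Callable[[str], str]    # mode determining module 401
    cost_of: Callable[[str, str], float]    # calculating module 402: (source, mode) -> time
    emit: Callable[[str, str], str]         # compiling module 404: (source, mode) -> code

    def compile(self, source: str) -> str:
        first = self.determine_mode(source)
        second = "map" if first == "copy" else "copy"
        costs = {mode: self.cost_of(source, mode) for mode in (first, second)}
        # Mode selection module 403: smaller time wins; min() keeps the
        # first mode on a tie, which is an assumption.
        chosen = min((first, second), key=costs.__getitem__)
        return self.emit(source, chosen)


# Stub modules with made-up costs, used only to show the flow.
demo = Compiler(
    determine_mode=lambda src: "copy",
    cost_of=lambda src, mode: {"copy": 10.0, "map": 4.0}[mode],
    emit=lambda src, mode: f"{mode}:{src}",
)
print(demo.compile("kernel.cl"))
```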
  • after the execution consumption time in each mode is calculated, the data transmission mode with the smaller consumption time is selected as the compiled data transmission mode of the operation data when the OpenCL source program is compiled. The corresponding compiled execution code file is generated according to the compiled data transmission mode, so that when the OpenCL program is executed the operation data is transmitted and processed according to the selected mode, which shortens the execution consumption time and improves execution efficiency.
  • when the compiler acquires the source program file for compiling, it first determines the first data transmission mode of the operation data defined in the source program, then calculates the execution consumption time of the operation data in the first data transmission mode and in the second data transmission mode, and selects the data transmission mode with the smaller execution consumption time as the compiled data transmission mode of the operation data at compile time. The compiled execution code file is generated accordingly, so that when the program runs on the machine, the operation data is processed according to the selected compiled data transmission mode, which shortens the execution time and effectively improves execution efficiency. When the program is ported to another heterogeneous system, the technical solution of this embodiment can likewise be used to determine the data transmission mode of the operation data that suits that heterogeneous system, thereby ensuring the execution efficiency of the program on different heterogeneous systems.
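  • the selection step described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the names `TransferMode`, `other_mode` and `select_compiled_mode` are hypothetical, the cost figures are assumed to have been estimated beforehand, and keeping the first mode on a tie is an assumption.

```python
from enum import Enum


class TransferMode(Enum):
    """The two data transmission modes discussed in the text."""
    COPY = "copy"
    MAP = "map"


def other_mode(first: TransferMode) -> TransferMode:
    # The second mode is always the one that differs from the first.
    return TransferMode.MAP if first is TransferMode.COPY else TransferMode.COPY


def select_compiled_mode(first: TransferMode, cost: dict) -> TransferMode:
    """Return the mode with the smaller estimated execution consumption time.

    `cost` maps each TransferMode to an estimated time (hypothetical input).
    On a tie the mode defined in the source program is kept (assumption).
    """
    second = other_mode(first)
    return second if cost[second] < cost[first] else first
```

  • for example, `select_compiled_mode(TransferMode.COPY, {TransferMode.COPY: 5.0, TransferMode.MAP: 3.0})` picks the mapping mode, because its estimated time is smaller.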
  • FIG. 5 is a schematic structural diagram of another embodiment of a compiler according to an embodiment of the present disclosure, where the compiler may include:
  • the mode determining module 501 is configured to acquire a source program file of the OpenCL program, and determine a first data transmission mode of the operation data defined in the source program file.
  • the verification module 501 is configured to verify whether the operation data is safe when the operation data is processed according to the second data transmission mode.
  • whether the operation data is safe can be determined by judging whether the program runs incorrectly when the operation data is processed according to the second data transmission mode, for example, whether the operation data remains consistent between the host end and the device end.
  • the verification module may be specifically configured to: when the first data transmission mode is the copy mode and the second data transmission mode is the mapping mode, analyze whether the host end writes the operation data during program execution, and if not, determine that the operation data is safe when processed according to the second data transmission mode; when the first data transmission mode is the mapping mode and the second data transmission mode is the copy mode, analyze whether the device end writes the operation data during program execution, and if not, determine that the operation data is safe when processed according to the second data transmission mode.
  • data flow analysis techniques may be used to analyze the definitions and uses of the data in the source program file to determine whether the host end or the device end writes the operation data.
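  • the safety rule above can be sketched as a check over a list of recorded accesses. This is only an illustration under stated assumptions: the `Access` record and the function `is_switch_safe` are hypothetical names, and the access list stands in for the result of a data flow analysis, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One access found by a (hypothetical) data flow analysis."""
    side: str    # "host" or "device"
    kind: str    # "read" or "write"
    buffer: str  # name of the operation data being accessed


def is_switch_safe(first_mode: str, accesses: list, buffer: str) -> bool:
    """Check whether `buffer` can be processed in the other transmission mode.

    copy -> map: unsafe if the host end writes the data during execution.
    map -> copy: unsafe if the device end writes the data during execution.
    """
    if first_mode == "copy":
        forbidden_writer = "host"
    elif first_mode == "map":
        forbidden_writer = "device"
    else:
        raise ValueError(f"unknown mode: {first_mode!r}")
    return not any(
        a.side == forbidden_writer and a.kind == "write" and a.buffer == buffer
        for a in accesses
    )
```

  • with this rule, a buffer that is written only by the device end can be switched from the copy mode to the mapping mode, but not from the mapping mode to the copy mode.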
  • the calculating module 502 is configured to calculate, when the verification module 501 verifies that the operation data is safe, the execution consumption time of the operation data in the first data transmission mode and in the second data transmission mode, respectively.
  • the second data transmission mode is different from the first data transmission mode, and the execution consumption time includes a data transmission time of the operation data and a device program execution time.
  • the first data transmission mode may be the copy mode, and the second data transmission mode may be the mapping mode; or the first data transmission mode may be the mapping mode, and the second data transmission mode may be the copy mode.
  • the computing module may specifically include:
  • the first transmission time calculation module 601 is configured to calculate a data transmission time of the operation data in the copy mode according to the total data amount of the operation data and the data transmission rate.
  • the data transmission rate may be represented by a unit data transmission consumption time, and the data transmission time may be equal to twice the product of the total data amount of the operation data and the unit data transmission consumption time.
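  • the copy-mode transmission time stated above can be written as a one-line formula. The sketch below is illustrative only: the function name `copy_mode_transfer_time` and the choice of bytes and seconds as units are assumptions, and the factor of two presumably reflects the transfer to the device end and the transfer of results back to the host end.

```python
def copy_mode_transfer_time(total_bytes: int, unit_transfer_time: float) -> float:
    """Estimate the copy-mode data transmission time.

    total_bytes        -- total data amount of the operation data (assumed unit: bytes)
    unit_transfer_time -- time to transfer one unit of data (assumed unit: seconds per byte)
    """
    if total_bytes < 0 or unit_transfer_time < 0:
        raise ValueError("inputs must be non-negative")
    # Twice the product of data amount and unit transfer time, as in the text.
    return 2 * total_bytes * unit_transfer_time
```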
  • the first execution time calculation module 602 is configured to calculate a device program execution time of the operation data in the copy mode according to the total amount of memory accesses to the operation data during execution of the device program and the rate at which device-side memory is accessed.
  • the total amount of memory accesses to the operation data is calculated according to the number of work items of the device program defined in the source program file and the amount of memory access data of a unit work item.
  • the rate at which device-side memory is accessed may be determined according to the hardware characteristics of the current heterogeneous system program execution platform.
  • the total amount of memory access to the operational data may be equal to the product of the number of work items of the device program and the amount of memory access data per unit of work item.
  • a work item (work-item) is the smallest execution unit.
  • the number of work items indicates how many units the computation is divided into.
  • the amount of memory access data of each work item can be obtained from the definitions in the source program file, for example by analysis with data flow analysis techniques.
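  • the memory access estimate above can be sketched as follows. The product of work items and per-item access amount comes from the text; dividing that total by an access rate in bytes per second to obtain a device program execution time is an assumption of this sketch, as are the function names.

```python
def total_memory_access(num_work_items: int, bytes_per_work_item: int) -> int:
    """Total amount of memory accesses: work items times per-item access amount."""
    return num_work_items * bytes_per_work_item


def device_exec_time(num_work_items: int, bytes_per_work_item: int,
                     access_rate: float) -> float:
    """Estimate device program execution time from the memory traffic.

    access_rate is assumed to be in bytes per second and to come from the
    hardware characteristics of the execution platform.
    """
    if access_rate <= 0:
        raise ValueError("access_rate must be positive")
    return total_memory_access(num_work_items, bytes_per_work_item) / access_rate
```

  • the same estimate applies to both modes; only the access rate differs, depending on whether the device program accesses device-side memory (copy mode) or host-side memory (mapping mode).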
  • the first consumption time calculation module 603 is configured to use the sum of the data transmission time calculated in the copy mode and the device program execution time as the execution consumption time of the operation data in the copy mode.
  • the second transmission time calculation module 604 is configured to calculate the data transmission time of the operation data in the mapping mode according to the establishment and elimination time of the mapping relationship between the host end and the device end.
  • the mapping relationship establishment and elimination time can be predetermined based on the hardware characteristics of the heterogeneous system execution platform.
  • the second execution time calculation module 605 is configured to calculate a device program execution time of the operation data in the mapping mode according to the total amount of memory accesses to the operation data during execution of the device program and the rate at which host-side memory is accessed.
  • the rate at which host-side memory is accessed may be predetermined according to the hardware characteristics of the heterogeneous system execution platform.
  • the total amount of memory access to the operational data may be equal to the product of the number of work items of the device program and the amount of memory access data per unit of work item.
  • the second consumption time calculation module 606 is configured to use the sum of the data transmission time and the device program execution time calculated in the mapping mode as the execution consumption time of the operation data in the mapping mode.
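  • the two cost estimates computed by modules 601 to 606 can be placed side by side. The sketch below is a hedged illustration: the `Platform` record, its field names and the units are hypothetical, and converting a memory access amount to time by dividing by a rate is an assumption; only the overall structure (transmission time plus device program execution time for each mode) follows the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical hardware parameters of the execution platform."""
    unit_transfer_time: float       # seconds per byte moved between host and device
    device_mem_rate: float          # bytes/s when the device accesses device memory
    host_mem_rate: float            # bytes/s when the device accesses host memory
    map_setup_teardown_time: float  # seconds to establish and eliminate a mapping


def copy_mode_cost(total_bytes: int, accessed_bytes: int, p: Platform) -> float:
    # Transmission: twice the product of data amount and unit transfer time.
    transfer = 2 * total_bytes * p.unit_transfer_time
    # Execution: memory traffic served from device-side memory.
    execution = accessed_bytes / p.device_mem_rate
    return transfer + execution


def map_mode_cost(accessed_bytes: int, p: Platform) -> float:
    # Transmission: only the mapping establishment and elimination time.
    transfer = p.map_setup_teardown_time
    # Execution: memory traffic served from host-side memory.
    execution = accessed_bytes / p.host_mem_rate
    return transfer + execution
```

  • the sketch makes the trade-off visible: the copy mode pays for moving the data but then accesses device-side memory, while the mapping mode avoids the transfer but accesses host-side memory during execution.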
  • the mode selection module 503 is configured to select the data transmission mode with the smaller execution consumption time as the compiled data transmission mode of the operation data when the OpenCL source program is compiled.
  • the compiling module 504 is configured to generate a compiled execution code file according to the compiled data transmission mode.
  • the compiled data transmission mode may be the first data transmission mode or the second data transmission mode of the operation data.
  • the OpenCL program can be run on the machine, reducing execution time and improving execution efficiency.
  • in this embodiment, the compiler obtains the source program file, determines the first data transmission mode of the operation data defined in the source program file, and verifies whether the operation data is safe when processed according to the second data transmission mode. When the operation data is safe, the compiler calculates the execution consumption time of the operation data in the first data transmission mode and in the second data transmission mode, and selects the data transmission mode with the smaller execution consumption time as the compiled data transmission mode of the operation data at compile time. The compiled execution code file is generated accordingly, so that when the program runs on the machine, the operation data is processed according to the selected compiled data transmission mode, which shortens the execution time and effectively improves execution efficiency. When the program is ported to another heterogeneous system, the technical solution of this embodiment can likewise be used to determine the data transmission mode of the operation data that suits that heterogeneous system, thereby ensuring the execution efficiency of the program on different heterogeneous systems.
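  • the overall decision flow summarized above (verify, then estimate, then select) can be sketched in a few lines. This is an illustration only: `compile_time_mode`, `is_safe` and `estimate_cost` are hypothetical names, and keeping the first mode when the verification fails or when the costs are equal is an assumption of the sketch.

```python
def compile_time_mode(first_mode: str, is_safe, estimate_cost) -> str:
    """Pick the compiled data transmission mode for one piece of operation data.

    first_mode    -- "copy" or "map", as defined in the source program
    is_safe       -- callable(second_mode) -> bool, result of the safety verification
    estimate_cost -- callable(mode) -> float, estimated execution consumption time
    """
    if first_mode not in ("copy", "map"):
        raise ValueError(f"unknown mode: {first_mode!r}")
    second_mode = "map" if first_mode == "copy" else "copy"
    # Assumption: if the switch is not safe, keep the mode from the source program.
    if not is_safe(second_mode):
        return first_mode
    # min() returns the first candidate on a tie, so the first mode is kept.
    return min((first_mode, second_mode), key=estimate_cost)
```

  • for example, with estimated costs of 12.0 for the copy mode and 3.0 for the mapping mode, a program written for the copy mode is compiled for the mapping mode only if the safety verification succeeds.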
  • in practical applications, the compiler described in the foregoing embodiments may be deployed in a computing device.
  • a computing device on which the compiler of the embodiments of the present application is deployed can compile the source program file into machine-recognizable code.
  • for the operation data defined in the source program file, the data transmission mode with the lower execution consumption time is selected at compile time, so that the execution time of the program is reduced and program execution efficiency is improved.
  • the embodiment of the present application further provides a computing device, where the computing device includes at least a memory 701 and a processor 702 connected to the memory 701 via a bus.
  • the memory 701 stores a set of program instructions.
  • the memory 701 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory or the like.
  • the processor 702 is configured to invoke a program instruction stored by the memory 701, and perform the following operations:
  • a compiled execution code file is generated in accordance with the compiled data transfer mode.
  • the processor 702 may be a central processing unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the computing device can be used to execute any of the OpenCL program compilation methods shown in FIG. 1 to FIG. 2 provided by the embodiments of the present application.
  • the present application can be implemented by means of software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied, in essence, in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present application or in parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to an OpenCL (Open Computing Language) program compilation method and a compiler, the method comprising: obtaining OpenCL source files and determining a first data transmission mode for operation data defined in the source files; calculating the execution time of the operation data in the first data transmission mode and in a second data transmission mode, the second data transmission mode being different from the first data transmission mode, and the execution time comprising the data transmission time of the operation data and the device program execution time; selecting the data transmission mode with the shorter execution time as the compiled data transmission mode for the operation data when the source files are compiled; and generating a compiled execution code file according to the compiled data transmission mode. The invention effectively guarantees program execution efficiency.
PCT/CN2014/085885 2013-09-06 2014-09-04 Procédé de compilation de programme opencl et compilateur WO2015032331A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310404125.6A CN104424009B (zh) 2013-09-06 2013-09-06 OpenCL程序编译方法和编译器
CN201310404125.6 2013-09-06

Publications (1)

Publication Number Publication Date
WO2015032331A1 true WO2015032331A1 (fr) 2015-03-12

Family

ID=52627814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/085885 WO2015032331A1 (fr) 2013-09-06 2014-09-04 Procédé de compilation de programme opencl et compilateur

Country Status (2)

Country Link
CN (1) CN104424009B (fr)
WO (1) WO2015032331A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108885546B (zh) * 2016-04-08 2021-07-20 华为技术有限公司 一种基于异构系统的程序处理方法和装置

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6298477B1 (en) * 1998-10-30 2001-10-02 Sun Microsystems, Inc. Method and apparatus for selecting ways to compile at runtime
CN1518693A (zh) * 2000-10-05 2004-08-04 皇家菲利浦电子有限公司 可重定目标的编译系统和方法
CN101034361A (zh) * 2007-01-18 2007-09-12 浙江大学 一种基于指令代价的编译器优化代码生成方法

Also Published As

Publication number Publication date
CN104424009B (zh) 2017-10-17
CN104424009A (zh) 2015-03-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14842096

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14842096

Country of ref document: EP

Kind code of ref document: A1