US20190108006A1 - Code coverage generation in gpu by using host-device coordination - Google Patents

Code coverage generation in gpu by using host-device coordination

Publication number
US20190108006A1
Authority
US
United States
Prior art keywords
code
processor
instrumentation
host
coverage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/154,542
Inventor
Hariharan Sandanagobalane
Sean Lee
Vinod Grover
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nvidia Corp
Priority to US16/154,542
Assigned to NVIDIA CORPORATION: assignment of assignors interest (see document for details). Assignors: GROVER, VINOD; LEE, SEAN YOUNGSUNG; SANDANAGOBALANE, HARIHARAN
Publication of US20190108006A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4441 Reducing the execution time required by the program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3624 Software debugging by performing operations on the source code, e.g. via a compiler
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3676 Test management for coverage analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/451 Code distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/458 Synchronisation, e.g. post-wait, barriers, locks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44536 Selecting among different versions
    • G06F9/44542 Retargetable
    • G06F9/44547 Fat binaries

Definitions

  • Embodiments of the present disclosure are related to computer program compilers, and more specifically, to determining code coverage for software to be performed by one or more co-processors by coordinating with one or more host processors.
  • Certain computer systems include a co-processing subsystem that may be configured to concurrently execute multiple program threads that are instantiated from a common application program.
  • a computer system may include a host processor and one or more device processors which are also known as coprocessors or accelerator processors.
  • CUDA is a well-known parallel computing platform and an application programming interface (API) model that enables general purpose computing by using a graphics processing unit (GPU) as a device processor (or co-processor) and a Central Processing Unit (CPU) as a host processor.
  • Code coverage is a mechanism used to measure the degree to which source code is executed by a test suite. It is often used to assist performance tuning by helping programmers focus their development and debug efforts on the most commonly executed portions of code.
  • Embodiments of the present disclosure provide a compilation mechanism to enable generation of code coverage information with regard to code execution by a device processor (or a co-processor or accelerator processor herein).
  • An exemplary integrated compiler can compile source code programmed to be concurrently executed by a host processor (or main processor) and a device processor.
  • the compilation can generate an instrumented executable code including (1) code coverage instrumentation counters for the device functions, (2) mapping information that maps instrumentation counters to source constructs, (3) memory requirements of the counters, and (4) instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected coverage data from the device memory to generate instrumentation output.
  • Execution of the instrumented executable code can produce coverage counter values, which, when provided to a coverage tool along with the executable, can produce a code coverage report on the device functions.
  • the code coverage information can be used to determine the extent to which the source code is exercised by a test suite of test applications.
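  • As an illustration only, the items carried by the instrumented executable (counters, counter-to-source mapping, and counter memory requirement) can be pictured as a small record. The class and field names (`CounterMapping`, `InstrumentedImage`) and the 8-byte counter width in the sketch below are assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field

COUNTER_BYTES = 8  # assumed counter width; the disclosure does not specify one


@dataclass(frozen=True)
class CounterMapping:
    """Maps one instrumentation counter to the source construct it covers."""
    counter_index: int
    function: str
    source_file: str
    first_line: int
    last_line: int


@dataclass
class InstrumentedImage:
    """Hypothetical summary of what an instrumented executable carries."""
    mappings: list = field(default_factory=list)

    def add_block(self, function, source_file, first_line, last_line):
        # Each instrumented source block receives the next free counter index.
        index = len(self.mappings)
        self.mappings.append(
            CounterMapping(index, function, source_file, first_line, last_line))
        return index

    def counter_count(self):
        return len(self.mappings)

    def memory_requirement(self):
        # Bytes of device memory the host would need to allocate for the counters.
        return self.counter_count() * COUNTER_BYTES
```

  • In this sketch the host-side allocation size follows directly from the number of mapped blocks, which is one simple way the memory requirement could be derived.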
  • a first processor, such as a GPU, operates as a co-processor of a second processor, such as a CPU, or vice versa.
  • the first processor and the second processor are configured to operate in a co-processing manner.
  • Some embodiments of the present disclosure can be integrated in an NVCC compiler for the CUDA programming language and a General-Purpose computing on Graphics Processing Units (GPGPU) platform, e.g., with a CPU being the host and a GPU being a device.
  • other embodiments of the present disclosure may also be used in any other suitable parallel computing platform that includes different types of processors.
  • an application program written for CUDA may include sequential C language programming statements, and calls to a specialized application programming interface (API) used for configuring and managing parallel execution of program threads.
  • a function associated with a CUDA application program that is destined for concurrent execution on a device processor is referred to as a “kernel” function.
  • An instance of a kernel function is referred to as a thread, and a set of concurrently executing threads may be organized as a thread block.
  • FIG. 1 illustrates an exemplary computer implemented process 100 of compilation and instrumented execution to generate code coverage information from device code execution in accordance with an embodiment of the present disclosure.
  • the compilation process may be performed by an exemplary compiler that integrates the functionalities of host compilation by using a host compiler 113 , device compilation by using a device compiler 122 , and linking.
  • the integrated source code is processed by the host and device preprocessors 111 and 112 .
  • the device code and the host code are separated from each other and supplied to the host compiler 113 and the device compiler 122 , respectively.
  • the device code is subject to front end and back end processing to generate device code machine binary.
  • a code coverage pass 123 is implemented to generate instrumentation code by inserting counter-increment code into the device functions, e.g., as part of the optimization phase. As described in greater detail with reference to FIGS.
  • the code coverage pass 123 also generates mapping information that maps the inserted instrumentation counters to their source constructs (e.g., the source construct information is generated as part of the front end processing), and memory usage requirement for each device function.
  • mapping information is passed along in the device binary, while the memory usage information and the call graph information are enclosed in a file named “covinfo.”
  • the device compiler 122 sends the instrumentation code and the “covinfo” file to the host compiler 113 which uses the enclosed information to declare mirrors for counters on the host side.
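  • The hand-off from the device compiler to the host compiler can be sketched as follows. The on-disk layout of the "covinfo" file is not described here, so the JSON layout, the `counters` key, and the helper names `declare_mirrors` and `total_bytes` are all hypothetical and serve only to show how per-function counter counts could be turned into host-side mirrors.

```python
import json


def declare_mirrors(covinfo_text):
    """Build zero-initialized host-side mirrors, one list per device function.

    `covinfo_text` is assumed to be JSON of the form
    {"counters": {"<function name>": <number of counters>, ...}}.
    """
    info = json.loads(covinfo_text)
    return {name: [0] * count for name, count in info["counters"].items()}


def total_bytes(mirrors, counter_bytes=8):
    """Total device memory needed for all mirrored counters (assumed width)."""
    return sum(len(mirror) for mirror in mirrors.values()) * counter_bytes
```

  • For example, a hypothetical covinfo text of `{"counters": {"kernel_a": 3, "helper": 2}}` would produce two mirrors holding three and two zeroed counters, respectively.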
  • the device instrumented code is combined with the front end-processed host code and processed by the host compiler 113 to generate an object file.
  • the host compiler 113 can generate instructions for a host processor to allocate and initialize memory for the counters in the instrumented execution phase, as described in greater detail below with reference to FIG. 3 .
  • the object file is then processed by the device linker 131 (in case of separate compilation as described below), the host compiler 113 , and the host linker 132 . As a result, the instrumented executable code is produced for the program.
  • a code coverage report with collected code coverage data can be produced by the coverage tool 150, e.g., in a format that can be displayed in a graphical user interface (GUI) viewable by a user.
  • the report may present the source file annotated with coverage information at source block granularity, with uncovered source regions marked.
  • the device compiler 122 may be configured to limit instrumentation and annotation to a selected set of functions in the program.
  • the flow in the dashed-line box 120 may be performed for each virtual architecture, e.g., each Instruction Set Architecture (ISA).
  • an architecture field is added to the host-device communication macros to uniquely identify the different architecture variants.
  • the flow in the dashed-line box 110 is performed once as the device instrument code supplied to the host compiler includes a complete function call list (callee list) of each kernel.
  • a complete function call list of a kernel may not be known at the time of compiling the kernel by the device compiler 122 .
  • the call graph and the callee list may be only available at link time.
  • communications between the device compiler 122 , the device linker 131 and the host compiler 113 are used to achieve instrumentation. Partial instrument information from all compilation units is fed to the device linker 131 and combined with the object file. As such, the instrumentation for the entire program, and therefore for a complete function call list, becomes available.
  • the flow in the dashed-line box 110 is performed once and the code coverage pass 123 may generate instrumentation related to a partial function call list contained in the portion.
  • the device compiler 122 instruments the portion of the code as it would for a whole program compilation. In addition, it emits information of instrumentation counters and mapping in “covinfo” to the host compiler 113 for it to declare mirrors for the counters.
  • an initialized constant variable may be created, containing:
  • the instrument information from all compilation units is collated and a call graph is generated which contains the partial call graphs using compiler information.
  • This call graph is supplemented with the call graph generated by the linker 131 , and instrument code is generated using the combined call list.
  • this instrument code contains all the information necessary for the host side to allocate memory and print the collected coverage data to a file after a kernel launch.
  • a host side stub file is created, compiled and linked to produce the final executable.
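  • The link-time step described above, in which partial call graphs from the compilation units are collated and supplemented with the linker's call graph, can be sketched as a graph merge followed by a reachability walk. This is a minimal sketch under assumptions: call graphs are modeled as dictionaries from caller name to a set of callee names, and the function names `merge_call_graphs` and `callee_list` are invented for the example.

```python
def merge_call_graphs(*partial_graphs):
    """Union several caller -> {callees} graphs into one combined graph."""
    merged = {}
    for graph in partial_graphs:
        for caller, callees in graph.items():
            merged.setdefault(caller, set()).update(callees)
    return merged


def callee_list(graph, kernel):
    """Return every function reachable from `kernel`, sorted by name."""
    seen = set()
    stack = list(graph.get(kernel, ()))
    while stack:
        function = stack.pop()
        if function in seen:
            continue  # already visited; also guards against recursion cycles
        seen.add(function)
        stack.extend(graph.get(function, ()))
    return sorted(seen)
```

  • With the combined graph, the host side can size counter memory for a kernel and all device functions it may call, even when some callees are defined in other compilation units.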
  • function names may be passed between the device compiler 122 and the linker 131 using relocations.
  • the device compiler 122 uses function addresses in the counter variable initialization. They turn into linker relocations, which are patched at link time.
  • function names can be passed as strings.
  • a Cyclic Redundancy Check (CRC) error detection code can be computed based on the structure and indexes of the control flow graph (CFG) of the program.
  • the CRC code in combination with the function names can be used to facilitate validity verification of the code coverage data.
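  • One way to picture the validity check is a CRC-32 computed over a function name and the edges of its CFG, stored with the coverage data and recomputed when the data is read back. The serialization of edges, the record layout, and the helper names below are assumptions for illustration; the standard `zlib.crc32` routine is used as a stand-in for whatever CRC an implementation would choose.

```python
import zlib


def cfg_checksum(function_name, edges):
    """CRC-32 over a function name and its CFG edges (pairs of block indexes)."""
    crc = zlib.crc32(function_name.encode("utf-8"))
    # Sorting makes the checksum independent of the order edges are listed in.
    for src, dst in sorted(edges):
        crc = zlib.crc32(f"{src}->{dst};".encode("utf-8"), crc)
    return crc


def is_valid(record, function_name, edges):
    """Check that a coverage record matches the function it claims to describe."""
    return (record["function"] == function_name
            and record["crc"] == cfg_checksum(function_name, edges))
```

  • A record whose function name or checksum no longer matches the current program would then be rejected rather than mapped onto the wrong source blocks.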
  • coverage instrumentation for device code includes two major tasks: (1) instrumenting the source code with increment counters; and (2) generating coverage mapping information to map instrumentation counters to source constructs.
  • Task (1) uses call graph information and full instrumentation information for each function. Thus, in one embodiment, it may be achieved by using an optimization (OPT) module pass.
  • task (2) may be achieved by a front end process with its access to source lexical blocks.
  • the front end of the device compiler constructs a syntax tree, along with the source line information, e.g., Source Position (SPOS).
  • FIG. 2A illustrates an exemplary computer implemented process 200 of instrumented compilation in an exemplary device compiler 210 in accordance with an embodiment of the present disclosure.
  • the device front end 211 is configured to generate and emit calls to coverage intrinsics at instrumentation points as part of intermediate representation (IR) code generation.
  • the lexical blocks and their source positions are available.
  • the intrinsics are operable to encode the source positional information as parameters and may be emitted for each lexical block in the source program.
  • the optimization phase (OPT) 212 includes a code coverage module pass 221 operable to convert the coverage intrinsics to coverage instrumentation instructions in the instrumentation code and emit relevant information in a file (e.g., in the “covinfo”) which can be used by the host compiler to generate instructions for a host processor to allocate memory during execution.
  • the code coverage pass 221 also converts the coverage intrinsics to coverage mapping information and emits this information in the assembly language code (e.g., PTX code) and the machine binary code (e.g., “cubin”).
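  • The conversion of coverage intrinsics into counter increments and mapping information can be sketched on a toy IR. In this sketch the IR is a list of tuples, the intrinsic is named `coverage_intrinsic`, and counters are 8 bytes wide; these are all assumptions for illustration and do not reflect the actual IR or intrinsic names used by any compiler.

```python
def lower_coverage_intrinsics(ir, counter_bytes=8):
    """Rewrite coverage intrinsics into counter increments.

    Returns (lowered_ir, mapping, memory), where `mapping` ties each counter
    to its source lines and `memory` gives bytes of counters per function.
    """
    lowered, mapping, memory = [], [], {}
    for instruction in ir:
        if instruction[0] == "coverage_intrinsic":
            _, function, first_line, last_line = instruction
            index = len(mapping)  # next free counter index
            mapping.append({"counter": index,
                            "function": function,
                            "lines": (first_line, last_line)})
            memory[function] = memory.get(function, 0) + counter_bytes
            lowered.append(("increment_counter", index))
        else:
            lowered.append(instruction)  # ordinary instructions pass through
    return lowered, mapping, memory
```

  • Here the mapping plays the role of the coverage mapping information, and the per-function memory totals correspond to the kind of information that would be passed to the host side.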
  • a global coverage mapping variable may be emitted for each compilation unit in case of separate compilation.
  • the information in all such variables from different compilation units is then combined together by the linker.
  • the coverage mapping information can be used in reconstruction of the collected coverage data into a coverage report, which needs the values of all the counters emitted for a compilation unit, and the mapping of source positions to the corresponding counters.
  • an extract library may be implemented to enable a coverage tool to retrieve the mapping information. Since the machine binary code (e.g., “cubin”) is wrapped in fatbinary in the host-side executable, the library can operate to unpack all the machine binary and append the coverage information for the coverage tool. This information is then analyzed along with the instrumentation counter values read from the library calls to construct the coverage report.
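  • Reconstruction of a report from counter values and mapping information can be sketched as below. The mapping entry layout (a `counter` index and a `lines` range) and the function names are assumptions for this example; a real coverage tool would read the mapping from the unpacked machine binary instead of a Python list.

```python
def line_coverage(mapping, counters):
    """Return {line_number: execution_count} for every mapped source line."""
    hits = {}
    for entry in mapping:
        first_line, last_line = entry["lines"]
        count = counters[entry["counter"]]
        for line in range(first_line, last_line + 1):
            # If blocks overlap on a line, keep the highest count seen.
            hits[line] = max(hits.get(line, 0), count)
    return hits


def uncovered_lines(mapping, counters):
    """Source lines whose covering counters were never incremented."""
    coverage = line_coverage(mapping, counters)
    return sorted(line for line, count in coverage.items() if count == 0)
```

  • The list of uncovered lines corresponds to the uncovered source regions that a report could highlight.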
  • the device compiler 210 emits a list of information to the host side for combination with the front end processed host code, the information including the constant global variable of call list or partial call lists in case of separate compilation, instrumentation counters, and the memory requirements of the counters.
  • the output from the optimization phase 212 is sent to the back end 215 , where the device code generator 213 converts it into assembly language code (e.g., PTX).
  • the PTX code is further converted to machine binary code by the PTX assembly 214 .
  • the PTX code and machine binary code are embedded in the fatbinary through the fatbinary module 220 and also combined (“included”) in the front end-processed host code which is fed to the host compiler.
  • the code coverage pass is a module pass integrated as part of an Intermediate Representation (IR) pass in the device optimization phase, and can be invoked anywhere in the optimization phase 212 of the device back end 215 before conversion of the IR code to the machine instruction code.
  • the device code coverage generation can be implemented in any other well-known suitable manner without departing from the scope of the present disclosure.
  • FIG. 2B is a flow chart depicting an exemplary computer implemented process 250 of instrumenting device functions in a device compiler in accordance with an embodiment of the present disclosure.
  • Process 250 can be implemented in a module pass as call graph information is needed. In one embodiment, process 250 may be performed by the code coverage pass 221 in FIG. 2A .
  • the calls to coverage intrinsics are converted to instrumentation instructions in the instrumentation code.
  • the memory usage requirement for each function is collected and this information is emitted in the “covinfo” file with call graph information.
  • the coverage mapping information for each function is accumulated, and a global constant variable for the whole compilation unit is emitted.
  • a code coverage pass is used to generate device instrumentation code by inserting instrumentation counters. The counters are updated each time the associated code is executed. Also generated in compilation are the instructions for coordination between the host processor and the device processor during the instrumented execution, such as memory allocation and initialization.
  • FIG. 3 is a flow chart depicting an exemplary instrumented execution process 300 through coordination between a CPU and a GPU in accordance with an embodiment of the present disclosure.
  • the flows in the dashed-boxes 310 and 320 illustrate the CPU (host) execution and GPU (device) execution processes, respectively. Steps 311 - 317 and 321 - 322 are performed for each kernel invocation at runtime.
  • the CPU allocates GPU memory for the coverage instrumentation counters of a kernel and all the device functions called from the kernel.
  • the GPU driver is used to initialize the coverage instrumentation counters.
  • the GPU memory is bound to an ID of the GPU, e.g., a device symbol name.
  • the CPU launches the kernel.
  • the GPU executes the kernel at 321 and increments the coverage instrumentation counters accordingly at 322 .
  • the counters associated with a respective code portion are updated each time the respective code portion is executed at 321 .
  • atomic instructions (e.g., PTX instructions) are used to achieve atomic update operations.
  • the CPU copies the counter values from the GPU memory, and at 316 calls into a library interface to record the collected coverage data including the counter values.
  • the CPU calls a library to write the collected coverage data to an output file.
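  • The per-kernel sequence above (allocate, initialize, bind, launch, increment, copy back, write out) can be simulated on the host alone. The sketch below is not GPU code: `FakeDevice` is a hypothetical stand-in for device memory, Python threads stand in for GPU threads, and a lock models the atomic update of counters. The symbol name and the output format are likewise assumptions.

```python
import json
import threading


class FakeDevice:
    """Stand-in for a GPU: named counter buffers plus a lock modeling atomics."""

    def __init__(self):
        self.memory = {}
        self._lock = threading.Lock()

    def allocate(self, symbol, count):
        # Allocate and zero-initialize counters, bound to a symbol name.
        self.memory[symbol] = [0] * count

    def atomic_add(self, symbol, index, value=1):
        with self._lock:
            self.memory[symbol][index] += value

    def copy_to_host(self, symbol):
        return list(self.memory[symbol])


def kernel(device, symbol, thread_id):
    device.atomic_add(symbol, 0)      # block entered by every thread
    if thread_id % 2 == 0:
        device.atomic_add(symbol, 1)  # block entered by even threads only
    # Counter 2 guards a block that no thread reaches in this example.


def run_instrumented(device, symbol, num_counters, num_threads, path=None):
    device.allocate(symbol, num_counters)
    threads = [threading.Thread(target=kernel, args=(device, symbol, t))
               for t in range(num_threads)]
    for thread in threads:
        thread.start()                # "launch" the kernel
    for thread in threads:
        thread.join()                 # wait for the kernel to finish
    counters = device.copy_to_host(symbol)
    if path is not None:
        with open(path, "w") as out:  # record the collected coverage data
            json.dump({symbol: counters}, out)
    return counters
```

  • Running eight simulated threads leaves the third counter at zero, which is how an uncovered block would show up in the collected data.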
  • FIG. 4 is a block diagram illustrating an exemplary computing system 400 operable to compile integrated source code and instrument the code for code coverage data collection in accordance with an embodiment of the present disclosure.
  • system 400 may be a general-purpose computing device used to compile a program configured to be executed concurrently by a host processor and one or more device processors in a parallel execution system.
  • System 400 comprises a Central Processing Unit (CPU) 401 , a system memory 402 , a Graphics Processing Unit (GPU) 403 , I/O interfaces 404 and network circuits 405 , an operating system 406 and application software 407 stored in the memory 402 .
  • software 407 includes an exemplary integrated compiler 408 configured to compile source code of programs having a mixture of host code and device code.
  • a code coverage pass 410 in the integrated compiler 408 can generate instrumented executable code with coverage instrumentation counters inserted for the device functions, coverage mapping information and memory requirement for the counters.
  • the compiler 408 can further generate instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected coverage information from the device memory and output coverage counters.
  • the compiler 408 may perform various other functions that are well known in the art as well as those discussed in detail with reference to FIGS. 1-3.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Stored Programmes (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

System and method of compiling a program having a mixture of host code and device code to enable code coverage data collection for device code execution. An exemplary integrated compiler can compile source code programmed to be executed by a host processor (e.g., a CPU) and a co-processor (e.g., a GPU) concurrently. The compilation can generate instrumented executable code which includes: coverage instrumentation counters for the device functions; mapping information that maps the counters to the instrumented source points; and instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected code coverage information from the device memory to the host memory. Execution of the instrumented executable can yield a coverage report on the device code functions.

Description

    CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
  • This application claims priority to, and benefit of, U.S. provisional patent application No. 62/569,380, filed on Oct. 6, 2017, and entitled “COORDINATED HOST DEVICE MECHANISM FOR DEVICE PROFILING IN GPU ACCELERATORS AND CODE COVERAGE IN GPU ACCELERATORS FOR WHOLE PROGRAM AND SEPARATE COMPILATION,” the content of which is herein incorporated by reference in entirety for all purposes. This application is related to the co-pending, commonly-assigned U.S. patent application Ser. No. ______, filed on ______, and entitled “DEVICE PROFILING IN GPU ACCELERATORS BY USING HOST-DEVICE COORDINATION.”
  • FIELD OF THE INVENTION
  • Embodiments of the present disclosure are related to computer program compilers, and more specifically, to determining code coverage for software to be performed by one or more co-processors by coordinating with one or more host processors.
  • BACKGROUND OF THE INVENTION
  • Certain computer systems include a co-processing subsystem that may be configured to concurrently execute multiple program threads that are instantiated from a common application program. Such a computer system may include a host processor and one or more device processors, which are also known as co-processors or accelerator processors. For example, CUDA is a well-known parallel computing platform and application programming interface (API) model that enables general-purpose computing by using a graphics processing unit (GPU) as a device processor (or co-processor) and a Central Processing Unit (CPU) as a host processor. Code coverage is a mechanism used to measure the degree to which source code is executed by a test suite. It is often used to assist performance tuning by helping programmers focus their development and debug efforts on the most commonly executed portions of code. Current compiler techniques are not able to provide coverage information for code intended to be performed by co-processors, such as a GPU or other fixed-function accelerator, due in part to the difficulty of coordinating between a host processor (e.g., CPU) and a co-processor (e.g., GPU) when instrumenting code to be performed by the co-processor. Accordingly, there is currently a need for techniques to collect coverage information for code to be performed by a co-processor, such as a GPU or other accelerator.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
  • FIG. 1 illustrates an exemplary computer implemented process of compilation and instrumented execution to generate code coverage information from device code execution in accordance with an embodiment of the present disclosure.
  • FIG. 2A illustrates an exemplary computer implemented process of instrumented compilation in a device compiler in accordance with an embodiment of the present disclosure.
  • FIG. 2B is a flow chart depicting an exemplary computer implemented process of instrumenting device functions in a device compiler in accordance with an embodiment of the present disclosure.
  • FIG. 3 is a flow chart depicting an exemplary instrumented execution process through coordination between a CPU and a GPU in accordance with an embodiment of the present disclosure.
  • FIG. 4 is a block diagram illustrating an exemplary computing system operable to compile integrated source code and instrument the code for code coverage data collection in accordance with an embodiment of the present disclosure.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Reference will now be made in detail to the preferred embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. While the disclosure will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the disclosure to these embodiments. On the contrary, the disclosure is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the disclosure as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present disclosure, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be recognized by one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present disclosure.
  • Notation and Nomenclature:
  • Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
  • It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present disclosure, discussions utilizing terms such as “processing” or “compiling” or “linking” or “accessing” or “performing” or “executing” or “providing” or the like, refer to the action and processes of an integrated circuit, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
  • Code Coverage Generation in GPU by Using Host-Device Coordination
  • Embodiments of the present disclosure provide a compilation mechanism to enable generation of code coverage information with regard to code execution by a device processor (or a co-processor or accelerator processor herein). An exemplary integrated compiler can compile source code programmed to be concurrently executed by a host processor (or main processor) and a device processor. The compilation can generate an instrumented executable code including (1) code coverage instrumentation counters for the device functions, (2) mapping information that maps instrumentation counters to source constructs, (3) memory requirements of the counters, and (4) instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected coverage data from the device memory to generate instrumentation output. Execution of the instrumented executable code can produce coverage counter values which, when provided to a coverage tool along with the executable, can be used to produce a code coverage report on the device functions.
  • The code coverage information can be used to determine the extent to which the source code is exercised by a test-suite of test applications.
  • In one embodiment, a first processor, such as a GPU, operates as a co-processor of a second processor, such as a CPU, or vice versa. The first processor and the second processor are configured to operate in a co-processing manner.
  • Some embodiments of the present disclosure can be integrated in an NVCC compiler for the CUDA programming language and a General-Purpose computing on Graphics Processing Units (GPGPU) platform, e.g., with a CPU being the host and a GPU being a device. However, other embodiments of the present disclosure may also be used in any other suitable parallel computing platform that includes different types of processors.
  • For example, an application program written for CUDA may include sequential C language programming statements, and calls to a specialized application programming interface (API) used for configuring and managing parallel execution of program threads. A function associated with a CUDA application program that is destined for concurrent execution on a device processor is referred to as a “kernel” function. An instance of a kernel function is referred to as a thread, and a set of concurrently executing threads may be organized as a thread block.
  • FIG. 1 illustrates an exemplary computer implemented process 100 of compilation and instrumented execution to generate code coverage information from device code execution in accordance with an embodiment of the present disclosure. In one embodiment, the compilation process may be performed by an exemplary compiler that integrates the functionalities of host compilation by using a host compiler 113, device compilation by using a device compiler 122, and linking.
  • More specifically, the integrated source code is processed by the host and device preprocessors 111 and 112. The host code and the device code are separated from each other and supplied to the host compiler 113 and the device compiler 122, respectively. In the device compiler 122, the device code is subject to front end and back end processing to generate device code machine binary. In the illustrated embodiment, a code coverage pass 123 is implemented to generate instrumentation code by inserting code to increment counters into the device functions, e.g., as part of the optimization phase. As described in greater detail with reference to FIGS. 2A-3, in one embodiment, the code coverage pass 123 also generates mapping information that maps the inserted instrumentation counters to their source constructs (e.g., the source construct information is generated as part of the front end processing), and memory usage requirement for each device function. In one embodiment, the mapping information is passed along in the device binary, while the memory usage information and the call graph information are enclosed in a file named “covinfo.”
  • The device compiler 122 sends the instrumentation code and the “covinfo” file to the host compiler 113 which uses the enclosed information to declare mirrors for counters on the host side. The device instrumented code is combined with the front end-processed host code and processed by the host compiler 113 to generate an object file. Provided with the device instrument code and the “covinfo” file, the host compiler 113 can generate instructions for a host processor to allocate and initialize memory for the counters in the instrumented execution phase, as described in greater detail below with reference to FIG. 3. The object file is then processed by the device linker 131 (in case of separate compilation as described below), the host compiler 113, and the host linker 132. As a result, the instrumented executable code is produced for the program.
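  • By way of illustration only, the use of the “covinfo” information by the host side may be sketched as follows. The sketch is written in Python rather than as compiler output, and the JSON layout, the field names (`functions`, `num_counters`, `callees`) and the function names are assumptions made for this example; the disclosure does not specify the file format.

```python
import json

# Hypothetical "covinfo" payload. The real file format is not described in the
# text; JSON and these field names are assumptions for illustration only.
COVINFO_JSON = """
{
  "functions": [
    {"name": "saxpy_kernel", "num_counters": 3, "callees": ["clamp"]},
    {"name": "clamp", "num_counters": 2, "callees": []}
  ]
}
"""


def declare_host_mirrors(covinfo_text):
    """Return one zero-filled host-side mirror buffer per device function."""
    info = json.loads(covinfo_text)
    return {f["name"]: [0] * f["num_counters"] for f in info["functions"]}


def counters_needed(covinfo_text, kernel):
    """Count counters for a kernel plus every function reachable from it."""
    by_name = {f["name"]: f for f in json.loads(covinfo_text)["functions"]}
    seen, stack = set(), [kernel]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(by_name[name]["callees"])
    return sum(by_name[name]["num_counters"] for name in seen)


mirrors = declare_host_mirrors(COVINFO_JSON)
total = counters_needed(COVINFO_JSON, "saxpy_kernel")
print(mirrors, total)
```

  • In this sketch the mirrors stand in for the host-side declarations, and the reachable-function total stands in for the amount of device memory the host would allocate before launching the kernel.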
  • After the execution platform 140 executes the executable, it produces code coverage data including counter information, which, combined with coverage information available in the executable, is passed to a code coverage tool. A code coverage report with collected code coverage data can be produced by the coverage tool 150, e.g., in a format that can be displayed in a graphical user interface (GUI) viewable by a user. In one embodiment, the report may present the source file annotated with coverage information at source block granularity, with uncovered source regions marked. In one embodiment, the device compiler 122 may be configured to limit instrumentation and annotation to a selected set of functions in the program.
  • In one embodiment, the flow in the dashed-line box 120 may be performed for each virtual architecture, e.g., each Instruction Set Architecture (ISA). In one embodiment, an architecture field is added to the host-device communication macros to uniquely identify the different architecture variants.
  • In case of whole compilation, in one embodiment, the flow in the dashed-line box 110 is performed once as the device instrument code supplied to the host compiler includes a complete function call list (callee list) of each kernel. In case of separate compilation, in one embodiment, a complete function call list of a kernel may not be known at the time of compiling the kernel by the device compiler 122. The call graph and the callee list may be only available at link time. In one embodiment, communications between the device compiler 122, the device linker 131 and the host compiler 113 are used to achieve instrumentation. Partial instrument information from all compilation units is fed to the device linker 131 and combined with the object file. As such, the instrumentation for the entire program, and therefore for a complete function call list, becomes available.
  • More specifically, for each compilation unit configured to compile a portion of the source code, the flow in the dashed-line box 110 is performed once and the code coverage pass 123 may generate instrumentation related to a partial function call list contained in the portion. During compilation, the device compiler 122 instruments the portion of the code as it would for a whole program compilation. In addition, it emits information of instrumentation counters and mapping in “covinfo” to the host compiler 113 for it to declare mirrors for the counters.
  • In one embodiment, an initialized constant variable may be created, containing:
      • 1. Function name, function hash, architecture ID and number of counters for each device function; and
      • 2. Partial call list containing calls recognized for one compilation unit.
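  • The contents of such an initialized constant variable may be pictured with the following minimal sketch. The class names, field names and example values are hypothetical; only the listed items (function name, function hash, architecture ID, number of counters, partial call list) come from the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeviceFunctionEntry:
    """Per-function record (item 1 of the list above)."""
    name: str
    func_hash: int
    arch_id: str
    num_counters: int


@dataclass(frozen=True)
class CoverageUnitInfo:
    """Read-only record emitted once per compilation unit."""
    functions: Tuple[DeviceFunctionEntry, ...]
    # Item 2: (caller, callee) pairs recognized inside this compilation unit.
    partial_call_list: Tuple[Tuple[str, str], ...]

    def total_counters(self):
        return sum(f.num_counters for f in self.functions)

    def unresolved_callees(self):
        """Callees not defined in this unit; these are resolved at link time."""
        defined = {f.name for f in self.functions}
        return sorted({callee for _, callee in self.partial_call_list
                       if callee not in defined})


# Example values are made up for illustration.
unit_info = CoverageUnitInfo(
    functions=(
        DeviceFunctionEntry("scale_kernel", 0x1A2B3C4D, "sm_70", 4),
        DeviceFunctionEntry("clamp", 0x0BADF00D, "sm_70", 2),
    ),
    partial_call_list=(
        ("scale_kernel", "clamp"),
        ("scale_kernel", "external_helper"),
    ),
)
print(unit_info.total_counters(), unit_info.unresolved_callees())
```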
  • In one embodiment, at link time, the instrument information from all compilation units is collated and a call graph is generated which contains the partial call graphs using compiler information. This call graph is supplemented with the call graph generated by the linker 131, and instrument code is generated using the combined call list. In one embodiment, this instrument code contains all the information necessary for the host side to allocate memory and print the collected coverage data to a file after a kernel launch. In one embodiment, a host side stub file is created, compiled and linked to produce the final executable.
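  • The link-time collation described above may be sketched as merging per-unit partial call lists with linker-discovered edges and then computing the complete callee list of a kernel. This is a simplified model under assumed data shapes (lists of caller/callee pairs); the function and symbol names are hypothetical.

```python
def merge_call_lists(unit_call_lists, linker_edges=()):
    """Combine partial call lists from all units with linker-found edges."""
    graph = {}
    for edges in list(unit_call_lists) + [list(linker_edges)]:
        for caller, callee in edges:
            graph.setdefault(caller, set()).add(callee)
            graph.setdefault(callee, set())
    return graph


def callee_list(graph, kernel):
    """Return every function reachable from the kernel, sorted by name."""
    seen, stack = set(), [kernel]
    while stack:
        fn = stack.pop()
        for callee in graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return sorted(seen)


# Two compilation units, each knowing only part of the call graph.
unit_a = [("kernel_main", "helper_a")]
unit_b = [("helper_a", "helper_b")]
# An edge only visible to the linker (assumed for this example).
from_linker = [("helper_b", "helper_c")]

combined = merge_call_lists([unit_a, unit_b], from_linker)
complete = callee_list(combined, "kernel_main")
print(complete)
```

  • With the combined list available, the host side can size the counter memory for the kernel and all functions it may call, which neither compilation unit could determine on its own.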
  • In one embodiment, function names may be passed between the device compiler 122 and the linker 131 using relocations. The device compiler 122 uses function addresses in the counter variable initialization. They turn into linker relocations, which are patched at link time. In another embodiment, function names can be passed as strings.
  • As the coverage information collected for a program is sensitive to changes to the compiler and the source code, in one embodiment, a Cyclic Redundancy Check (CRC) error detection code can be computed based on the structure and indexes of the control flow graph (CFG) of the program and used as a check. The CRC code in combination with the function names can be used to facilitate validity verification of the code coverage data.
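  • One possible way to picture such a check is sketched below. The serialization of the CFG into a byte string is an assumption made for this example, and `zlib.crc32` is used only as a readily available CRC; the disclosure does not state which CRC or encoding is used.

```python
import zlib


def cfg_crc(cfg):
    """CRC over a canonical text form of a CFG.

    cfg maps a basic block index to the list of its successor indexes.
    """
    parts = []
    for block in sorted(cfg):
        succs = ",".join(str(s) for s in sorted(cfg[block]))
        parts.append(f"{block}:{succs}")
    return zlib.crc32(";".join(parts).encode("ascii"))


def coverage_data_is_valid(recorded, current_cfgs):
    """Check recorded {function name: crc} against the current CFGs."""
    return all(
        name in current_cfgs and cfg_crc(current_cfgs[name]) == crc
        for name, crc in recorded.items()
    )


# CFG of a hypothetical function when coverage data was collected.
cfg_v1 = {0: [1, 2], 1: [3], 2: [3], 3: []}
# Same function after a source change redirected one edge.
cfg_v2 = {0: [1, 2], 1: [3], 2: [1], 3: []}

recorded = {"scale_kernel": cfg_crc(cfg_v1)}
still_valid = coverage_data_is_valid(recorded, {"scale_kernel": cfg_v1})
stale = not coverage_data_is_valid(recorded, {"scale_kernel": cfg_v2})
print(still_valid, stale)
```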
  • According to embodiments of the present disclosure, coverage instrumentation for device code includes two major tasks: (1) instrumenting the source code with increment counters; and (2) generating coverage mapping information to map instrumentation counters to source constructs. Task (1) uses call graph information and full instrumentation information for each function. Thus, in one embodiment, it may be achieved by using an optimization (OPT) module pass. In one embodiment, task (2) may be achieved by a front end process with its access to source lexical blocks. In one embodiment, as part of parsing, the front end of the device compiler constructs a syntax tree, along with the source line information, e.g., Source Position (SPOS).
  • FIG. 2A illustrates an exemplary computer implemented process 200 of instrumented compilation in an exemplary device compiler 210 in accordance with an embodiment of the present disclosure. In the illustrated example, the device front end 211 is configured to generate and emit calls to coverage intrinsics at instrumentation points as part of intermediate representation (IR) code generation. At this stage, the lexical blocks and their source positions are available. The intrinsics are operable to encode the source positional information as parameters and may be emitted for each lexical block in the source program.
  • In one embodiment, the optimization phase (OPT) 212 includes a code coverage module pass 221 operable to convert the coverage intrinsics to coverage instrumentation instructions in the instrumentation code and emit relevant information in a file (e.g., in the “covinfo”) which can be used by the host compiler to generate instructions for a host processor to allocate memory during execution. In addition, the code coverage pass 221 also converts the coverage intrinsics to coverage mapping information and emits this information in the assembly language code (e.g., PTX code) and the machine binary code (e.g., “cubin”) for example. In one embodiment, a global coverage mapping variable may be emitted for each compilation unit in case of separate compilation. In one embodiment, the information in all such variables from different compilation units is then combined together by the linker.
  • The coverage mapping information can be used in reconstruction of the collected coverage data into a coverage report, which needs the values of all the counters emitted for a compilation unit, and the mapping of source positions to the corresponding counters. In some embodiments, for reconstruction, an extract library may be implemented to enable a coverage tool to retrieve the mapping information. Since the machine binary code (e.g., “cubin”) is wrapped in fatbinary in the host-side executable, the library can operate to unpack all the machine binary and append the coverage information for the coverage tool. This information is then analyzed along with the instrumentation counter values read from the library calls to construct the coverage report.
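  • The reconstruction step may be illustrated with the following sketch, which joins counter values with mapping entries to produce per-block coverage. The tuple layout of a mapping entry, the file name and the counter values are assumptions for illustration; the extract library and fatbinary unpacking are not modeled.

```python
def build_report(mapping, counters):
    """Join mapping entries with counter values.

    mapping: list of (counter_index, file, first_line, last_line).
    counters: list of counter values read back after execution.
    """
    report = []
    for counter_index, path, first_line, last_line in mapping:
        count = counters[counter_index]
        report.append({
            "file": path,
            "lines": (first_line, last_line),
            "count": count,
            "covered": count > 0,
        })
    return report


def summarize(report):
    """Return (percent of blocks covered, list of uncovered regions)."""
    covered = sum(1 for entry in report if entry["covered"])
    percent = 100.0 * covered / len(report) if report else 0.0
    uncovered = [(e["file"], e["lines"]) for e in report if not e["covered"]]
    return percent, uncovered


# Made-up mapping and counter values for one compilation unit.
mapping = [
    (0, "saxpy.cu", 10, 12),
    (1, "saxpy.cu", 13, 15),
    (2, "saxpy.cu", 16, 18),
    (3, "saxpy.cu", 19, 19),
]
counters = [128, 0, 128, 0]

report = build_report(mapping, counters)
percent, uncovered = summarize(report)
print(percent, uncovered)
```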
  • As illustrated, the device compiler 210 emits a list of information to the host side for combination with the front end processed host code, the information including the constant global variable of call list or partial call lists in case of separate compilation, instrumentation counters, and the memory requirements of the counters.
  • The output from the optimization phase 212, including the instrumented calls to counters and coverage mapping information, is sent to the back end 215, where the device code generator 213 converts it into assembly language code (e.g., PTX). The PTX code is further converted to machine binary code by the PTX assembly 214. In one embodiment, the PTX code and machine binary code are embedded in the fatbinary through the fatbinary module 220 and also combined (“included”) in the front end-processed host code which is fed to the host compiler.
  • In this example, the code coverage pass is a module pass integrated as part of an Intermediate Representation (IR) pass in the device optimization phase, and can be invoked anywhere in the optimization phase 212 of the device back end 215 before conversion of the IR code to the machine instruction code. However, it will be appreciated that the device code coverage generation can be implemented in any other well-known suitable manner without departing from the scope of the present disclosure.
  • FIG. 2B is a flow chart depicting an exemplary computer implemented process 250 of instrumenting device functions in a device compiler in accordance with an embodiment of the present disclosure. Process 250 can be implemented in a module pass as call graph information is needed. In one embodiment, process 250 may be performed by the code coverage pass 221 in FIG. 2A. At 251, for each device function, the calls to coverage intrinsics are converted to instrumentation instructions in the instrumentation code. At 252, the memory usage requirement for each function is collected and this information is emitted in the “covinfo” file with call graph information. At 253, the coverage mapping information for each function is accumulated, and a global constant variable for the whole compilation unit is emitted.
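  • The three steps of process 250 may be sketched over a toy intermediate representation as follows. The tuple-based IR, the instruction names (`cov_intrinsic`, `counter_inc`, `call`) and the 8-byte counter size are assumptions made for this example and do not reflect the actual compiler IR.

```python
COUNTER_BYTES = 8  # assumed counter width; not stated in the text


def run_coverage_pass(module):
    """Toy module pass over {function name: [instruction tuples]}."""
    instrumented, covinfo, mapping = {}, {}, []
    for fn_name, body in module.items():
        new_body, next_index = [], 0
        for inst in body:
            if inst[0] == "cov_intrinsic":
                # Convert the intrinsic call to a counter increment and
                # remember which source range the counter stands for.
                _, path, first_line, last_line = inst
                new_body.append(("counter_inc", fn_name, next_index))
                mapping.append((fn_name, next_index, path, first_line, last_line))
                next_index += 1
            else:
                new_body.append(inst)
        instrumented[fn_name] = new_body
        # Memory requirement and call graph info for the "covinfo" file.
        covinfo[fn_name] = {
            "num_counters": next_index,
            "bytes": next_index * COUNTER_BYTES,
            "callees": sorted({i[1] for i in body if i[0] == "call"}),
        }
    # Mapping accumulated for the whole unit, emitted as one constant.
    return instrumented, covinfo, tuple(mapping)


module = {
    "scale_kernel": [
        ("cov_intrinsic", "scale.cu", 5, 7),
        ("call", "clamp"),
        ("cov_intrinsic", "scale.cu", 8, 9),
        ("ret",),
    ],
    "clamp": [
        ("cov_intrinsic", "scale.cu", 1, 3),
        ("ret",),
    ],
}

instrumented, covinfo, unit_mapping = run_coverage_pass(module)
print(covinfo)
```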
  • In one embodiment, a code coverage pass is used to generate device instrumentation code by inserting instrumentation counters. The counters are updated each time the associated code is executed. Also generated in compilation are the instructions for coordination between the host processor and the device processor during the instrumented execution, such as memory allocation and initialization. FIG. 3 is a flow chart depicting an exemplary instrumented execution process 300 through coordination between a CPU and a GPU in accordance with an embodiment of the present disclosure.
  • The flows in the dashed-line boxes 310 and 320 illustrate the CPU (host) execution and GPU (device) execution processes, respectively. Steps 311-317 and 321-322 are performed for each kernel invocation at runtime. At 311, the CPU allocates GPU memory for the coverage instrumentation counters of a kernel and all the device functions called from the kernel. At 312, the GPU driver is used to initialize the coverage instrumentation counters. At 313, the GPU memory is bound to an ID of the GPU, e.g., a device symbol name. At 314, the CPU launches the kernel.
  • In response, the GPU executes the kernel at 321 and increments the coverage instrumentation counters accordingly at 322. The counters associated with a respective code portion are updated each time the respective code portion is executed at 321. In one embodiment, atomic instructions (e.g., PTX instructions) are used to achieve atomic update operations.
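  • The need for atomic updates arises because many threads may execute the same instrumented block concurrently. The following sketch models this on the host with Python threads, using a lock as a stand-in for a device atomic add; the class and function names are hypothetical and no GPU is involved.

```python
import threading


class AtomicCounters:
    """Counter array whose updates are serialized, modeling atomic adds."""

    def __init__(self, num_counters):
        self._values = [0] * num_counters
        self._lock = threading.Lock()

    def atomic_add(self, index, amount=1):
        with self._lock:
            self._values[index] += amount

    def snapshot(self):
        with self._lock:
            return list(self._values)


def toy_kernel(counters, thread_id):
    counters.atomic_add(0)          # entry block: executed by every thread
    if thread_id % 2 == 0:
        counters.atomic_add(1)      # block for even thread IDs
    else:
        counters.atomic_add(2)      # block for odd thread IDs


def launch(counters, num_threads):
    """Run one toy_kernel instance per thread and wait for all of them."""
    threads = [
        threading.Thread(target=toy_kernel, args=(counters, tid))
        for tid in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


counters = AtomicCounters(3)
launch(counters, 64)
result = counters.snapshot()
print(result)
```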
  • At 315, the CPU copies the counter values from the GPU memory, and at 316 calls into a library interface to record the collected coverage data including the counter values. When the execution exits, at 317, the CPU calls a library to write the collected coverage data to an output file.
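  • The host-side sequence of steps 311-317 may be sketched end to end with a simulated device. `FakeDevice`, the symbol name `__cov_counters`, the JSON output format and the kernel are all assumptions made for this example; a real implementation would use the GPU driver and runtime instead.

```python
import json
import os
import tempfile


class FakeDevice:
    """Stand-in for a GPU: a table of buffers plus a symbol table."""

    def __init__(self):
        self.memory = {}
        self.symbols = {}

    def malloc(self, num_counters):
        handle = len(self.memory)
        self.memory[handle] = [None] * num_counters  # uninitialized
        return handle

    def memset_zero(self, handle):
        self.memory[handle] = [0] * len(self.memory[handle])

    def bind_symbol(self, name, handle):
        self.symbols[name] = handle

    def launch(self, kernel, *args):
        kernel(self, *args)

    def copy_to_host(self, handle):
        return list(self.memory[handle])


def toy_kernel(device, data):
    counters = device.memory[device.symbols["__cov_counters"]]
    counters[0] += 1                       # kernel entry block
    for x in data:
        counters[1 if x > 0 else 2] += 1   # one counter per branch


collected = {}  # coverage data recorded across kernel invocations


def run_instrumented(device, name, kernel, num_counters, *args):
    handle = device.malloc(num_counters)              # step 311
    device.memset_zero(handle)                        # step 312
    device.bind_symbol("__cov_counters", handle)      # step 313
    device.launch(kernel, *args)                      # steps 314, 321-322
    values = device.copy_to_host(handle)              # step 315
    previous = collected.setdefault(name, [0] * num_counters)
    collected[name] = [a + b for a, b in zip(previous, values)]  # step 316


def write_coverage(path):                             # step 317
    with open(path, "w") as f:
        json.dump(collected, f)


device = FakeDevice()
run_instrumented(device, "toy_kernel", toy_kernel, 3, [1, 2, -3])
run_instrumented(device, "toy_kernel", toy_kernel, 3, [4])

out_path = os.path.join(tempfile.mkdtemp(), "coverage.json")
write_coverage(out_path)
with open(out_path) as f:
    written = json.load(f)
print(written)
```

  • Accumulating the counter values per kernel name across launches is one simple design choice for this sketch; the description only states that the collected data is recorded after each launch and written out when execution exits.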
  • FIG. 4 is a block diagram illustrating an exemplary computing system 400 operable to compile integrated source code and instrument the code for code coverage data collection in accordance with an embodiment of the present disclosure. In one embodiment, system 400 may be a general-purpose computing device used to compile a program configured to be executed concurrently by a host processor and one or more device processors in a parallel execution system. System 400 comprises a Central Processing Unit (CPU) 401, a system memory 402, a Graphics Processing Unit (GPU) 403, I/O interfaces 404 and network circuits 405, an operating system 406 and application software 407 stored in the memory 402. In one embodiment, software 407 includes an exemplary integrated compiler 408 configured to compile source code of programs having a mixture of host code and device code.
  • In one embodiment, provided with source code of a program and executed by the CPU 401, a code coverage pass 410 in the integrated compiler 408 can generate instrumented executable code with coverage instrumentation counters inserted for the device functions, coverage mapping information and memory requirement for the counters. The compiler 408 can further generate instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected coverage information from the device memory and output coverage counters. The compiler 408 may perform various other functions that are well known in the art as well as those discussed in detail with reference to FIGS. 1-3.

Claims (27)

What is claimed is:
1. A method comprising:
accessing source code of a program comprising code configured to be executed by a host processor and code configured to be executed by a co-processor; and
compiling the source code to generate instrumented executable code, wherein the compiling comprises inserting instrumentation code operable to cause generation of code coverage data during execution of the instrumented executable code by the co-processor, wherein the instrumentation code comprises one or more instrumentation counters for a kernel to be executed by the co-processor, and wherein the instrumented executable code is operable to cause the host processor to initialize a co-processor memory for the one or more instrumentation counters.
2. The method of claim 1, wherein the host processor comprises a Central Processing Unit (CPU) and the co-processor comprises a Graphics Processing Unit (GPU).
3. The method of claim 1, wherein the compiling further comprises generating coverage mapping information indicative of correspondences between the one or more instrumentation counters and source positions of instrumented points.
4. The method of claim 3, wherein the inserting comprises providing the instrumentation code and a function call list from a co-processor compiler to a host processor compiler.
5. The method of claim 4, wherein the inserting further comprises inserting calls to coverage intrinsics, wherein the coverage intrinsics are operable to encode source positions of instrumented points into parameters.
6. The method of claim 5, wherein the inserting the instrumentation code further comprises converting the calls to coverage intrinsics related to the co-processor into coverage instrumentation instructions and the coverage mapping information.
7. The method of claim 4, wherein the compiling further comprises: generating call graph information and memory usage requirements for functions defined for the co-processor; and
sending the call graph information and the memory usage requirements from a co-processor compiler to a host compiler.
8. The method of claim 1, wherein the instrumented executable code is operable to cause said host processor to allocate the co-processor memory for the one or more instrumentation counters.
9. The method of claim 1, wherein the instrumented executable code is operable to cause the host processor to initialize the co-processor memory by using a driver for the co-processor.
10. The method of claim 1, wherein the instrumented executable code is operable to: cause the host processor to invoke the kernel for execution after initializing the co-processor memory for the one or more instrumentation counters; and cause the co-processor to update the one or more instrumentation counters during execution of the kernel.
11. The method of claim 1, wherein further the one or more instrumentation counters are configured to increment in atomic operations.
12. The method of claim 1, wherein the instrumented executable code is further operable to cause the host processor to copy values of the one or more instrumentation counters from the co-processor memory to a host processor memory after completion of the kernel.
13. The method of claim 1, wherein the instrumented executable code is operable to cause the host processor to call a library to write the code coverage data into an output file.
14. The method of claim 1, wherein the compiling comprises performing a set of separate compilations for multiple portions of the source code, wherein performing a separate compilation comprises:
inserting instrumentation code for a portion of the source code in the separate compilation; and
generating an initialized constant variable for the separate compilation, wherein the initialized constant variable comprises a partial function call list associated with the separate compilation.
15. The method of claim 14, wherein the compiling further comprises linking the instrumented code resulting from the set of separate compilations to generate the instrumented executable code, wherein the linking comprises:
generating a combined call list from partial function call lists; and
generating a representation of a combined call graph comprising partial call graphs associated with the multiple portions of the source code respectively.
16. The method of claim 14, wherein the performing the separate compilation further comprises:
sending instrumentation code for the portion from a device compiler to a host compiler; and
declaring mirrors for counters at the host compiler.
17. A system comprising:
a processor; and
a memory coupled to the processor and storing instructions that, when executed by the processor, cause the system to perform a method of generating code coverage data, wherein the method comprises:
accessing source code of a program comprising code configured to be executed by a host processor and code configured to be executed by a co-processor concurrently; and
compiling the source code to generate instrumented executable code, wherein the compiling comprises inserting instrumentation code operable to cause generation of code coverage data during execution of the instrumented executable code by the co-processor, wherein the instrumentation code comprises one or more instrumentation counters for a kernel to be executed by the co-processor, and wherein the instrumented executable code is operable to cause the host processor to initialize a co-processor memory for the one or more instrumentation counters before the host processor invokes the kernel.
18. The system of claim 17, wherein the compiling further comprises generating coverage mapping information indicative of correspondence between the one or more instrumentation counters and source positions of instrumented points.
19. The system of claim 18, wherein the inserting comprises converting calls to coverage intrinsics related to the co-processor into coverage instrumentation instructions, wherein the calls to coverage intrinsics are operable to encode the source positions of instrumented points into parameters.
20. The system of claim 18, wherein the compiling further comprises generating call graph information and memory usage requirements for functions defined for the co-processor.
21. The system of claim 17, wherein the instrumented executable code is further operable to cause the host processor to allocate the co-processor memory.
22. The system of claim 17, wherein the instrumented executable code is further operable to: cause the host processor to initialize the co-processor memory by using a driver for the co-processor; and cause the host processor to invoke the kernel for execution by the co-processor after initializing the co-processor memory.
23. The system of claim 17, wherein the instrumented executable code is further operable to cause the co-processor to update the one or more instrumentation counters during execution of the kernel, and wherein further the one or more instrumentation counters are configured to increment in atomic operations.
24. The system of claim 17, wherein the instrumented executable code is further operable to cause the host processor to:
copy values of the one or more instrumentation counters from the co-processor memory to a host processor memory after completion of execution of the kernel; and
write the code coverage data into an output file.
25. The system of claim 17, wherein the compiling comprises performing a set of separate compilations for multiple portions of the source code, wherein performing a separate compilation comprises:
inserting instrumentation code for a portion of the source code in the separate compilation; and
generating an initialized constant variable for the separate compilation, wherein the initialized constant variable comprises a partial function call list associated with the separate compilation.
26. The system of claim 25, wherein the compiling further comprises linking the instrumented code resulting from the set of separate compilations to generate the instrumented executable code, wherein the linking comprises:
generating a combined call list from partial function call lists; and
generating a representation of a combined call graph comprising partial call graphs associated with the multiple portions of the source code respectively.
27. The system of claim 25, wherein the performing the separate compilation further comprises:
sending instrumentation code for the portion from a co-processor compiler to a host compiler; and
declaring mirrors for counters at the host compiler.
US16/154,542 2017-10-06 2018-10-08 Code coverage generation in gpu by using host-device coordination Abandoned US20190108006A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/154,542 US20190108006A1 (en) 2017-10-06 2018-10-08 Code coverage generation in gpu by using host-device coordination

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762569380P 2017-10-06 2017-10-06
US16/154,542 US20190108006A1 (en) 2017-10-06 2018-10-08 Code coverage generation in gpu by using host-device coordination

Publications (1)

Publication Number Publication Date
US20190108006A1 (en) 2019-04-11

Family

ID=65993196

Family Applications (3)

Application Number Title Priority Date Filing Date
US16/154,560 Active US10853044B2 (en) 2017-10-06 2018-10-08 Device profiling in GPU accelerators by using host-device coordination
US16/154,542 Abandoned US20190108006A1 (en) 2017-10-06 2018-10-08 Code coverage generation in gpu by using host-device coordination
US16/939,313 Active US11579852B2 (en) 2017-10-06 2020-07-27 Device profiling in GPU accelerators by using host-device coordination

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US16/154,560 Active US10853044B2 (en) 2017-10-06 2018-10-08 Device profiling in GPU accelerators by using host-device coordination

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/939,313 Active US11579852B2 (en) 2017-10-06 2020-07-27 Device profiling in GPU accelerators by using host-device coordination

Country Status (1)

Country Link
US (3) US10853044B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021057057A1 (en) * 2019-09-23 2021-04-01 上海创景信息科技有限公司 Target-code coverage testing method, system, and medium of operating system-level program
CN114168142A (en) * 2021-10-12 2022-03-11 芯华章科技股份有限公司 Code coverage rate calculation method, electronic device and storage medium
CN115017059A (en) * 2022-08-08 2022-09-06 北京北大软件工程股份有限公司 Fuzzy test method and system for graphical user interface program
US12008363B1 (en) 2021-07-14 2024-06-11 International Business Machines Corporation Delivering portions of source code based on a stacked-layer framework

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11650902B2 (en) * 2017-11-08 2023-05-16 Intel Corporation Methods and apparatus to perform instruction-level graphics processing unit (GPU) profiling based on binary instrumentation
US11120521B2 (en) * 2018-12-28 2021-09-14 Intel Corporation Techniques for graphics processing unit profiling using binary instrumentation
US10922779B2 (en) * 2018-12-28 2021-02-16 Intel Corporation Techniques for multi-mode graphics processing unit profiling
US11226799B1 (en) 2020-08-31 2022-01-18 International Business Machines Corporation Deriving profile data for compiler optimization

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5909577A (en) 1994-04-18 1999-06-01 Lucent Technologies Inc. Determining dynamic properties of programs
US5758061A (en) 1995-12-15 1998-05-26 Plum; Thomas S. Computer software testing method and apparatus
US5815720A (en) * 1996-03-15 1998-09-29 Institute For The Development Of Emerging Architectures, L.L.C. Use of dynamic translation to collect and exploit run-time information in an optimizing compilation system
US6631518B1 (en) 1997-03-19 2003-10-07 International Business Machines Corporation Generating and utilizing organized profile information
US6622300B1 (en) * 1999-04-21 2003-09-16 Hewlett-Packard Development Company, L.P. Dynamic optimization of computer programs using code-rewriting kernal module
US6308324B1 (en) * 1999-06-10 2001-10-23 International Business Machines Corporation Multi-stage profiler
US6795963B1 (en) 1999-11-12 2004-09-21 International Business Machines Corporation Method and system for optimizing systems with enhanced debugging information
US20030066060A1 (en) 2001-09-28 2003-04-03 Ford Richard L. Cross profile guided optimization of program execution
US7107585B2 (en) * 2002-07-29 2006-09-12 Arm Limited Compilation of application code in a data processing apparatus
US20050028146A1 (en) 2003-08-01 2005-02-03 Quick Shawn G. Systems and methods for software and firmware testing using checkpoint signatures
US7730469B1 (en) 2004-05-04 2010-06-01 Oracle America, Inc. Method and system for code optimization
US20070079294A1 (en) 2005-09-30 2007-04-05 Robert Knight Profiling using a user-level control mechanism
US7954094B2 (en) 2006-03-27 2011-05-31 International Business Machines Corporation Method for improving performance of executable code
US8375368B2 (en) * 2006-06-20 2013-02-12 Google Inc. Systems and methods for profiling an application running on a parallel-processing computer system
US20090037887A1 (en) 2007-07-30 2009-02-05 Chavan Shasank K Compiler-inserted predicated tracing
US8387026B1 (en) * 2008-12-24 2013-02-26 Google Inc. Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling
US8789032B1 (en) * 2009-02-27 2014-07-22 Google Inc. Feedback-directed inter-procedural optimization
US20120167057A1 (en) 2010-12-22 2012-06-28 Microsoft Corporation Dynamic instrumentation of software code
US8782645B2 (en) * 2011-05-11 2014-07-15 Advanced Micro Devices, Inc. Automatic load balancing for heterogeneous cores
US8819649B2 (en) 2011-09-09 2014-08-26 Microsoft Corporation Profile guided just-in-time (JIT) compiler and byte code generation
CN103959238B (en) 2011-11-30 2017-06-09 英特尔公司 Efficient implementation of RSA using GPU/CPU architecture
US10025643B2 (en) 2012-05-10 2018-07-17 Nvidia Corporation System and method for compiler support for kernel launches in device code
US9760351B2 (en) 2013-04-02 2017-09-12 Google Inc. Framework for user-directed profile-driven optimizations
US9612809B2 (en) 2014-05-30 2017-04-04 Microsoft Technology Licensing, Llc. Multiphased profile guided optimization
US9535815B2 (en) * 2014-06-04 2017-01-03 Nvidia Corporation System, method, and computer program product for collecting execution statistics for graphics processing unit workloads
US9348567B2 (en) * 2014-07-03 2016-05-24 Microsoft Technology Licensing, Llc. Profile guided optimization in the presence of stale profile data
US9274771B1 (en) 2014-09-22 2016-03-01 Oracle International Corporation Automated adaptive compiler optimization
US10353679B2 (en) * 2014-10-31 2019-07-16 Microsoft Technology Licensing, Llc. Collecting profile data for modified global variables
US10097973B2 (en) * 2015-05-27 2018-10-09 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US11003428B2 (en) 2016-05-25 2021-05-11 Microsoft Technology Licensing, Llc. Sample driven profile guided optimization with precise correlation
US10296447B2 (en) 2016-12-09 2019-05-21 Fujitsu Limited Automated software program repair
US10379827B2 (en) 2016-12-29 2019-08-13 Intel Corporation Automatic identification and generation of non-temporal store and load operations in a dynamic optimization environment
US11650902B2 (en) * 2017-11-08 2023-05-16 Intel Corporation Methods and apparatus to perform instruction-level graphics processing unit (GPU) profiling based on binary instrumentation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021057057A1 (en) * 2019-09-23 2021-04-01 上海创景信息科技有限公司 Target-code coverage testing method, system, and medium of operating system-level program
US12008363B1 (en) 2021-07-14 2024-06-11 International Business Machines Corporation Delivering portions of source code based on a stacked-layer framework
CN114168142A (en) * 2021-10-12 2022-03-11 芯华章科技股份有限公司 Code coverage rate calculation method, electronic device and storage medium
CN115017059A (en) * 2022-08-08 2022-09-06 北京北大软件工程股份有限公司 Fuzzy test method and system for graphical user interface program

Also Published As

Publication number Publication date
US10853044B2 (en) 2020-12-01
US11579852B2 (en) 2023-02-14
US20190146766A1 (en) 2019-05-16
US20200356351A1 (en) 2020-11-12

Similar Documents

Publication Publication Date Title
US20190108006A1 (en) Code coverage generation in gpu by using host-device coordination
Vouillon et al. From bytecode to JavaScript: the Js_of_ocaml compiler
Nelson et al. Specification and verification in the field: Applying formal methods to BPF just-in-time compilers in the Linux kernel
US5907709A (en) Development system with methods for detecting invalid use and management of resources and memory at runtime
US20060130021A1 (en) Automated safe secure techniques for eliminating undefined behavior in computer software
US9417931B2 (en) Unified metadata for external components
US20070011669A1 (en) Software migration
Grimmer et al. Cross-language interoperability in a multi-language runtime
US9626170B2 (en) Method and computer program product for disassembling a mixed machine code
US20110296385A1 (en) Mechanism for Generating Backtracing Information for Software Debugging of Software Programs Running on Virtual Machines
US8881123B2 (en) Enabling symbol resolution of private symbols in legacy programs and optimizing access to the private symbols
US20090320007A1 (en) Local metadata for external components
Li et al. K-LLVM: a relatively complete semantics of LLVM IR
Doeraene Scala.js: Type-directed interoperability with dynamically typed languages
Barrière et al. Formally verified native code generation in an effectful JIT: turning the CompCert backend into a formally verified JIT compiler
US10983771B1 (en) Quality checking inferred types in a set of code
Hill et al. Pin++: an object-oriented framework for writing pintools
Chen et al. Type-preserving compilation for large-scale optimizing object-oriented compilers
Hamza et al. From verified Scala to STIX file system embedded code using Stainless
Chisnall Smalltalk in a C world
Chang et al. Analysis of low-level code using cooperating decompilers
Arif et al. Cinnamon: A domain-specific language for binary profiling and monitoring
Amadio et al. Certifying cost annotations in compilers
Ramos et al. Implementing Python for DrRacket
Kågström et al. Cibyl: an environment for language diversity on mobile devices

Legal Events

Date Code Title Description
AS Assignment

Owner name: NVIDIA CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SANDANAGOBALANE, HARIHARAN;LEE, SEAN YOUNGSUNG;GROVER, VINOD;REEL/FRAME:047612/0859

Effective date: 20181128

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION