US20150261651A1 - Validation of applications for graphics processing unit - Google Patents
Validation of applications for graphics processing unit
- Publication number
- US20150261651A1 (application US14/727,427)
- Authority
- US
- United States
- Prior art keywords
- application
- gpu
- modified version
- server device
- code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0736—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3648—Software debugging using additional hardware
- G06F11/3652—Software debugging using additional hardware in-circuit-emulation [ICE] arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3664—Environments for testing or debugging software
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Definitions
- This disclosure is directed to applications that execute on a graphics processing unit (GPU), and more particularly, to validation of such applications.
- GPU: graphics processing unit
- Newer GPUs include programmable cores that execute programs, and thereby provide greater functional flexibility as compared to the traditional GPUs.
- the programmable cores may execute both graphics related applications and non-graphics related applications.
- this disclosure is related to techniques for identifying potentially problematic applications that are to be executed on a graphics processing unit (GPU), prior to execution.
- problematic applications include, but are not limited to, malicious applications, as well as inefficient or error-prone applications.
- a server device external to the device that houses the GPU may validate the application. Validation of the application may mean that the application satisfies one or more criteria. As one example, validation may mean determining with some level of assurance that the application is not a malicious application, an error-prone application, or an inefficient application.
- the server device may transmit an indication to the device that indicates whether it is safe or inadvisable for the GPU to execute the application. The device may then elect to execute the application on the GPU based on the received indication.
- the disclosure describes a method that includes receiving, with a server device, an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the server device.
- the method also includes performing, with the server device, at least one of an analysis of the application prior to and during compilation of the application on the server device, and an analysis of the application during execution of the application on the server device.
- the method further includes determining whether the application satisfies one or more performance criteria based on at least one of the analyses, and transmitting to the device a validation of the application if the application satisfies the one or more performance criteria.
- the disclosure describes an apparatus that includes an emulator unit operable to receive an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the apparatus.
- the emulator unit is also operable to perform at least one of an analysis of the application prior to and during compilation of the application on the apparatus, and an analysis of the application during execution of the application on the apparatus.
- the emulator unit is also operable to determine whether the application satisfies one or more performance criteria based on at least one of the analyses, and transmit to the device a validation of the application if the application satisfies the one or more performance criteria.
- the disclosure describes a server device that includes means for receiving an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the server device.
- the server device also includes means for performing at least one of an analysis of the application prior to and during compilation of the application on the server device, and an analysis of the application during execution of the application on the server device.
- the server device further includes means for determining whether the application satisfies one or more performance criteria based on at least one of the analyses, and means for transmitting to the device a validation of the application if the application satisfies the one or more performance criteria.
- the disclosure describes a non-transitory computer-readable storage medium comprising instructions that cause one or more processors to receive, with a server device, an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the server device.
- the instructions further cause one or more processors to perform, with the server device, at least one of an analysis of the application prior to and during compilation of the application on the server device, and an analysis of the application during execution of the application on the server device.
- the instructions also cause the one or more processors to determine whether the application satisfies one or more performance criteria based on at least one of the analyses, and transmit to the device a validation of the application if the application satisfies the one or more performance criteria.
- the disclosure describes a method that includes receiving an application that is to be executed by a graphics processing unit (GPU) of a device, and transmitting the application to a server device external to the device for validation of the application.
- the method further includes receiving a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- the disclosure describes an apparatus that includes a graphics processing unit (GPU), and a device memory operable to store an application that is to be executed by the GPU.
- the apparatus also includes a processor operable to transmit the application to a server device external to the apparatus, and receive a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- the disclosure describes a device that includes a graphics processing unit (GPU).
- the device also includes means for receiving an application that is to be executed by the GPU, and means for transmitting the application to a server device external to the device for validation of the application.
- the device further includes means for receiving a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- the disclosure describes a non-transitory computer-readable storage medium comprising instructions that cause one or more processors to receive an application that is to be executed by a graphics processing unit (GPU) of a device, and transmit the application to a server device external to the device for validation of the application.
- the instructions further cause the processor to receive a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- FIG. 1 is a block diagram illustrating an example of a system that may be operable to implement one or more aspects of this disclosure.
- FIG. 2 is a flowchart illustrating an example operation of a device that may be operable to implement one or more aspects of this disclosure.
- FIG. 3 is a flowchart illustrating an example operation of a server that may be operable to implement one or more aspects of this disclosure.
- FIG. 4 is a flowchart illustrating another example operation of a server that may be operable to implement one or more aspects of this disclosure.
- FIG. 5 is a block diagram illustrating an example device, illustrated in FIG. 1 , in further detail.
- this disclosure is related to techniques to ensure proper functionality of applications that are to be executed on a graphics processing unit (GPU).
- newer GPUs allow for programmable shader cores.
- these GPUs execute applications such as vertex shaders and fragment shaders that perform functions that were previously delegated to components of the fixed-function hardware pipelines.
- While programmable shader cores allow for functional flexibility, they also invite misuse or suboptimal use of the GPU.
- a malicious developer may develop an application that generates a denial of service attack or a virus.
- a developer without malicious intent may nevertheless inadvertently develop an inefficient or error-prone application.
- a problematic application e.g., a malicious, inefficient or error-prone application
- the techniques of this disclosure may assist in identifying possibly malicious, inefficient and/or error-prone GPU-executed applications, prior to execution by the GPU.
- the techniques of this disclosure may be directed to a cloud-based solution in which a server device, external to the device that houses the GPU, and coupled to the device housing the GPU via one or more network connections, functions as an emulator for execution of an application.
- the server may emulate the results of the application, as if the application is executing on the GPU.
- the server may validate the application (e.g., determine whether or not the program is malicious, inefficient, or error-prone), and indicate as such to the device that houses the GPU.
- the GPU may then execute the application based on the received indication.
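- The round trip described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `ValidationResult`, `validate_on_server`, and `device_run` are hypothetical, the single server-side check is a placeholder, and the network call is replaced by a direct function call.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Indication sent from the server back to the device (hypothetical format)."""
    malicious: bool = False
    error_prone: bool = False
    inefficient: bool = False
    notes: list = field(default_factory=list)

    @property
    def favorable(self) -> bool:
        return not (self.malicious or self.error_prone or self.inefficient)


def validate_on_server(source: str) -> ValidationResult:
    """Stand-in for server-side emulation and analysis.

    The one check below is a placeholder, not a real analysis.
    """
    result = ValidationResult()
    if "while(1)" in source.replace(" ", ""):
        result.error_prone = True
        result.notes.append("unconditional loop found")
    return result


def device_run(source: str, execute_on_gpu) -> bool:
    """Device side: submit the application, then act on the indication."""
    indication = validate_on_server(source)  # a network request in practice
    if indication.favorable:
        execute_on_gpu(source)
        return True
    return False
```

- Here the device executes only on a favorable indication; a device could apply a more permissive policy.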
- the server may execute a validation process to validate the application.
- the validation process may be a software process.
- the software process may be executed in conjunction with a general purpose processor and/or special purpose hardware.
- the server may execute virtual model software.
- the virtual model causes the server to emulate the GPU, or the actual device that includes the GPU, upon which the application will execute.
- the server may include a hardware emulation board to validate the application.
- the server may also include an application that is specifically designed to test for security violations by the application that is to be executed by the GPU.
- the server may perform static analysis, dynamic analysis, or a combination thereof.
- Static analysis refers to analysis of the application that can be performed without execution of the application. For instance, static analysis can be performed during compilation.
- the server may identify errors in the application such as infinite loops in the program or out-of-bounds access to array locations within the application as two non-limiting examples.
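- As a rough illustration of static analysis, the sketch below scans C-like kernel source text without executing it. It is a toy under stated assumptions: the regular expressions, the function name `static_findings`, and the heuristics (constant indices only, any `break` or `return` clears the loop warning) are all hypothetical, and a real analysis would operate on the compiler's parsed representation.

```python
import re

# Fixed-size array declarations such as "float buf[16];"
DECL = re.compile(r"\b(?:int|float|char)\s+(\w+)\s*\[\s*(\d+)\s*\]\s*;")
# Accesses with a constant index such as "buf[20]"
ACCESS = re.compile(r"\b(\w+)\s*\[\s*(-?\d+)\s*\]")
# Loops with no exit condition: "while (1)", "while (true)", "for (;;)"
ENDLESS = re.compile(r"\bwhile\s*\(\s*(?:1|true)\s*\)|\bfor\s*\(\s*;\s*;\s*\)")


def static_findings(source: str) -> list:
    """Return human-readable findings from a text-only scan (no execution)."""
    findings = []
    sizes = {name: int(size) for name, size in DECL.findall(source)}
    # Remove declarations so "buf[16]" in a declaration is not read as an access.
    body = DECL.sub("", source)
    for name, index in ACCESS.findall(body):
        if name in sizes and not 0 <= int(index) < sizes[name]:
            findings.append(f"out-of-bounds access: {name}[{index}]")
    # Crude heuristic: an unconditional loop with no break/return anywhere.
    if ENDLESS.search(body) and not re.search(r"\b(?:break|return)\b", body):
        findings.append("possible infinite loop")
    return findings
```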
- Dynamic analysis refers to analysis of the application during execution, which may additionally result in identifying problematic applications (e.g., malicious, inefficient, and error-prone applications).
- the server may execute compiled code, and the server may provide the executed code with hypothetical input values.
- the hypothetical input values may be, for example, different input images, input images with different sizes, and the like.
- the server may monitor the results and the functions performed by the executed code. For example, the server may monitor memory accesses by the virtual model of the GPU, and determine whether the memory accesses are out-of-bounds memory accesses. The server may also monitor the memory addresses where the virtual model of the GPU is writing information. Based on the memory accesses of the virtual model of the GPU and memory addresses where the virtual model of the GPU is writing information, the server may be able to determine whether the application is error-prone. Such memory tracking may be particularly useful when the application reads or writes to variables using pointers.
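- The memory tracking described above might look like the following sketch. The class `MonitoredMemory` and the helper `run_with_inputs` are hypothetical names; the "kernel" is an ordinary Python function standing in for compiled code running on the virtual model of the GPU, and the input sizes stand in for the hypothetical input values.

```python
class MonitoredMemory:
    """Emulated memory region that records accesses made by the code under test."""

    def __init__(self, size: int):
        self._cells = [0] * size
        self.writes = []      # in-bounds addresses written, in order
        self.violations = []  # (kind, address) pairs for out-of-bounds accesses

    def _in_bounds(self, addr: int) -> bool:
        return 0 <= addr < len(self._cells)

    def read(self, addr: int) -> int:
        if not self._in_bounds(addr):
            self.violations.append(("read", addr))
            return 0  # the emulator returns a dummy value instead of faulting
        return self._cells[addr]

    def write(self, addr: int, value: int) -> None:
        if not self._in_bounds(addr):
            self.violations.append(("write", addr))
            return
        self.writes.append(addr)
        self._cells[addr] = value


def run_with_inputs(kernel, input_sizes, mem_size=64):
    """Run kernel(mem, n) once per hypothetical input size; collect violations."""
    report = {}
    for n in input_sizes:
        mem = MonitoredMemory(mem_size)
        kernel(mem, n)
        report[n] = mem.violations
    return report
```

- An application whose report is empty for every tested input is not proven safe; the check only covers the inputs that were tried.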
- the server may also detect applications that generate or enable denial of service attacks. For example, the server may monitor the rate at which the virtual model of the GPU is able to execute the application. If the server detects slow responsiveness, unintended termination, or hanging, the server may determine that the application is an application designed for a denial of service attack, or a very poorly designed application. In either case, execution of such an application may negatively impact the experience of a user.
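- One simple way to detect hanging or unintended termination in an emulator is a step budget, sketched below. This swaps in a step counter for the execution-rate monitoring described above; the function name `run_with_watchdog` and the convention that the emulated kernel yields once per instruction are assumptions made for illustration.

```python
def run_with_watchdog(kernel_steps, max_steps: int) -> str:
    """Drive an emulated kernel step by step and stop if it exceeds its budget.

    kernel_steps is an iterator that yields once per emulated instruction.
    """
    executed = 0
    try:
        for _ in kernel_steps:
            executed += 1
            if executed > max_steps:
                return "hang-suspected"
    except Exception:
        # The emulated code faulted before finishing.
        return "terminated"
    return "completed"
```

- A "hang-suspected" result does not distinguish a deliberate denial of service attack from a poorly designed application, which matches the text: either outcome degrades the user experience.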
- the server may be able to tune and optimize the application as well. For example, the server may insert or replace the source code, or portions of the source code, or collect statistics to determine how well the compiled code works.
- the server may validate the application and optimize or tune the application once. After such validation, the device may execute the application as often as the user would like without requiring further validations or optimization. Also, in some examples, after validating a certain application, the server may store an indication that indicates that this application has already been validated. If the server receives the same source code or pre-compiled object code again, the server may first ensure that the code is identical, and if so, immediately validate that application.
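- The "validate once" behavior can be sketched with a small cache keyed by a digest of the code. The class name `ValidationCache` and the choice of SHA-256 are assumptions; the disclosure only requires that the server confirm the code is identical before reusing an earlier validation.

```python
import hashlib


class ValidationCache:
    """Remembers source or object code that has already been validated."""

    def __init__(self):
        self._validated = {}  # digest -> exact bytes that were validated

    def remember(self, code: bytes) -> None:
        self._validated[hashlib.sha256(code).hexdigest()] = code

    def already_validated(self, code: bytes) -> bool:
        stored = self._validated.get(hashlib.sha256(code).hexdigest())
        # Compare the full bytes too, so a match never rests on the digest alone.
        return stored is not None and stored == code
```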
- FIG. 1 is a block diagram illustrating an example of a system that may be operable to implement one or more aspects of this disclosure.
- FIG. 1 illustrates system 10, which includes device 12, network 22, validation server device 24, and application server device 38.
- system 10 may include a plurality of devices 12 , validation servers 24 , and application servers 38 .
- System 10 may be referred to as a cloud-based system to indicate that validation of application 20 occurs in validation server device 24, which is external to device 12, as described in more detail below.
- the techniques of this disclosure may be directed to validating application 20 in the cloud (e.g., in validation server device 24 , which is external to device 12 ).
- Examples of device 12 include, but are not limited to, video devices such as media players, set-top boxes, wireless handsets such as mobile telephones, personal digital assistants (PDAs), desktop computers, laptop computers, gaming consoles, video conferencing units, tablet computing devices, and the like.
- Examples of validation server device 24 and application server device 38 include, but are not limited to, laptops, desktops, web servers, and the like. In general, validation server device 24 and application server device 38 may be any type of device capable of performing the functions attributed to validation server device 24 and application server device 38 in this disclosure.
- Network 22 may allow device 12 to securely communicate with validation server device 24 and application server device 38 .
- any communication between device 12 and validation server device 24 and application server device 38 may be encrypted or otherwise secured.
- any communication between device 12 and validation server device 24 and application server device 38 may require user authorization.
- network 22 may ensure that information transmitted by any one of device 12, validation server device 24, and application server device 38 is received only by the intended device or devices, and no other device.
- Network 22 may be a local area network (LAN), a wide area network (WAN), the Internet, and the like.
- Device 12, validation server device 24, and application server device 38 may be coupled to network 22 wirelessly or through a wired link.
- device 12 may directly communicate with validation server device 24 and/or application server device 38 through a wireless or wired connection.
- network 22 may not be needed in system 10 .
- device 12 may include GPU 14 , processor 16 , and device memory 18 .
- Device 12 may include components in addition to those illustrated in FIG. 1 .
- FIG. 5 illustrates an example of device 12 that includes more components than those illustrated in FIG. 1 .
- Examples of GPU 14 and processor 16 include, but are not limited to, a digital signal processor (DSP), a general purpose microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other equivalent integrated or discrete logic circuitry.
- Although GPU 14 and processor 16 are illustrated as separate components, aspects of this disclosure are not so limited. In alternate examples, GPU 14 and processor 16 may be part of a common integrated circuit. The two are shown separately only for purposes of illustration and ease of description.
- Examples of device memory 18 include, but are not limited to, a random access memory (RAM), a read only memory (ROM), or an electrically erasable programmable read-only memory (EEPROM). Examples of device memory 18 may also include storage devices such as CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, and flash memory. In general, device memory 18 may include media that can be used to store desired program code in the form of instructions or data structures and that can be accessed by GPU 14 and processor 16. In some examples, device memory 18 may comprise one or more computer-readable storage media, such as a computer-readable storage device. For instance, in some example implementations, device memory 18 may include instructions that cause GPU 14 and processor 16 to perform the functions ascribed to GPU 14 and processor 16 in this disclosure.
- Device memory 18 may, in some examples, be considered as a non-transitory storage medium.
- the term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that device memory 18 is non-movable.
- device memory 18 may be removed from device 12 , and moved to another device.
- a storage device substantially similar to device memory 18 , may be inserted into device 12 .
- a non-transitory storage medium may store data that can, over time, change (e.g., in RAM).
- GPU 14 may be operable to execute one or more software applications.
- GPU 14 may include a processor core on which one or more software applications may execute.
- the applications that execute on GPU 14 may be graphics applications such as vertex shaders and fragment shaders for generating graphics data.
- a developer may consider it beneficial to exploit the massive parallelism of GPU 14 by developing a software application unrelated to graphics processing.
- GPU 14 may be referred to as a general purpose GPU (GP-GPU).
- FIG. 1 illustrates GPU 14 executing application 20 .
- Application 20 may be a graphics application or a non-graphics application that executes on GPU 14 .
- Application 20 is illustrated in a dashed box within GPU 14 to indicate that application 20 is executing on GPU 14 .
- GPU 14 does not actually include application 20 .
- application 20 may be stored in device memory 18 , as illustrated in FIG. 1 .
- Application 20 may be developed using a wide variety of different application programming interfaces (APIs).
- a developer may have developed application 20 using any programming API such as OpenGL, OpenCL, WebGL, and WebCL.
- applications that are developed using the OpenGL or WebGL APIs are designed for graphics processing.
- Applications that are developed using the OpenCL or WebCL APIs are designed for processing unrelated to graphics processing.
- the OpenGL, OpenCL, WebGL, and WebCL APIs are provided for illustration purposes and should not be considered limiting.
- the techniques of this disclosure may be extendable to APIs in addition to the examples provided above. In general, the techniques of this disclosure may be extendable to any technique utilized by a developer to develop application 20 .
- device memory 18 may store application 20 .
- a user of device 12 may cause device 12 to download application 20 from application server device 38 via network 22 .
- device 12 may store application 20 in device memory 18 .
- a user of device 12 may insert a FLASH drive into device 12 that stores application 20 , and device 12 may retrieve application 20 from the FLASH drive and store application 20 in device memory 18 .
- application server device 38 may not be needed.
- the above examples that describe the manner in which device 12 stores application 20 in device memory 18 are provided for purposes of illustration and should not be considered limiting.
- the techniques of this disclosure may be applicable to any technique in which application 20 is loaded into device memory 18 .
- Device memory 18 may store the source code of application 20 , intermediate representation of application 20 , or object code of application 20 .
- the source code of application 20 may be the text in the programming language in which application 20 was developed.
- the object code of application 20 may be the binary bits resulting from the compilation of application 20 .
- application server device 38 may compile the source code of application 20 , and device 12 may download this pre-compiled object code of application 20 .
- the intermediate representation of application 20 may be intermediate to the source code and the object code.
- the variables of the source code of application 20 may be replaced with register or memory identifiers for where the variables will be stored in device memory 18 .
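- The idea of replacing source variables with register identifiers can be illustrated with a toy rewrite. This is only a sketch: the function `to_intermediate` and the `r0`, `r1`, ... naming are hypothetical, and a real compiler performs register allocation on a parsed program rather than on raw text.

```python
import re


def to_intermediate(source: str, variables: list) -> str:
    """Replace each listed variable name with a register identifier (r0, r1, ...)."""
    if not variables:
        return source
    registers = {name: f"r{i}" for i, name in enumerate(variables)}
    # Word boundaries keep "x" from matching inside longer identifiers.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, variables)) + r")\b")
    return pattern.sub(lambda m: registers[m.group(1)], source)
```

- For example, `to_intermediate("sum = sum + x * scale;", ["sum", "x", "scale"])` yields `r0 = r0 + r1 * r2;`.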
- the capability of the programmable core or cores of GPU 14 to execute applications increases the functionality of GPU 14 .
- the capability of GPU 14 to execute applications may invite misuse or suboptimal use of GPU 14 and make device 12 more susceptible to malicious applications or error-prone applications.
- applications that execute solely on a central processing unit (CPU), such as processor 16, execute in a virtual machine setting, which allocates the amount of memory of device memory 18, and the storage locations within device memory 18, that are accessible to the applications. Because the applications are confined to the virtual machine of processor 16, they are unable to access out-of-bounds memory addresses and are limited to the memory addresses specifically provided to them by the virtual machine of processor 16. In this way, it may be difficult for applications executing on processor 16 to drastically impact processor 16, and in turn device 12, in a negative manner.
- it may not be practical to implement virtual machines on GPU 14.
- the massive parallel processing capabilities of GPU 14 may not be well suited for executing virtual machines.
- the virtual machines would dominate the resources of GPU 14 , possibly restricting other applications from being executed on GPU 14 . Accordingly, in some instances, virtual machines may not be able to limit the negative impacts of malicious or error-prone applications that execute on GPU 14 .
- Applications that execute on GPU 14 may be considered as applications that execute “natively” (i.e., are not confined to a virtual machine).
- Native execution of application 20 may allow for application 20 to access larger portions of device memory 18 .
- Such access may allow problematic applications, such as malicious applications or poorly designed (e.g., error-prone) applications, to negatively impact the performance capabilities of GPU 14 and device 12.
- the developer of application 20 may develop application 20 such that application 20 , when executed, provokes a denial of service attack on device 12 , or propagates a virus that impacts the performance of device 12 .
- application 20 may control GPU 14 such that GPU 14 may not be able to perform any other tasks such as rendering graphics content for a user interface. This may cause device 12 to “hang,” which may drastically impact the functionality of device 12 .
- the developer of application 20 may develop application 20 to access portions of device memory 18 that it should be limited from accessing. Application 20 may store instructions for a virus in these portions of device memory 18 . Then, when processor 16 or GPU 14 accesses these portions of device memory 18 , processor 16 or GPU 14 may accidentally execute the stored virus.
- an error-prone application may include infinite loops, out-of-bounds access to an array, or out-of-bounds access to memory locations of device memory 18 .
- An inefficient application may not properly utilize the functionality of GPU 14 .
- an inefficient application may not properly use the programmable functionality of GPU 14 .
- application server device 38 may potentially provide a modicum of protection from malicious and error-prone applications.
- the owner of application server device 38 may guarantee that none of the applications stored on application server device 38 are malicious or error-prone applications. However, this may not be the case in every instance (e.g., the owner of application server device 38 may not provide a guarantee of safe and proper operation), or the purported “guarantee” from the owner of application server device 38 may not be trustworthy.
- the techniques of this disclosure may assist in identifying whether applications that are to be executed on GPU 14 (e.g., application 20 ) are problematic applications such as malicious applications, as well as inefficient and error-prone applications, prior to execution.
- the techniques of this disclosure may validate application 20 prior to GPU 14 executing application 20 .
- Validation of application 20 may mean that the application 20 satisfies one or more performance criteria. For example, validation may mean determining with some level of assurance that application 20 is not a malicious application, an inefficient application, or an error-prone application.
- the example techniques described in this disclosure may transmit an indication to device 12 that indicates whether it is safe or inadvisable for GPU 14 to execute application 20 .
- Processor 16 may then elect to instruct GPU 14 to execute application 20 based on the received indication.
- processor 16 may instruct GPU 14 to execute application 20 if the indication is favorable, i.e., indicates that the program is not malicious, not inefficient, and/or not error-prone.
- processor 16 may instruct GPU 14 to execute application 20 even if the indication is unfavorable. For example, if application 20 is inefficient but neither malicious nor error-prone, processor 16 may instruct GPU 14 to execute application 20, because such execution may not harm GPU 14 or device 12 even though application 20 may not execute as efficiently as possible.
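- One possible decision policy for processor 16 is sketched below. The function name `should_execute` and the `tolerate_inefficient` switch are hypothetical; the disclosure leaves the policy to the device.

```python
def should_execute(malicious: bool, error_prone: bool, inefficient: bool,
                   tolerate_inefficient: bool = True) -> bool:
    """Decide whether to hand the application to the GPU."""
    if malicious or error_prone:
        return False  # potentially harmful to the GPU or the device
    if inefficient:
        return tolerate_inefficient  # slow but harmless: a policy choice
    return True
```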
- the techniques of this disclosure may also tune, or otherwise optimize, an inefficient application that is to be executed on GPU 14 .
- the developer of application 20 may not have any malicious intent, and may have developed application 20 such that application 20 is not prone to errors. Nevertheless, it may be possible that application 20 may not efficiently utilize the resources of GPU 14 .
- one of the functions of application 20 may be to divide a task into workgroups and perform parallel processing on the workgroups to exploit the parallelism of GPU 14 .
- application 20 may divide an image into blocks and perform parallel processing on the blocks. The size of each of blocks may be based on the amount of local memory available on GPU 14 .
- Because the developer of application 20 may want to design application 20 to execute on a variety of different GPUs, the developer may not know ahead of time how much local memory is available on a particular GPU, such as GPU 14 , as different GPUs may include different amounts of local memory. To address this, the developer may develop application 20 to utilize variable sized blocks. In some instances, utilizing variable sized blocks may be less efficient than utilizing fixed sized blocks. The techniques of this disclosure may tune or optimize application 20 such that application 20 utilizes fixed sized blocks based on the amount of available memory in GPU 14 .
- application 20 may perform matrix operations.
- the developer of application 20 may have developed application 20 to perform row-based matrix operations or column-based matrix operations.
- GPU 14 may be better suited to perform row-based matrix operations, as compared to column-based matrix operations, or vice-versa.
- the techniques of this disclosure may modify application 20 to perform row-based matrix operations, if application 20 uses column-based matrix operations, to more efficiently utilize GPU 14 .
- the developer may have developed application 20 for older versions of GPUs, and application 20 may not be optimized for GPU 14 .
- the techniques of this disclosure may modify application 20 so that application 20 is more optimized for newer GPUs, such as GPU 14 .
- GPU 14 may then execute application 20 , which is optimized to execute on newer GPUs.
- validation server device 24 may validate application 20 , and in some examples, optimize or tune application 20 .
- validation server device 24 may implement a validation process that determines whether application 20 satisfies one or more performance criteria. For example, validation server device 24 may determine, with some reasonable level of assurance, whether application 20 is a malicious application, an error-prone application, or an inefficient application. In examples where application 20 is an error-prone application or an inefficient application, validation server device 24 may attempt to correct the errors in application 20 , or optimize application 20 to be more efficient.
- validation server device 24 may employ different types of analysis to ensure with some reasonable amount of certainty that application 20 is not a problematic application.
- validation server device 24 is external to device 12 . Accordingly, the validation of application 20 and optimization of application 20 may be offloaded from device 12 , which may be referred to as validating application 20 in the “cloud” because validation server device 24 is a server that is external to device 12 .
- the probability of application 20 negatively impacting GPU 14 and device 12 may be reduced, in cases where application 20 is a malicious application or an error-prone application.
- power savings and processing efficiency may be realized because processor 16 does not need to consume power and clock cycles validating or optimizing application 20 .
- There may be various examples of performance criteria that application 20 may need to satisfy for validation server device 24 to validate application 20 .
- the performance criteria can be part of static analysis, dynamic analysis, or a combination thereof.
- Static analysis refers to analysis of application 20 that can be performed without execution of application 20 to ensure that application 20 satisfies one or more performance criteria associated with static analysis.
- Dynamic analysis refers to analysis of application 20 during execution to ensure that application 20 satisfies one or more performance criteria associated with dynamic analysis.
- Validation server device 24 may be operable to perform static analysis, dynamic analysis, or both static analysis and dynamic analysis. For purposes of illustration, validation server device 24 is described as being operable to perform both static analysis and dynamic analysis, and therefore, operable to ensure that application 20 satisfies the performance criteria associated with both static analysis and dynamic analysis. In alternate examples, validation server device 24 may be operable to perform one of static analysis or dynamic analysis, and in these alternate examples, validation server device 24 may be operable to ensure that application 20 satisfies the performance criteria associated with the type of analysis that validation server device 24 is operable to perform (e.g., performance criteria associated with static analysis or dynamic analysis).
- validation server device 24 includes emulator unit 26 and server memory 28 .
- Server memory 28 may include data and/or instructions defining one or more GPU models 30 , one or more GPU inputs 32 , and one or more device models 34 .
- Emulator unit 26 may be a processing unit that is operable to execute one or more of GPU models 30 and device models 34 .
- emulator unit 26 may be a hardware emulation board, which may be a GPU.
- emulator unit 26 may include two portions, which may be part of the same circuitry or separate, distinct circuits, where the first portion is a processing unit that is operable to execute one or more of GPU models 30 and device models 34 , and the second portion is the hardware emulation board (e.g., a GPU). Examples of emulator unit 26 include, but are not limited to, a DSP, a general purpose microprocessor, an ASIC, an FPGA, or other equivalent integrated or discrete logic circuitry.
- Server memory 28 may be similar to device memory 18 .
- server memory 28 may be any medium that can be used to store desired program code in the form of instructions, data, and/or data structures and that can be accessed by emulator unit 26 and that cause emulator unit 26 to perform one or more of the functions ascribed to emulator unit 26 .
- server memory 28 may, in some examples, be considered as a non-transitory storage medium, as described above with respect to device memory 18 .
- server memory 28 may store data and/or instructions defining one or more GPU models 30 , GPU inputs 32 , and device models 34 . It may not be necessary for server memory 28 to store one or more GPU models 30 , GPU inputs 32 , and device models 34 in every example. For example, server memory 28 may store GPU models 30 and GPU inputs 32 , but may not store device models 34 . If validation server device 24 is operable to perform only static analysis, GPU models 30 , GPU inputs 32 , and device models 34 may not be needed. In some examples, it is with the GPU models 30 , GPU inputs 32 , and device models 34 that emulator unit 26 performs dynamic analysis.
- Each of the one or more GPU models 30 may correspond to a particular GPU type, and each of the one or more device models 34 may correspond to a particular device type.
- each one of the GPU models 30 may model the configuration of its corresponding GPU type in terms of parallel processing capabilities, local memory availability, and any other pertinent characteristic that defines the functionality of GPUs of that GPU type.
- Each one of the device models 34 may model the configuration of its corresponding device type in terms of memory configuration, processor speed, system bus speed, device memory, and any other pertinent characteristic that defines the functionality of devices of that device type.
- different vendors provide different types of devices with different functional characteristics, and device models 34 may be models for each of these different device types.
- the one or more GPU models 30 and device models 34 may each be considered as virtual model software that emulator unit 26 can execute. For example, when emulator unit 26 executes one of the GPU models 30 , emulator unit 26 emulates the GPU to which the executed GPU model 30 corresponds. When emulator unit 26 executes one of the GPU models 30 and one of the device models 34 , emulator unit 26 emulates the device to which the executed device model 34 corresponds, as if such a device included the GPU to which the executed GPU model 30 corresponds.
- the GPU vendors and the device vendors may supply GPU models 30 and device models 34 , respectively. There may be other ways in which server memory 28 stores GPU models 30 and device models 34 , and aspects of this disclosure are not limited to the specific examples where vendors provide GPU models 30 and device models 34 .
- when emulator unit 26 executes one of GPU models 30 , emulator unit 26 may function as if the parallel processing capabilities and local memory availability of emulator unit 26 (as two examples) are functionally equivalent to the GPU type associated with the executed one of GPU models 30 .
- when emulator unit 26 executes one of device models 34 , emulator unit 26 may function as if the memory configuration, processor speed, system bus speed, and device memory of emulator unit 26 (as four examples) are functionally equivalent to the device type associated with the executed one of device models 34 .
- the execution of one of GPU models 30 causes emulator unit 26 to function as the GPU associated with the executed one of GPU models 30 .
- the execution of one of GPU models 30 and one of device models 34 causes emulator unit 26 to function as a device associated with the executed one of device models 34 that includes the GPU associated with the executed one of GPU models 30 .
- One of the plurality of GPU models 30 may be a generic GPU model 30 .
- one of the plurality of device models 34 may be a generic device model 34 .
- server memory 28 may store a generic GPU model and a generic device model instead of a plurality of GPU models and device models.
- the generic GPU model and device model may not correspond to a particular GPU or device type, but may be suitable for static and dynamic analysis.
- If server memory 28 does not store a GPU model that corresponds to GPU 14 , then the generic GPU model may be suitable for validation purposes.
- the generic GPU model and the generic device model may conform to a base profile of operation common to most GPUs or devices.
- the generic GPU model may model a GPU with average parallel processing capabilities and local memory availability as compared to other GPUs.
- the generic device model may model a device with average memory configuration, processor speed, system bus speed, and device memory as compared to other devices.
- device 12 may download application 20 from application server device 38 .
- Application 20 may be source code, an intermediate representation, or pre-compiled object code, as described above.
- Processor 16 may then install application 20 on device 12 . If application 20 is in source code or in the intermediate representation, e.g., not pre-compiled object code, part of the installation may be processor 16 executing a compiler to compile the code of application 20 .
- processor 16 may cause device 12 to transmit the downloaded code of application 20 to validation server device 24 for validation.
- processor 16 may cause device 12 to transmit the pre-compiled object code to validation server device 24 for validation before allowing GPU 14 to execute application 20 .
- processor 16 may encrypt or otherwise make secure the downloaded code of application 20 that device 12 transmits to validation server device 24 .
- processor 16 may require authorization from a user prior to transmitting the downloaded code of application 20 to validation server device 24 .
- processor 16 may cause device 12 to transmit the GPU type of GPU 14 or both the GPU type of GPU 14 and the device type of device 12 to validation server device 24 .
- processor 16 may require authorization from the user prior to transmitting the GPU type of GPU 14 or the GPU type of GPU 14 and device type of device 12 to validation server device 24 .
- Emulator unit 26 may be operable to perform static analysis on application 20 to determine whether application 20 satisfies the performance criteria associated with static analysis. For example, emulator unit 26 may analyze application 20 without executing application 20 . As one example, emulator unit 26 may parse through the downloaded code of application 20 to identify code known to be code for a virus. For instance, server memory 28 may store code of known viruses, and emulator unit 26 may compare the downloaded code of application 20 to the code of the known viruses. Determining that the downloaded code of application 20 does not include code of known viruses may be one example of performance criteria that needs to be satisfied to validate application 20 .
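The signature comparison described above can be illustrated with a short sketch. The signature names and byte patterns below are placeholders invented for illustration, and plain substring matching is an assumed simplification of how downloaded code might be compared against stored virus code.

```python
from typing import Dict, List

# Hypothetical signature store standing in for the known-virus code
# that the text says server memory 28 may hold.
KNOWN_SIGNATURES: Dict[str, bytes] = {
    "example-virus-a": bytes.fromhex("deadbeef"),
    "example-virus-b": b"\x90\x90\x90\x90",
}


def find_known_signatures(code: bytes,
                          signatures: Dict[str, bytes] = KNOWN_SIGNATURES) -> List[str]:
    """Return the names of all signatures that occur anywhere in the code."""
    return sorted(name for name, pattern in signatures.items() if pattern in code)


def passes_signature_check(code: bytes) -> bool:
    """The static criterion is met when no known signature is present."""
    return not find_known_signatures(code)
```

The code is never executed during this check, which is what makes it a static analysis step.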
- emulator unit 26 may compile the downloaded code of application 20 , in examples where the downloaded code of application 20 is the source code or intermediate representation of application 20 , to identify errors in application 20 during compilation.
- emulator unit 26 may execute compiler 36 , as indicated by dashed lines within emulator unit 26 .
- the compilation of application 20 with compiler 36 may identify any infinite loops in application 20 or out-of-bounds access to memory array locations within application 20 .
- determining that there are not errors in application 20 may be another example of performance criteria that needs to be satisfied to validate application 20 .
- Static analysis may be limited in the types of errors, inefficiencies, and malicious code that can be found. For example, if the downloaded code of application 20 is pre-compiled object code, it may not be possible for emulator unit 26 to identify errors in application 20 during compilation because the code for application 20 is already pre-compiled object code. As another example, if application 20 relies on pointers for storage, it may not be possible to determine if there are any out-of-bounds memory access errors in application 20 based simply on compiling application 20 .
- emulator unit 26 may perform dynamic analysis. As indicated above, dynamic analysis refers to analysis of application 20 during execution. In some examples, to perform dynamic analysis emulator unit 26 may cause itself to appear as if it is GPU 14 . For example, in some instances, in addition to transmitting the downloaded code of application 20 , processor 16 may cause device 12 to transmit the GPU type of GPU 14 to emulator unit 26 of validation server device 24 , or both the GPU type of GPU 14 and the device type of device 12 to emulator unit 26 of validation server device 24 via network 22 .
- Emulator unit 26 may identify which one of GPU models 30 corresponds to the GPU type of GPU 14 , and may execute that one of GPU models 30 to emulate GPU 14 on validation server device 24 . In examples where emulator unit 26 also receives the device type, emulator unit 26 may identify which one of device models 34 corresponds to the device type of device 12 , and may execute that one of device models 34 to emulate device 12 on validation server device 24 .
- emulator unit 26 may execute the generic GPU model and/or the generic device model.
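Model selection with a generic fallback might look like the following sketch. The dictionary layout, the `"generic"` key, and the model names are assumptions made for illustration; the text does not specify how models are stored or looked up.

```python
from typing import Dict, Optional

GENERIC_KEY = "generic"  # hypothetical key for the generic model


def select_model(models: Dict[str, str], reported_type: Optional[str]) -> str:
    """Pick the model matching the reported type, else fall back to the generic one.

    reported_type is None when the device did not transmit a type.
    """
    if reported_type is not None and reported_type in models:
        return models[reported_type]
    return models[GENERIC_KEY]


# Hypothetical model store.
gpu_models = {
    "vendor-x-gpu-1": "model for vendor-x-gpu-1",
    GENERIC_KEY: "generic GPU model",
}
```

The same lookup could be applied separately to device models when the device type is also received.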
- Where emulator unit 26 is, or includes, a hardware emulation board, such a hardware emulation board may be designed to function, at least in part, as a generic GPU on a generic device.
- emulator unit 26 may execute application 20 . For example, if emulator unit 26 received the source code or intermediate code of application 20 , emulator unit 26 may compile the source code via compiler 36 , and execute the resulting object code. If emulator unit 26 received pre-compiled object code of application 20 , emulator unit 26 may execute the pre-compiled object code of application 20 .
- the techniques of this disclosure may be considered, in some examples, as being performed at least in part by emulator unit 26 executing a virtual model based on the type of GPU 14 (e.g., one of GPU models 30 ). Then, when emulator unit 26 executes application 20 , application 20 can be considered as executing in the virtual model (e.g., the one of GPU models 30 that is executing on emulator unit 26 ). For example, both the GPU model, of GPU models 30 , that corresponds to GPU 14 and application 20 are executing on emulator unit 26 . In the techniques of this disclosure, because emulator unit 26 functions as if it is GPU 14 , due to the execution of the GPU model that corresponds to GPU 14 , when emulator unit 26 executes application 20 , application 20 may execute on the GPU model that corresponds to GPU 14 .
- emulator unit 26 may receive hypothetical input values for application 20 that is executing on emulator unit 26 .
- server memory 28 may store one or more GPU inputs 32 .
- GPU inputs 32 may be values for different graphical images or objects. In some examples, each of these different images may be of different sizes. In examples where application 20 is not related to graphics processing, GPU inputs 32 may be non-graphics inputs. It may be difficult to ensure that emulator unit 26 tests every permutation and combination of possible input values.
- server memory 28 may store a sufficient number and/or range of GPU inputs 32 , e.g., as samples or test inputs, to provide some reasonable level of assurance that application 20 is not a malicious or highly error-prone application (e.g., a problematic application).
- the GPU inputs 32 may include different types of images or objects to be processed and rendered by GPU 14 .
- emulator unit 26 may input the values of GPU inputs 32 and may analyze functionality of the executed GPU model of GPU models 30 .
- emulator unit 26 may analyze the functionality of the hardware emulation board. For example, emulator unit 26 may monitor memory accesses by the executed GPU model of GPU models 30 . In this example, emulator unit 26 may determine whether any of the memory accesses by the executed GPU model of GPU models 30 are out-of-bounds memory accesses of server memory 28 . As another example, emulator unit 26 may monitor the memory addresses where the executed GPU model of GPU models 30 is writing information in server memory 28 . Based on the memory accesses of the GPU model and the memory addresses where the GPU model is writing information, emulator unit 26 may be able to determine whether application 20 is error-prone. Such memory tracking may be particularly useful when application 20 reads or writes to variables using pointers.
- emulator unit 26 may determine that application 20 is error-prone, and possibly malicious. For example, if the executed GPU model writes information to or reads information from a non-existent memory location, emulator unit 26 may determine that application 20 is error-prone. If the executed GPU model writes information to a memory location that is not reserved for the GPU model, emulator unit 26 may determine that application 20 is error-prone or possibly malicious. For example, emulator unit 26 may determine that application 20 is attempting to load a virus into the memory locations which application 20 should not be able to access.
- the limitations of where application 20 can write information to or read information from (e.g., access) during execution may be an example of performance criteria associated with dynamic analysis.
- the performance criteria may be a limitation of the memory locations that application 20 is allowed to access. If the GPU model of GPU models 30 accesses a memory location outside of the limited memory locations, due to the execution of application 20 , application 20 may be in violation of the performance criteria. For example, there may be a threshold number of accesses outside the limited memory locations that is allowable, in accordance with the performance criteria. The threshold number may be zero to provide a highest level of assurance that application 20 is not attempting to access memory locations outside of the limited memory locations.
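The memory-access criterion can be sketched as a small monitor that counts accesses outside a reserved range. The class name, the half-open address range, and the recording interface are hypothetical; only the idea of an allowable threshold, which may be zero, comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemoryAccessMonitor:
    allowed_start: int       # first reserved address (inclusive), assumed layout
    allowed_end: int         # end of the reserved range (exclusive)
    max_violations: int = 0  # zero gives the highest level of assurance
    violations: List[Tuple[str, int]] = field(default_factory=list)

    def record(self, kind: str, address: int) -> None:
        """Log a read or write; remember it if it falls outside the range."""
        if not (self.allowed_start <= address < self.allowed_end):
            self.violations.append((kind, address))

    def satisfied(self) -> bool:
        """The criterion holds while violations stay within the threshold."""
        return len(self.violations) <= self.max_violations
```

For instance, with a reserved range of `0x1000` to `0x2000`, a write to `0x2000` is one past the end and counts as a violation.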
- emulator unit 26 may similarly analyze functionality of the executed device model of device models 34 .
- emulator unit 26 may monitor the functions performed by the executed one of device models 34 while emulator unit 26 executes one of GPU models 30 .
- the execution of one of device models 34 may result in emulator unit 26 emulating device 12 , which includes a system bus.
- Emulator unit 26 may determine whether the execution of application 20 causes the system bus to overload resulting in device 12 slowing down.
- the monitoring of the system bus to determine whether the system bus is being overloaded may be an example of performance criteria associated with dynamic analysis. For example, if the execution of application 20 causes the system bus to overload, application 20 may be in violation of the performance criteria.
- the performance criteria may allow for some level of overloading the system bus, as it may not be possible to not allow any overloading of the system bus.
- the performance criteria may establish a percentage amount threshold of system bus overload. If the system bus overload is below the allowable percentage, the performance criteria is satisfied. Otherwise, the performance criteria is not satisfied.
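One way to express the bus-overload threshold is sketched below. The text does not define how the overload percentage is measured, so this sketch assumes it is the share of sampled intervals in which bus traffic exceeds the bus capacity; the function names and units are hypothetical.

```python
from typing import Sequence


def bus_overload_percent(samples: Sequence[float], capacity: float) -> float:
    """Percentage of sampled intervals in which traffic exceeded capacity (assumed metric)."""
    if not samples:
        return 0.0
    overloaded = sum(1 for s in samples if s > capacity)
    return 100.0 * overloaded / len(samples)


def bus_criterion_satisfied(samples: Sequence[float], capacity: float,
                            allowed_percent: float) -> bool:
    """Satisfied only when the overload is strictly below the allowable percentage."""
    return bus_overload_percent(samples, capacity) < allowed_percent
```

With samples `[10, 20, 120, 30]` and a capacity of `100`, one of four intervals is overloaded, so a 30 percent allowance passes and a 25 percent allowance does not.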
- Emulator unit 26 may similarly detect malicious applications such as denial of service attacks. For example, emulator unit 26 may monitor the rate at which the GPU model of GPU models 30 is able to execute application 20 . If emulator unit 26 detects slow responsiveness, unintended termination, or hanging, emulator unit 26 may determine application 20 is an application designed for a denial of service attack, or a very poorly designed application.
- the performance criteria may be a threshold execution time or execution rate for a particular task of application 20 . If application 20 takes longer than the threshold execution time to complete a particular task or executes the task at a rate less than the threshold execution rate, application 20 may be in violation of the performance criteria.
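A minimal sketch of the timing criterion follows. The parameter names are hypothetical, and representing a hang or unintended termination as `None` for the elapsed time is an assumption of this sketch.

```python
from typing import Optional


def timing_criterion_satisfied(elapsed_s: Optional[float], work_items: int,
                               max_time_s: float, min_rate: float) -> bool:
    """Check a task against a threshold execution time and a threshold execution rate.

    elapsed_s is None when the task never completed (assumed encoding of a hang).
    """
    if elapsed_s is None or elapsed_s > max_time_s:
        return False
    if elapsed_s <= 0:
        # Completed too quickly to measure; treat the rate as unbounded.
        return True
    return work_items / elapsed_s >= min_rate
```

A task that completes 100 work items in 2 seconds runs at 50 items per second, which passes a 5 second limit and a minimum rate of 40.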
- emulator unit 26 may monitor instructions issued by application 20 .
- instructions issued by application 20 may be 96-bit words. However, not all combinations of 96 bits represent a valid instruction.
- GPU 14 may be designed to ignore invalid instructions; however, this may not be the case for every example of GPU 14 .
- emulator unit 26 may determine whether the instructions issued by application 20 during execution are valid or invalid instructions. If emulator unit 26 determines that application 20 is issuing invalid instructions, emulator unit 26 may determine that application 20 is a malicious application, an error-prone application, or an inefficient application.
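Instruction validation can be sketched as a check on each 96-bit word. The encoding below is purely hypothetical: it assumes the top 8 bits hold an opcode and that only a small invented set of opcodes is valid. A real GPU instruction set would define its own rules.

```python
from typing import Iterable

INSTRUCTION_BITS = 96
OPCODE_BITS = 8                      # assumed opcode width
VALID_OPCODES = {0x01, 0x02, 0x10}   # hypothetical opcode set


def is_valid_instruction(word: int) -> bool:
    """A word is valid if it fits in 96 bits and its assumed opcode field is known."""
    if not 0 <= word < (1 << INSTRUCTION_BITS):
        return False
    opcode = word >> (INSTRUCTION_BITS - OPCODE_BITS)
    return opcode in VALID_OPCODES


def count_invalid(words: Iterable[int]) -> int:
    """Number of invalid instructions observed during execution."""
    return sum(1 for w in words if not is_valid_instruction(w))
```

A nonzero count from `count_invalid` would be one signal that the application is malicious, error-prone, or inefficient.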
- application 20 may write data to and read data from registers.
- a malicious application, error-prone application, or inefficient application may read data from unwritten registers. If application 20 attempts to read data from a register that was not previously written to, the data read by application 20 may be meaningless data (i.e., uninitialized data). Such reading of uninitialized data may result in unpredictable behavior.
- emulator unit 26 may monitor which registers application 20 writes to during execution, and may determine whether application 20 is reading from a register that has not previously been written to. If emulator unit 26 determines that application 20 is reading from unwritten registers, emulator unit 26 may determine that application 20 is a malicious application, error-prone application, or an inefficient application.
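The register monitoring described above can be sketched with a tracker that remembers which registers have been written. The class and method names are hypothetical, and registers are identified by plain integers for simplicity.

```python
from typing import List, Set


class RegisterTracker:
    """Tracks writes so that reads of never-written registers can be flagged."""

    def __init__(self) -> None:
        self.written: Set[int] = set()
        self.uninitialized_reads: List[int] = []

    def write(self, register: int) -> None:
        self.written.add(register)

    def read(self, register: int) -> None:
        # A read before any write would return uninitialized data.
        if register not in self.written:
            self.uninitialized_reads.append(register)

    def satisfied(self) -> bool:
        return not self.uninitialized_reads
```

Reading register 3 after only register 0 has been written is recorded as an uninitialized read.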
- validation server device 24 may transmit an indication to device 12 indicating that application 20 , with some level of assurance, satisfies one or more performance criteria associated with static analysis, dynamic analysis, or both static and dynamic analysis (e.g., validates application 20 ). In this case, validation server device 24 may provide an indication that application 20 is validated for use by GPU 14 . Otherwise, in some examples, validation server device 24 may transmit an indication to device 12 indicating that application 20 is invalidated for use by GPU 14 , such that it is inadvisable for GPU 14 to execute application 20 . In response, processor 16 may instruct GPU 14 to execute application 20 based on the received indication.
- emulator unit 26 may also transmit the compiled object code of application 20 , as compiled by compiler 36 . In this way, the compilation of application 20 may also be offloaded from device 12 and offloaded to an external device, such as validation server device 24 .
- Validation server device 24 may also be tasked with optimizing or tuning application 20 .
- emulator unit 26 may receive the source code or intermediate code of application 20 . As part of the static and/or dynamic analysis, emulator unit 26 may determine that application 20 is somewhat error-prone or would inefficiently utilize the capabilities of GPU 14 . In these examples, rather than transmitting an indication to device 12 indicating that it is inadvisable for GPU 14 to execute application 20 , emulator unit 26 may attempt to correct the errors of application 20 or attempt to tune application 20 for GPU 14 when it is determined that application 20 may execute inefficiently or with errors on GPU 14 .
- emulator unit 26 may compile the modified code of application 20 to generate object code that GPU 14 should execute. Emulator unit 26 may then transmit the resulting object code to device 12 with an indication that GPU 14 should execute the resulting object code. In this case, GPU 14 may execute the object code generated from the modified code, rather than the object code generated from the original code of application 20 . Alternatively, emulator unit 26 may transmit the modified code of application 20 without compilation.
- the validation of application 20 may be considered as being part of the transmission of the modified code of application 20 (e.g., the transmission of the modified code or the resulting object code).
- device 12 may automatically determine that the modified code of application 20 is suitable for execution because device 12 received the modified code of application 20 from validation server device 24 .
- the validation that device 12 receives from validation server device 24 may be an explicit validation or an implicit validation.
- emulator unit 26 may determine with some level of assurance that application 20 or the modified version of application 20 satisfies one or more performance criteria.
- emulator unit 26 may transmit the indication indicating that it is inadvisable to execute application 20 on GPU 14 . If emulator unit 26 is unable to make application 20 more efficient, emulator unit 26 may still transmit an indication to device 12 indicating that it may be suitable for GPU 14 to execute application 20 because while application 20 may not be completely efficient, application 20 may not be error-prone or malicious.
- emulator unit 26 may insert code (e.g., source code or intermediate code), replace code, or modify code of application 20 in some other manner.
- emulator unit 26 may collect statistics to determine how well the compiled code of application 20 works. For example, application 20 may utilize array indices for storing variable values in an array.
- Emulator unit 26 may add code into the source code of application 20 that checks that array indices, utilized by application 20 , are within the range.
- Emulator unit 26 may add code into the source code of application 20 that causes application 20 to abort when an array index is not within range.
- Emulator unit 26 then may compile the modified source code to produce object code for execution of application 20 by GPU 14 .
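The effect of the inserted range check can be illustrated with a guarded store. This is a sketch of the behavior of the added code, not of the source-to-source transformation itself; the exception type and helper name are hypothetical, and raising an exception stands in for aborting the application.

```python
from typing import List


class IndexOutOfRange(Exception):
    """Raised in place of aborting the application (assumed behavior)."""


def checked_store(array: List[int], index: int, value: int) -> None:
    """Store a value only when the index is within range, otherwise abort."""
    if not 0 <= index < len(array):
        raise IndexOutOfRange(f"index {index} outside 0..{len(array) - 1}")
    array[index] = value
```

In Python specifically, an unchecked store with a negative index would silently wrap around to the end of the list, so the explicit check changes behavior for that case as well.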
- Optimization or tuning may be based on the assumption that applications, such as application 20 , are generally developed to exploit the high level of parallelism of GPU 14 . If the developer did not intend to exploit the parallelism of GPU 14 , the developer would have developed application 20 to not execute on GPU 14 , and rather execute on processor 16 .
- the developer of application 20 may have developed application 20 to perform image processing on blocks of images in parallel.
- the size of the blocks of the images may be based on the amount of available local memory on GPU 14 . Because the developer may not know how much memory is available on GPU 14 , the developer may develop application 20 to use variable-sized blocks, instead of the more efficient fixed sized blocks. For example, fixed-size blocks may be more efficient because the size of the blocks does not change during execution.
- emulator unit 26 may determine the optimal size for the blocks because the GPU model of GPU models 30 that corresponds to GPU 14 may include information that indicates the size of the local memory of GPU 14 . In this example, emulator unit 26 may select the optimal size for the blocks based on the amount of available local memory on GPU 14 , the amount of data that will be needed to write to or read from the local memory of GPU 14 , and other such information which may not be available to the developer of application 20 . In aspects of this disclosure, emulator unit 26 would know how much local memory is available and how much data needs to be written or read from local memory because emulator unit 26 may execute application 20 on the GPU model of GPU models 30 that corresponds to GPU 14 .
- emulator unit 26 may update or otherwise modify the source code or intermediate code of application 20 to fix block size to the optimally determined size. In other words, emulator unit 26 may determine the optimal size of the blocks to best utilize the parallelism of GPU 14 . Emulator unit 26 may then compile this modified code of application 20 , and transmit the resulting object code to device 12 for execution on GPU 14 . In this way, when GPU 14 executes the modified application 20 , the modified application 20 may execute more efficiently on GPU 14 , as compared to the original application 20 .
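Choosing a fixed block size from the available local memory might be sketched as follows. The heuristic is an assumption: it picks the largest power-of-two square block whose working set fits in local memory, and the parameter names and the `max_side` cap are hypothetical.

```python
def pick_block_side(local_mem_bytes: int, bytes_per_element: int,
                    buffers_per_block: int = 1, max_side: int = 64) -> int:
    """Return the side length of the largest power-of-two square block that fits.

    The working set of one block is assumed to be
    side * side * bytes_per_element * buffers_per_block bytes.
    """
    per_element = bytes_per_element * buffers_per_block
    if per_element > local_mem_bytes:
        raise ValueError("local memory cannot hold even a 1x1 block")
    side = 1
    while side * 2 <= max_side and (side * 2) ** 2 * per_element <= local_mem_bytes:
        side *= 2
    return side
```

With 32 KiB of local memory, 4-byte elements, and two buffers per block, a 64 by 64 block needs exactly 32768 bytes and is selected; halving the local memory to 16 KiB yields a 32 by 32 block.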
- application 20 may perform matrix operations.
- emulator unit 26 may determine whether column-based matrix operations or row-based matrix operations are handled more efficiently by GPU 14 . For instance, emulator unit 26 may cause the GPU model of GPU models 30 that corresponds to GPU 14 to execute application 20 using row-based matrix operations and using column-based matrix operations. Emulator unit 26 may compare the efficiency of the column-based and row-based matrix operations (e.g., number of accesses to memory, amount of processing time, and other such efficiency measures). Based on the measured efficiency, emulator unit 26 may modify the code of application 20 . For example, if column-based operations are more efficiently executed than row-based operations, emulator unit 26 may modify the code of application 20 so that the matrix operations are performed as column-based operations. Similarly, if row-based operations are more efficiently executed than column-based operations, emulator unit 26 may modify the code of application 20 so that the matrix operations are performed as row-based operations.
- the developer of application 20 may have developed application 20 to be executed on older versions of GPUs.
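The comparison between the two variants can be sketched as selecting the variant with the lowest measured cost. The `Measurement` type and the ordering of the measures (processing time first, memory accesses as a tie-breaker) are assumptions; the text lists the measures but does not say how they are combined, and the numbers in the usage example are invented.

```python
from typing import Dict, NamedTuple


class Measurement(NamedTuple):
    memory_accesses: int
    time_s: float


def pick_variant(results: Dict[str, Measurement]) -> str:
    """Return the variant name with the lowest time, then the fewest memory accesses."""
    return min(results, key=lambda name: (results[name].time_s,
                                          results[name].memory_accesses))


# Hypothetical measurements from running both variants on the GPU model.
measured = {
    "row": Measurement(memory_accesses=1000, time_s=0.8),
    "column": Measurement(memory_accesses=4000, time_s=1.9),
}
```

With these invented measurements the row-based variant is selected, so the code would be modified to use row-based operations.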
- application 20 may properly execute on a GPU such as GPU 14 ; however, application 20 may not fully exploit the functionality of GPU 14 .
- application 20 may unnecessarily limit the amount of graphics or non-graphics data that GPU 14 should process in parallel because older versions of GPUs may be limited in processing capabilities.
- emulator unit 26 may modify the code of application 20 such that, when application 20 is executed, application 20 causes GPU 14 to process more data in parallel.
- emulator unit 26 may modify application 20 such that application 20 is better suited for execution on newer GPUs, and aspects of this disclosure should not be considered limited to the above examples.
- emulator unit 26 may transmit the modified or updated code of application 20 to device 12 .
- processor 16 may compile the code of application 20 , as received from emulator unit 26 , and instruct GPU 14 to execute the resulting object code.
- emulator unit 26 may compile the modified application 20 , via compiler 36 , and transmit the resulting object code to device 12 .
- processor 16 may instruct GPU 14 to execute the received object code for application 20 .
- emulator unit 26 may validate application 20 and optimize or tune application 20 once. After such validation, GPU 14 may execute application 20 as needed without requiring further validation or optimization. Also, in some examples, after emulator unit 26 validates application 20 , emulator unit 26 may store an indication in server memory 28 that indicates that this application 20 has already been validated. In these examples, when emulator unit 26 receives code for validation, emulator unit 26 may first determine whether emulator unit 26 previously validated the code based on the indication stored in server memory 28 . If emulator unit 26 previously validated the code, emulator unit 26 may immediately validate that received code. For example, emulator unit 26 may validate application 20 , as received from device 12 . Subsequently, emulator unit 26 may receive code for application 20 from a device other than device 12 .
- emulator unit 26 may first determine that the received code is the same as the code that emulator unit 26 previously validated, and if so, may immediately validate the received code. In this manner, emulator unit 26 may not need to perform the static and/or dynamic analysis again for previously validated code.
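- One way to picture the stored indication is as a set of digests of previously validated code. The sketch below is an assumption for illustration only: the disclosure does not state how identical code is recognized, and the use of SHA-256, the `ValidationCache` class, and the `validate_once` helper are hypothetical.

```python
import hashlib


class ValidationCache:
    """Hypothetical record of validated code, as might be kept in server memory 28."""

    def __init__(self):
        self._validated = set()

    @staticmethod
    def _digest(code):
        # Assumption: byte-identical code is recognized by its SHA-256 digest.
        return hashlib.sha256(code).hexdigest()

    def is_validated(self, code):
        return self._digest(code) in self._validated

    def mark_validated(self, code):
        self._validated.add(self._digest(code))


def validate_once(code, cache, analyze):
    """Run `analyze` only for code that has not been validated before."""
    if cache.is_validated(code):
        return True              # previously validated: skip the analysis
    ok = analyze(code)           # static and/or dynamic analysis
    if ok:
        cache.mark_validated(code)
    return ok
```

- With this arrangement, the same code received from a second device would be validated immediately, without repeating the analysis.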
- FIG. 2 is a flowchart illustrating an example operation of device 12 .
- Device 12 may receive application 20 that is to be executed by GPU 14 ( 40 ).
- device 12 may download application 20 from application server device 38 .
- application 20 may be preloaded on device memory 18 .
- device 12 may receive the source code, intermediate code (e.g., intermediate representation of application 20 ), or object code of application 20 .
- Device 12 may transmit the code of application 20 to validation server device 24 ( 42 ). For example, device 12 may transmit the source code, intermediate code, or object code of application 20 to validation server device 24 for validation of application 20 . In some examples, device 12 may transmit the code of application 20 to validation server device 24 once for validation. GPU 14 , of device 12 , may then execute application 20 as needed without requiring subsequent validation.
- device 12 may receive the validation from validation server device 24 ( 44 ). Alternatively, device 12 may receive an invalidation; that is, device 12 may receive either a validation or an invalidation.
- the validation from server device 24 may indicate that application 20 satisfies one or more performance criteria. If application 20 does not satisfy the one or more performance criteria, validation server device 24 may indicate that application 20 did not satisfy the performance criteria. For example, the validation may indicate that application 20 satisfies performance criteria associated with static analysis, dynamic analysis, or both static and dynamic analysis. In some examples, validation server device 24 may optimize or tune application 20 to make application 20 more efficient or less error-prone. In this case, the validation may indicate that the modified version of application 20 satisfies one or more performance criteria.
- processor 16 of device 12 may instruct GPU 14 of device 12 to execute application 20 based on the validation ( 48 ). For example, if validation server device 24 indicates that application 20 satisfies the performance criteria, processor 16 may instruct GPU 14 to execute application 20 . Otherwise, processor 16 may not allow GPU 14 to execute application 20 .
- device 12 may receive a modified version of application 20 ( 46 ).
- the dashed line from block 44 to block 46 , and from block 46 to block 48 is used to indicate that the functions of block 46 may not be necessary in every example.
- validation server device 24 may be able to optimize or tune application 20 , and may transmit the modified version of application 20 .
- device 12 may transmit the source code or intermediate code of application 20 , and receive a compiled version of application 20 from validation server device 24 .
- device 12 may receive a compiled version of the code as modified by validation server device 24 (e.g., modified for optimization or tuning).
- processor 16 may instruct GPU 14 to execute the modified version of application 20 ( 48 ).
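- The device-side flow of FIG. 2 can be condensed into a single function. In this sketch, `send_for_validation` and `execute_on_gpu` are hypothetical callables standing in for the transmission to validation server device 24 and the instruction to GPU 14 , and the response keys `valid` and `modified_code` are assumed for illustration.

```python
def run_on_device(app_code, send_for_validation, execute_on_gpu):
    """Blocks 40-48 of FIG. 2, reduced to one function.

    `send_for_validation` returns a dict with a boolean "valid" entry and an
    optional "modified_code" entry; both keys are hypothetical.
    """
    response = send_for_validation(app_code)              # blocks 42 and 44
    if not response.get("valid", False):
        return None                                       # GPU is not allowed to run it
    code_to_run = response.get("modified_code") or app_code   # optional block 46
    return execute_on_gpu(code_to_run)                    # block 48
```

- The optional block 46 appears here as a fallback: when the server returns no modified version, the device executes the code it originally received.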
- FIG. 3 is a flowchart illustrating an example operation of validation server device 24 .
- Validation server device 24 may receive application 20 , which is to be executed by GPU 14 , from device 12 ( 50 ).
- validation server device 24 may receive source code, intermediate code, or object code of application 20 from device 12 via network 22 .
- Validation server device 24 may perform at least one of static analysis and dynamic analysis on application 20 ( 52 ). For example, as part of static analysis, emulator unit 26 of validation server device 24 may compile the code of application 20 , and monitor for any errors during the compilation of application 20 . As part of the dynamic analysis, emulator unit 26 of validation server device 24 may execute a virtual model of GPU 14 or the virtual model of GPU 14 and a virtual model of device 12 . As described above, GPU models 30 and device models 34 may include a virtual model of GPU 14 and device 12 , respectively. In some examples, GPU models 30 and device models 34 may include a generic GPU model and a generic device model.
- emulator unit 26 may receive an identification of GPU 14 and/or device 12 from device 12 .
- Emulator unit 26 may identify which one of GPU models 30 corresponds to GPU 14 and which one of device models 34 corresponds to device 12 , and execute the corresponding GPU and device models. If there is no corresponding GPU and/or device models for GPU 14 and device 12 , or if emulator unit 26 did not receive an identification of GPU 14 and/or device 12 , emulator unit 26 may execute the generic GPU and device models.
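- The fallback to the generic models can be sketched as a dictionary lookup. The dictionary layout, the `"generic"` key, and the `select_model` name are assumptions made for this illustration.

```python
GENERIC = "generic"


def select_model(models, identifier=None):
    """Return the model matching `identifier`, or the generic model otherwise.

    `models` is a hypothetical mapping from an identification string to a
    virtual model; it must contain an entry under the GENERIC key.
    """
    if identifier is not None and identifier in models:
        return models[identifier]
    return models[GENERIC]
```

- The same lookup could be applied separately to GPU models 30 and device models 34 , so that a missing device identification does not prevent use of a specific GPU model.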
- emulator unit 26 may execute application 20 and provide application 20 with GPU inputs 32 for analyzing application 20 .
- application 20 may be considered as executing on the corresponding virtual model of GPU 14 , which is executing on emulator unit 26 .
- emulator unit 26 may execute application 20 , as if application 20 is executing on GPU 14 .
- Emulator unit 26 may monitor the functions performed by the corresponding virtual model of GPU 14 such as memory accesses, rate of execution, termination instance, and other functions pertinent to the functionality of GPU 14 .
- Emulator unit 26 may determine whether application 20 satisfies one or more performance criteria ( 54 ).
- the one or more performance criteria may be performance criteria associated with static analysis and performance criteria associated with dynamic analysis.
- the one or more performance criteria may be criteria that there are no errors in the compilation of application 20 , as evaluated by compiling application 20 during the static analysis.
- the one or more performance criteria may be criteria that application 20 not access out-of-bounds memory locations and not use up resources of GPU 14 such that GPU 14 is not able to perform other tasks in parallel, as evaluated by executing application 20 and providing application 20 with GPU inputs 32 during the dynamic analysis.
- Validation server device 24 may transmit a validation of application 20 to device 12 based on the determination ( 56 ). For example, validation server device 24 may transmit a validation of application 20 to device 12 if application 20 satisfies the one or more performance criteria. Otherwise, validation server device 24 may transmit an invalidation if application 20 does not satisfy the one or more performance criteria. For example, if emulator unit 26 determines that application 20 satisfies the one or more performance criteria, validation server device 24 may transmit an indication to device 12 indicating as such. Alternatively, if emulator unit 26 determines that application 20 does not satisfy the one or more performance criteria, validation server device 24 may transmit an indication to device 12 indicating as such.
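- The determination of block 54 and the resulting indication of block 56 can be sketched as a check over a summary of the analyses. The `AnalysisReport` fields, the `max_utilization` threshold of 0.9, and the response format are hypothetical; the disclosure names the criteria but not specific thresholds or data structures.

```python
from dataclasses import dataclass


@dataclass
class AnalysisReport:
    """Hypothetical summary of the analyses performed in block 52."""
    compile_errors: int = 0           # from static analysis
    out_of_bounds_accesses: int = 0   # from dynamic analysis
    gpu_utilization: float = 0.0      # from dynamic analysis, fraction of resources used


def satisfies_criteria(report, max_utilization=0.9):
    """Return True only if every performance criterion is met (block 54)."""
    return (report.compile_errors == 0
            and report.out_of_bounds_accesses == 0
            and report.gpu_utilization <= max_utilization)


def build_response(report):
    """Build the indication transmitted to the device (block 56)."""
    return {"valid": satisfies_criteria(report)}
```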
- FIG. 4 is a flowchart illustrating another example operation of validation server device 24 .
- validation server device 24 may receive application 20 , which is to be executed by GPU 14 , from device 12 ( 58 ).
- emulator unit 26 may modify application 20 (e.g., the source code or intermediate code of application 20 ) to optimize or tune application 20 .
- emulator unit 26 may modify the code of application 20 so that application 20 executes more efficiently on GPU 14 .
- Validation server device 24 may then transmit modified application 20 to device 12 ( 62 ).
- validation server device 24 may transmit the source code or intermediate code of the modified application 20 .
- validation server device 24 may compile the modified code of application 20 , and transmit the resulting object code to device 12 .
- FIG. 5 is a block diagram illustrating the example device of FIG. 1 in further detail.
- FIG. 5 illustrates device 12 of FIG. 1 in further detail.
- examples of device 12 include, but are not limited to, mobile wireless telephones, PDAs, video gaming consoles that include video displays, mobile video conferencing units, laptop computers, desktop computers, television set-top boxes, and the like.
- device 12 may include GPU 14 , processor 16 , device memory 18 , transceiver module 64 , user interface 66 , display 68 , and display processor 70 .
- GPU 14 , processor 16 , and device memory 18 may be substantially similar or identical to those illustrated in FIG. 1 .
- For purposes of brevity, only the components that are shown in FIG. 5 , but not shown in FIG. 1 are described in detail.
- Device 12 may include additional modules or units not shown in FIG. 5 for purposes of clarity.
- device 12 may include a speaker and a microphone, neither of which is shown in FIG. 5 , to effectuate telephonic communications in examples where device 12 is a mobile wireless telephone, or a speaker where device 12 is a media player.
- the various modules and units shown in device 12 may not be necessary in every example of device 12 .
- user interface 66 and display 68 may be external to device 12 in examples where device 12 is a desktop computer or other device that is equipped to interface with an external user interface or display.
- Transceiver module 64 may include circuitry to allow wireless or wired communication between device 12 and another device or a network. Transceiver module 64 may include one or more modulators, demodulators, amplifiers, antennas and other such circuitry for wired or wireless communication.
- Display 68 may comprise a liquid crystal display (LCD), an organic light emitting diode display (OLED), a cathode ray tube (CRT) display, a plasma display, a polarized display, or another type of display device.
- GPU 14 may output the resulting graphics data to device memory 18 for temporary storage.
- Display processor 70 may retrieve the graphics data from device memory 18 , perform any post-processing on the graphics data, and output the resulting graphics data to display 68 .
- display processor 70 may perform any further enhancements or scale the graphics data generated by GPU 14 .
- Computer-readable media may include computer data storage media.
- Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
- such computer-readable media can comprise random access memory (RAM), read-only memory (ROM), EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- Disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- the code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
- the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (i.e., a chip set).
- Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Virology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Stored Programmes (AREA)
- Debugging And Monitoring (AREA)
Abstract
The techniques described in this disclosure are directed to validating an application that is to be executed on a graphics processing unit (GPU). For example, a validation server device may receive code of the application. The validation server device may provide some level of assurance that the application satisfies one or more performance criteria. In this manner, the probability of a problematic application executing on the device that includes the GPU may be reduced.
Description
- This application is a continuation of U.S. application Ser. No. 13/406,272 filed Feb. 27, 2012, the entire content of which is incorporated herein by reference.
- This disclosure is directed to applications that execute on a graphics processing unit (GPU), and more particularly, to validation of such applications.
- Graphics processing units (GPUs) traditionally have been limited to performing only graphics related processing in fixed-function pipelines that provide very limited functional flexibility. Newer GPUs include programmable cores that execute programs, and thereby provide greater functional flexibility as compared to the traditional GPUs. The programmable cores may execute both graphics related applications and non-graphics related applications.
- In general, this disclosure is related to techniques for identifying potentially problematic applications that are to be executed on a graphics processing unit (GPU), prior to execution. Examples of problematic applications include, but are not limited to, malicious applications, as well as inefficient or error-prone applications. For example, a server device external to the device that houses the GPU may validate the application. Validation of the application may mean that the application satisfies one or more criteria. As one example, validation may mean determining with some level of assurance that the application is not a malicious application, an error-prone application, or an inefficient application. The server device may transmit an indication, to the device, that indicates whether it is either safe or unadvisable for the GPU to execute the program. The device may then elect to execute the program on the GPU based on the received indication.
- In one example, the disclosure describes a method that includes receiving, with a server device, an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the server device. The method also includes performing, with the server device, at least one of an analysis of the application prior to and during compilation of the application on the server device, and an analysis of the application during execution of the application on the server device. The method further includes determining whether the application satisfies one or more performance criteria based on at least one of the analyses, and transmitting to the device a validation of the application if the application satisfies the one or more performance criteria.
- In another example, the disclosure describes an apparatus that includes an emulator unit operable to receive an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the apparatus. The emulator unit is also operable to perform at least one of an analysis of the application prior to and during compilation of the application on the apparatus, and an analysis of the application during execution of the application on the apparatus. The emulator unit is also operable to determine whether the application satisfies one or more performance criteria based on at least one of the analyses, and transmit to the device a validation of the application if the application satisfies the one or more performance criteria.
- In another example, the disclosure describes a server device that includes means for receiving an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the server device. The server device also includes means for performing at least one of an analysis of the application prior to and during compilation of the application on the server device, and an analysis of the application during execution of the application on the server device. The server device further includes means for determining whether the application satisfies one or more performance criteria based on at least one of the analyses, and means for transmitting to the device a validation of the application if the application satisfies the one or more performance criteria.
- In another example, the disclosure describes a non-transitory computer-readable storage medium comprising instructions that cause one or more processors to receive, with a server device, an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the server device. The instructions further cause one or more processors to perform, with the server device, at least one of an analysis of the application prior to and during compilation of the application on the server device, and an analysis of the application during execution of the application on the server device. The instructions also cause the one or more processors to determine whether the application satisfies one or more performance criteria based on at least one of the analyses, and transmit to the device a validation of the application if the application satisfies the one or more performance criteria.
- In another example, the disclosure describes a method that includes receiving an application that is to be executed by a graphics processing unit (GPU) of a device, and transmitting the application to a server device external to the device for validation of the application. The method further includes receiving a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- In another example, the disclosure describes an apparatus that includes a graphics processing unit (GPU), and a device memory operable to store an application that is to be executed by the GPU. The apparatus also includes a processor operable to transmit the application to a server device external to the apparatus, and receive a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- In another example, the disclosure describes a device that includes a graphics processing unit (GPU). The device also includes means for receiving an application that is to be executed by the GPU, and means for transmitting the application to a server device external to the device for validation of the application. The device further includes means for receiving a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- In another example, the disclosure describes a non-transitory computer-readable storage medium comprising instructions that cause one or more processors to receive an application that is to be executed by a graphics processing unit (GPU) of a device, and transmit the application to a server device external to the device for validation of the application. The instructions further cause the one or more processors to receive a validation from the server device that indicates that the application satisfies one or more criteria for execution on the GPU.
- The details of one or more aspects of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.
- FIG. 1 is a block diagram illustrating an example of a system that may be operable to implement one or more aspects of this disclosure.
- FIG. 2 is a flowchart illustrating an example operation of a device that may be operable to implement one or more aspects of this disclosure.
- FIG. 3 is a flowchart illustrating an example operation of a server that may be operable to implement one or more aspects of this disclosure.
- FIG. 4 is a flowchart illustrating another example operation of a server that may be operable to implement one or more aspects of this disclosure.
- FIG. 5 is a block diagram illustrating an example device, illustrated in FIG. 1, in further detail.
- In general, this disclosure is related to techniques to ensure proper functionality of applications that are to be executed on a graphics processing unit (GPU). Some previous GPUs included only fixed-function hardware pipelines which did not provide programming capabilities. However, to increase functional flexibility, newer GPUs allow for programmable shader cores. For example, these GPUs execute applications such as vertex shaders and fragment shaders that perform functions that were previously delegated to components of the fixed-function hardware pipelines.
- While programmable shader cores allow for functional flexibility, they also invite misuse or suboptimal use of the GPU. For example, a malicious developer may develop an application that generates a denial of service attack or a virus. In some instances, a developer, who may not have malicious intent, may nevertheless inadvertently develop an inefficient or error-prone application. A problematic application (e.g., a malicious, inefficient or error-prone application) can substantially undermine the operation of the GPU or a device in which the GPU is provided.
- The techniques of this disclosure may assist in identifying possibly malicious, inefficient and/or error-prone GPU-executed applications, prior to execution by the GPU. For example, the techniques of this disclosure may be directed to a cloud-based solution in which a server device, external to the device that houses the GPU, and coupled to the device housing the GPU via one or more network connections, functions as an emulator for execution of an application. The server may emulate the results of the application, as if the application is executing on the GPU. Based on the results, the server may validate the application (e.g., determine whether or not the program is malicious, inefficient, or error-prone), and indicate as such to the device that houses the GPU. The GPU may then execute the application based on the received indication.
- There may be various ways in which the server may execute a validation process to validate the application. The validation process may be a software process. The software process may be executed in conjunction with a general purpose processor and/or special purpose hardware. For example, the server may execute virtual model software. The virtual model causes the server to emulate the GPU or the actual device that includes the GPU upon which the application will execute. In alternate examples, instead of or in addition to virtual models, the server may include a hardware emulation board to validate the application. The server may also include an application that is specifically designed to test for security violations of the application that is to be executed by the GPU.
- To validate the application that is to be executed by the GPU, the server may perform static analysis, dynamic analysis, or a combination thereof. Static analysis refers to analysis of the application that can be performed without execution of the application. For instance, static analysis can be performed during compilation. During the compilation, the server may identify errors in the application such as infinite loops in the program or out-of-bounds access to array locations within the application as two non-limiting examples.
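- As a purely illustrative sketch of static analysis, the following check inspects source code without executing it and reports loops that can never exit. It uses Python's standard `ast` module on Python source as a stand-in for GPU application code, which is an assumption of this example; the heuristic (a `while True` loop containing no `break`, `return`, or `raise`) and the name `find_unbounded_loops` are likewise hypothetical, and the check requires Python 3.8 or later.

```python
import ast


def find_unbounded_loops(source):
    """Return line numbers of `while True` loops that contain no exit statement.

    Heuristic only: an exit statement anywhere inside the loop body, even in
    a nested loop, is treated as a possible way out.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While):
            continue
        always_true = (isinstance(node.test, ast.Constant)
                       and node.test.value is True)
        has_exit = any(isinstance(inner, (ast.Break, ast.Return, ast.Raise))
                       for inner in ast.walk(node))
        if always_true and not has_exit:
            findings.append(node.lineno)
    return findings
```

- A compiler for GPU code could apply a comparable check to its own intermediate representation during compilation.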
- Dynamic analysis refers to analysis of the application during execution, which may additionally result in identifying problematic applications (e.g., malicious, inefficient, and error-prone applications). For example, the server may execute compiled code, and the server may provide the executed code with hypothetical input values. The hypothetical input values may be, for example, different input images, input images with different sizes, and the like.
- The server, executing a validation process, may monitor the results and the functions performed by the executed code. For example, the server may monitor memory accesses by the virtual model of the GPU, and determine whether the memory accesses are out-of-bounds memory accesses. The server may also monitor the memory addresses where the virtual model of the GPU is writing information. Based on the memory accesses of the virtual model of the GPU and memory addresses where the virtual model of the GPU is writing information, the server may be able to determine whether the application is error-prone. Such memory tracking may be particularly useful when the application reads or writes to variables using pointers.
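- The memory tracking described above can be sketched with a buffer that checks every access made by the emulated code. The `MonitoredBuffer` class and its policy of recording, rather than performing, out-of-bounds accesses are assumptions made for illustration.

```python
class MonitoredBuffer:
    """Toy buffer that checks every access and records out-of-bounds ones."""

    def __init__(self, size):
        self._data = [0] * size
        self.violations = []   # (kind, index) pairs for out-of-bounds accesses

    def _check(self, kind, index):
        in_bounds = 0 <= index < len(self._data)
        if not in_bounds:
            self.violations.append((kind, index))
        return in_bounds

    def read(self, index):
        # Out-of-bounds reads are recorded and return 0 instead of real data.
        return self._data[index] if self._check("read", index) else 0

    def write(self, index, value):
        # Out-of-bounds writes are recorded and dropped.
        if self._check("write", index):
            self._data[index] = value
```

- For example, a loop that writes five elements into a four-element buffer would leave a single recorded violation for the write at index 4, which the server could treat as evidence that the application is error-prone.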
- The server may also detect applications that generate or enable denial of service attacks. For example, the server may monitor the rate at which the virtual model of the GPU is able to execute the application. If the server detects slow responsiveness, unintended termination, or hanging, the server may determine that the application is an application designed for a denial of service attack, or a very poorly designed application. In either case, execution of such an application may negatively impact the experience of a user.
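- Detection of hanging or unintended termination can be sketched with a step budget. In this illustration the emulated application is modeled as a Python generator that yields once per executed step; the `run_with_budget` name, the generator convention, and the three result strings are assumptions of the example.

```python
def run_with_budget(kernel, max_steps):
    """Drive a generator-based kernel, stopping once `max_steps` is exceeded.

    Returns "completed" if the kernel finishes within the budget, "hang" if
    it exceeds the budget, and "terminated" if it raises an exception.
    """
    steps = 0
    try:
        for _ in kernel():
            steps += 1
            if steps > max_steps:
                return "hang"
    except Exception:
        return "terminated"
    return "completed"
```

- A result of "hang" or "terminated" would not by itself distinguish a denial of service attack from a poorly designed application; as noted above, either case may justify withholding validation.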
- In addition to validating the application, in some examples, the server may be able to tune and optimize the application as well. For example, the server may insert or replace the source code, or portions of the source code, or collect statistics to determine how well the compiled code works. In some examples, the server may validate the application and optimize or tune the application once. After such validation, the device may execute the application as often as the user would like without requiring further validations or optimization. Also, in some examples, after validating a certain application, the server may store an indication that indicates that this application has already been validated. If the server receives the same source code or pre-compiled object code again, the server may first ensure that the code is identical, and if so, immediately validate that application.
-
FIG. 1 is a block diagram illustrating an example of a system that may be operable to implement one or more aspects of this disclosure. For example,FIG. 1 illustratessystem 10 that includesdevice 12,network 22,validation server device 24, andapplication server device 38. Although only onedevice 12,validation server device 24, andapplication server device 38 is illustrated inFIG. 1 , in other examples,system 10 may include a plurality ofdevices 12,validation servers 24, andapplication servers 38.System 10 may be referred to as a cloud-based system to indicate that validation ofapplication 20 occurs invalidation server device 24, which is external todevice 12, as described in more detail. For example, the techniques of this disclosure may be directed to validatingapplication 20 in the cloud (e.g., invalidation server device 24, which is external to device 12). - Examples of
device 12 include, but are not limited to, video devices such as media players, set-top boxes, wireless handsets such as mobile telephones, personal digital assistants (PDAs), desktop computers, laptop computers, gaming consoles, video conferencing units, tablet computing devices, and the like. Examples ofvalidation server device 24 andapplication server device 38 include, but are not limited to, laptops, desktops, web servers, and the like. In general,validation server device 24 andapplication server device 38 may be any type of device capable of performing the functions attributed tovalidation server device 24 andapplication server device 38 in this disclosure. -
Network 22 may allowdevice 12 to securely communicate withvalidation server device 24 andapplication server device 38. For security purposes, any communication betweendevice 12 andvalidation server device 24 andapplication server device 38 may be encrypted or otherwise secured. Also, for further protection, any communication betweendevice 12 andvalidation server device 24 andapplication server device 38 may require user authorization. - In some examples,
network 22 may ensure that information transmitted by any one ofdevice 12,validation server device 24, andapplication server device 38 is received only by the intended device or devices, and no other device.Network 22 may be a local area network (LAN), a wide area network (WAN), the Internet, and the like.Device 12,validation server device 24, andapplication server device 38 may be coupled tonetwork 22 wirelessly or through a wired link. In some examples, it may be possible fordevice 12 to be coupled directly tovalidation server device 24 and/orapplication server device 38. For example,device 12 may directly communicate withvalidation server device 24 and/orapplication server device 38 through a wireless or wired connection. In these examples,network 22 may not be needed insystem 10. - As illustrated in
FIG. 1, device 12 may include GPU 14, processor 16, and device memory 18. Device 12 may include components in addition to those illustrated in FIG. 1. For example, FIG. 5 illustrates an example of device 12 that includes more components than those illustrated in FIG. 1.

Examples of
GPU 14 and processor 16 include, but are not limited to, a digital signal processor (DSP), a general purpose microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other equivalent integrated or discrete logic circuitry. Furthermore, although GPU 14 and processor 16 are illustrated as separate components, aspects of this disclosure are not so limited. In alternate examples, GPU 14 and processor 16 may be part of a common integrated circuit. For purposes of illustration and ease of description, GPU 14 and processor 16 are illustrated as separate components.

Examples of
device memory 18 include, but are not limited to, a random access memory (RAM), a read only memory (ROM), or an electrically erasable programmable read-only memory (EEPROM). Examples of device memory 18 may also include storage devices such as CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or flash memory. In general, device memory 18 may include media that can be used to store desired program code in the form of instructions or data structures and that can be accessed by GPU 14 and processor 16. In some examples, device memory 18 may comprise one or more computer-readable storage media, such as a computer-readable storage device. For instance, in some example implementations, device memory 18 may include instructions that cause GPU 14 and processor 16 to perform the functions ascribed to GPU 14 and processor 16 in this disclosure.

Device memory 18 may, in some examples, be considered as a non-transitory storage medium. The term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that device memory 18 is non-movable. As one example, device memory 18 may be removed from device 12, and moved to another device. As another example, a storage device, substantially similar to device memory 18, may be inserted into device 12. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in RAM).

GPU 14 may be operable to execute one or more software applications. For example, GPU 14 may include a processor core on which one or more software applications may execute. The applications that execute on GPU 14 may be graphics applications such as vertex shaders and fragment shaders for generating graphics data. However, it may be possible for the applications that execute on GPU 14 to be unrelated to graphics processing. For example, a developer may consider it beneficial to develop a software application that is unrelated to graphics processing but that exploits the massive parallelism of GPU 14. In these cases, GPU 14 may be referred to as a general purpose GPU (GP-GPU).

As one example,
FIG. 1 illustrates GPU 14 executing application 20. Application 20 may be a graphics application or a non-graphics application that executes on GPU 14. Application 20 is illustrated in a dashed box within GPU 14 to indicate that application 20 is executing on GPU 14. GPU 14 does not actually include application 20. For instance, application 20 may be stored in device memory 18, as illustrated in FIG. 1.

Application 20 may be developed using a wide variety of different application programming interfaces (APIs). For example, a developer may have developed application 20 using any programming API such as OpenGL, OpenCL, WebGL, and WebCL. In general, applications that are developed using the OpenGL or WebGL APIs are designed for graphics processing. Applications that are developed using the OpenCL or WebCL APIs are designed for processing unrelated to graphics processing. The OpenGL, OpenCL, WebGL, and WebCL APIs are provided for illustration purposes and should not be considered limiting. The techniques of this disclosure may be extendable to APIs in addition to the examples provided above. In general, the techniques of this disclosure may be extendable to any technique utilized by a developer to develop application 20.

As illustrated,
device memory 18 may store application 20. For example, a user of device 12 may cause device 12 to download application 20 from application server device 38 via network 22. In turn, device 12 may store application 20 in device memory 18. There may be other ways in which device 12 stores application 20 in device memory 18. For instance, a user of device 12 may insert a flash drive into device 12 that stores application 20, and device 12 may retrieve application 20 from the flash drive and store application 20 in device memory 18. In this example, application server device 38 may not be needed. The above examples that describe the manner in which device 12 stores application 20 in device memory 18 are provided for purposes of illustration and should not be considered limiting. The techniques of this disclosure may be applicable to any technique in which application 20 is loaded into device memory 18.

Device memory 18 may store the source code of application 20, an intermediate representation of application 20, or object code of application 20. The source code of application 20 may be the text in the programming language in which application 20 was developed. The object code of application 20 may be the binary bits resulting from the compilation of application 20. For example, application server device 38 may compile the source code of application 20, and device 12 may download this pre-compiled object code of application 20. The intermediate representation of application 20 may be intermediate to the source code and the object code. For example, in the intermediate representation of application 20, the variables of the source code of application 20 may be replaced with register or memory identifiers for where the variables will be stored in device memory 18.

The capability of the programmable core or cores of
GPU 14 to execute applications, such as application 20, increases the functionality of GPU 14. However, the capability of GPU 14 to execute applications may invite misuse or suboptimal use of GPU 14 and make device 12 more susceptible to malicious applications or error-prone applications. For example, applications that execute solely on a central processing unit (CPU), such as processor 16, execute in a virtual machine setting, which allocates the amount of memory of device memory 18 and the storage locations within device memory 18 that are accessible to the applications. Because the applications are confined to the virtual machine of processor 16, the applications are unable to access out-of-bounds memory addresses and are limited to accessing memory addresses specifically provided to them by the virtual machine of processor 16. In this way, it may be difficult for applications executing on processor 16 to drastically impact processor 16, and device 12, in turn, in a negative manner.

In some instances, it may not be practical to implement virtual machines on
GPU 14. For example, the massive parallel processing capabilities of GPU 14 may not be well suited for executing virtual machines. For instance, if virtual machines were to execute on GPU 14, the virtual machines would dominate the resources of GPU 14, possibly restricting other applications from being executed on GPU 14. Accordingly, in some instances, virtual machines may not be able to limit the negative impacts of malicious or error-prone applications that execute on GPU 14.

Applications that execute on
GPU 14, such as application 20, may be considered as applications that execute “natively” (i.e., are not confined to a virtual machine). Native execution of application 20 may allow application 20 to access larger portions of device memory 18. Such access may allow problematic applications, such as malicious applications or poorly designed (e.g., error-prone) applications, to negatively impact the performance capabilities of GPU 14 and device 12.

As one example, the developer of
application 20 may develop application 20 such that application 20, when executed, provokes a denial of service attack on device 12, or propagates a virus that impacts the performance of device 12. For example, when GPU 14 executes application 20, application 20 may control GPU 14 such that GPU 14 may not be able to perform any other tasks such as rendering graphics content for a user interface. This may cause device 12 to “hang,” which may drastically impact the functionality of device 12. In some cases, the developer of application 20 may develop application 20 to access portions of device memory 18 that it should be limited from accessing. Application 20 may store instructions for a virus in these portions of device memory 18. Then, when processor 16 or GPU 14 accesses these portions of device memory 18, processor 16 or GPU 14 may accidentally execute the stored virus. There may be additional examples of malicious applications, and aspects of this disclosure should not be considered limited to denial of service attacks or viruses.

As another example, the developer of
application 20 may inadvertently develop application 20 such that application 20 is inefficient or error-prone. For instance, an error-prone application may include infinite loops, out-of-bounds access to an array, or out-of-bounds access to memory locations of device memory 18. An inefficient application may not properly utilize the functionality of GPU 14. For example, an inefficient application may not properly use the programmable functionality of GPU 14.

In some cases,
application server device 38 may potentially provide a modicum of protection from malicious and error-prone applications. For example, the owner of application server device 38 may guarantee that none of the applications stored on application server device 38 are malicious or error-prone applications. However, this may not be the case in every instance (e.g., the owner of application server device 38 may not provide a guarantee of safe and proper operation), or the purported “guarantee” from the owner of application server device 38 may not be trustworthy.

The techniques of this disclosure may assist in identifying whether applications that are to be executed on GPU 14 (e.g., application 20) are problematic applications such as malicious applications, as well as inefficient and error-prone applications, prior to execution. For example, the techniques of this disclosure may validate
application 20 prior to GPU 14 executing application 20. Validation of application 20 may mean that application 20 satisfies one or more performance criteria. For example, validation may mean determining with some level of assurance that application 20 is not a malicious application, an inefficient application, or an error-prone application. The example techniques described in this disclosure may transmit an indication to device 12 that indicates whether it is safe or inadvisable for GPU 14 to execute application 20. Processor 16 may then elect whether to instruct GPU 14 to execute application 20 based on the received indication.

For example,
processor 16 may instruct GPU 14 to execute application 20 if the indication is favorable, i.e., indicates that the program is not malicious, not inefficient, and/or not error-prone. In some examples, processor 16 may instruct GPU 14 to execute application 20 even if the indication is unfavorable. For example, if application 20 is not malicious or error-prone, but is inefficient, processor 16 may instruct GPU 14 to execute application 20, because such execution may not harm GPU 14 or device 12 even though application 20 may not execute as efficiently as possible.

In some examples, the techniques of this disclosure may also tune, or otherwise optimize, an inefficient application that is to be executed on
GPU 14. For example, the developer of application 20 may not have any malicious intent, and may have developed application 20 such that application 20 is not prone to errors. Nevertheless, it may be possible that application 20 may not efficiently utilize the resources of GPU 14.

As one example, one of the functions of
application 20 may be to divide a task into workgroups and perform parallel processing on the workgroups to exploit the parallelism of GPU 14. For example, application 20 may divide an image into blocks and perform parallel processing on the blocks. The size of each of the blocks may be based on the amount of local memory available on GPU 14.

Because the developer of
application 20 may want to design application 20 to execute on a variety of different GPUs, the developer may not know ahead of time how much local memory is available on a particular GPU, such as GPU 14, as different GPUs may include different amounts of local memory. To address this, the developer may develop application 20 to utilize variable sized blocks. In some instances, utilizing variable sized blocks may be less efficient than utilizing fixed sized blocks. The techniques of this disclosure may tune or optimize application 20 such that application 20 utilizes fixed sized blocks based on the amount of available memory in GPU 14.

As another example,
application 20 may perform matrix operations. The developer of application 20 may have developed application 20 to perform row-based matrix operations or column-based matrix operations. In some instances, GPU 14 may be better suited to perform row-based matrix operations, as compared to column-based matrix operations, or vice-versa. In this example, the techniques of this disclosure may modify application 20 to perform row-based matrix operations, if application 20 uses column-based matrix operations, to more efficiently utilize GPU 14.

As yet another example, the developer may have developed
application 20 for older versions of GPUs, and application 20 may not be optimized for GPU 14. The techniques of this disclosure may modify application 20 so that application 20 is better optimized for newer GPUs, such as GPU 14. GPU 14 may then execute application 20, which is optimized to execute on newer GPUs.

In accordance with techniques of this disclosure,
validation server device 24 may validate application 20, and in some examples, optimize or tune application 20. To validate application 20, validation server device 24 may implement a validation process that determines whether application 20 satisfies one or more performance criteria. For example, validation server device 24 may determine, with some reasonable level of assurance, whether application 20 is a malicious application, an error-prone application, or an inefficient application. In examples where application 20 is an error-prone application or an inefficient application, validation server device 24 may attempt to correct the errors in application 20, or optimize application 20 to be more efficient.

It may be generally difficult to absolutely guarantee that
application 20 is not a problematic application because it may be difficult to test all of the various ways in which application 20 may affect GPU 14 and device 12. Although an absolute guarantee that application 20 is not a problematic application may be difficult, validation server device 24 may employ different types of analysis to ensure with some reasonable amount of certainty that application 20 is not a problematic application.

As illustrated in
FIG. 1, validation server device 24 is external to device 12. Accordingly, the validation of application 20 and optimization of application 20 may be offloaded from device 12, which may be referred to as validating application 20 in the “cloud” because validation server device 24 is a server that is external to device 12. By offloading the validation of application 20 to validation server device 24, the probability of application 20 negatively impacting GPU 14 and device 12 may be reduced, in cases where application 20 is a malicious application or an error-prone application. Also, by offloading the optimization of application 20 to validation server device 24, power savings and processing efficiency may be realized because processor 16 does not need to consume power and clock cycles validating or optimizing application 20.

There may be various examples of performance criteria that
application 20 may need to satisfy for validation server device 24 to validate application 20. In general, the performance criteria can be part of static analysis, dynamic analysis, or a combination thereof. Static analysis refers to analysis of application 20 that can be performed without execution of application 20 to ensure that application 20 satisfies one or more performance criteria associated with static analysis. Dynamic analysis refers to analysis of application 20 during execution to ensure that application 20 satisfies one or more performance criteria associated with dynamic analysis.

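The execution decision described earlier, in which processor 16 may run application 20 on a favorable indication and may still run it when the only finding is inefficiency, can be sketched roughly as follows. This is a minimal illustration in Python, not the disclosed implementation: the names `ValidationIndication`, `should_execute`, and `tolerate_inefficiency` are hypothetical, and the three boolean flags are an assumed encoding of the outcome of static and/or dynamic analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationIndication:
    """Hypothetical summary of what validation found for one application."""
    malicious: bool = False
    error_prone: bool = False
    inefficient: bool = False

    @property
    def favorable(self) -> bool:
        # Favorable here means that no performance criterion was violated.
        return not (self.malicious or self.error_prone or self.inefficient)


def should_execute(indication: ValidationIndication,
                   tolerate_inefficiency: bool = True) -> bool:
    """Decide whether the processor instructs the GPU to run the application."""
    if indication.malicious or indication.error_prone:
        # Assumed policy: never run code flagged as malicious or error-prone.
        return False
    if indication.inefficient:
        # Inefficient code may not harm the device, so running it is a policy choice.
        return tolerate_inefficiency
    return True
```

Under these assumptions, an application flagged only as inefficient is still executed by default, while any malicious or error-prone finding blocks execution.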
Validation server device 24 may be operable to perform static analysis, dynamic analysis, or both static analysis and dynamic analysis. For purposes of illustration, validation server device 24 is described as being operable to perform both static analysis and dynamic analysis, and therefore, operable to ensure that application 20 satisfies the performance criteria associated with both static analysis and dynamic analysis. In alternate examples, validation server device 24 may be operable to perform one of static analysis or dynamic analysis, and in these alternate examples, validation server device 24 may be operable to ensure that application 20 satisfies the performance criteria associated with the type of analysis that validation server device 24 is operable to perform (e.g., performance criteria associated with static analysis or dynamic analysis).

As illustrated in
FIG. 1, validation server device 24 includes emulator unit 26 and server memory 28. Server memory 28 may include data and/or instructions defining one or more GPU models 30, one or more GPU inputs 32, and one or more device models 34. Emulator unit 26 may be a processing unit that is operable to execute one or more of GPU models 30 and device models 34. As another example, emulator unit 26 may be a hardware emulation board, which may be a GPU. In some examples, emulator unit 26 may include two portions, which may be part of the same circuitry or separate, distinct circuits, where the first portion is a processing unit that is operable to execute one or more of GPU models 30 and device models 34, and the second portion is the hardware emulation board (e.g., a GPU). Examples of emulator unit 26 include, but are not limited to, a DSP, a general purpose microprocessor, an ASIC, an FPGA, or other equivalent integrated or discrete logic circuitry.

Server memory 28 may be similar to device memory 18. For instance, server memory 28 may be any medium that can be used to store desired program code in the form of instructions, data, and/or data structures, that can be accessed by emulator unit 26, and that causes emulator unit 26 to perform one or more of the functions ascribed to emulator unit 26. Similar to device memory 18, server memory 28 may, in some examples, be considered as a non-transitory storage medium, as described above with respect to device memory 18.

As illustrated,
server memory 28 may store data and/or instructions defining one or more GPU models 30, GPU inputs 32, and device models 34. It may not be necessary for server memory 28 to store one or more GPU models 30, GPU inputs 32, and device models 34 in every example. For example, server memory 28 may store GPU models 30 and GPU inputs 32, but may not store device models 34. If validation server device 24 is operable to perform only static analysis, GPU models 30, GPU inputs 32, and device models 34 may not be needed. In some examples, emulator unit 26 uses GPU models 30, GPU inputs 32, and device models 34 to perform dynamic analysis.

Each of the one or
more GPU models 30 may correspond to a particular GPU type, and each of the one or more device models 34 may correspond to a particular device type. For instance, each one of the GPU models 30 may model the configuration of its corresponding GPU type in terms of parallel processing capabilities, local memory availability, and any other pertinent characteristic that defines the functionality of GPUs of that GPU type. Each one of the device models 34 may model the configuration of its corresponding device type in terms of memory configuration, processor speed, system bus speed, device memory, and any other pertinent characteristic that defines the functionality of devices of that device type. For example, different vendors provide different types of devices with different functional characteristics, and device models 34 may be models for each of these different device types.

The one or
more GPU models 30 and device models 34 may each be considered as virtual model software that emulator unit 26 can execute. For example, when emulator unit 26 executes one of the GPU models 30, emulator unit 26 emulates the GPU to which the executed GPU model 30 corresponds. When emulator unit 26 executes one of the GPU models 30 and one of the device models 34, emulator unit 26 emulates the device to which the executed device model 34 corresponds, as if such a device included the GPU to which the executed GPU model 30 corresponds. In some examples, the GPU vendors and the device vendors may supply GPU models 30 and device models 34, respectively. There may be other ways in which server memory 28 stores GPU models 30 and device models 34, and aspects of this disclosure are not limited to the specific examples where vendors provide GPU models 30 and device models 34.

For example, when emulator
unit 26 executes one of GPU models 30, emulator unit 26 may function as if the parallel processing capabilities and local memory availability of emulator unit 26 (as two examples) are functionally equivalent to those of the GPU type associated with the executed one of GPU models 30. Similarly, when emulator unit 26 executes one of device models 34, emulator unit 26 may function as if the memory configuration, processor speed, system bus speed, and device memory of emulator unit 26 (as four examples) are functionally equivalent to those of the device type associated with the executed one of device models 34. In other words, the execution of one of GPU models 30 causes emulator unit 26 to function as the GPU associated with the executed one of GPU models 30. The execution of one of GPU models 30 and one of device models 34 causes emulator unit 26 to function as a device associated with the executed one of device models 34 that includes the GPU associated with the executed one of GPU models 30.

One of the plurality of
GPU models 30 may be a generic GPU model 30, and one of the plurality of device models 34 may be a generic device model 34. In some examples, server memory 28 may store a generic GPU model and a generic device model instead of a plurality of GPU models and device models. The generic GPU model and device model may not correspond to a particular GPU or device type, but may be suitable for static and dynamic analysis. In some examples, if server memory 28 does not store a GPU model that corresponds to GPU 14, then the generic GPU model may be suitable for validation purposes. The generic GPU model and the generic device model may conform to a base profile of operation common to most GPUs or devices.

There may be various types of GPUs and devices that may be modeled by the generic GPU and generic device models. As one example, the generic GPU model may model a GPU with average parallel processing capabilities and local memory availability as compared to other GPUs. The generic device model may model a device with average memory configuration, processor speed, system bus speed, and device memory as compared to other devices.
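The fallback from a type-specific GPU model to the generic GPU model can be pictured as a simple lookup. The Python sketch below is illustrative only: `GpuModel`, `select_gpu_model`, the field names, and the numeric values of the generic model are all assumptions made for the example and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GpuModel:
    """Hypothetical stand-in for one of the stored GPU models."""
    name: str
    compute_units: int        # assumed proxy for parallel processing capability
    local_memory_bytes: int   # assumed proxy for local memory availability


# Illustrative "average" values; a real generic model would follow a base profile.
GENERIC_GPU_MODEL = GpuModel("generic", compute_units=8, local_memory_bytes=32 * 1024)


def select_gpu_model(gpu_type: Optional[str],
                     models: Dict[str, GpuModel],
                     generic: GpuModel = GENERIC_GPU_MODEL) -> GpuModel:
    """Return the model stored for gpu_type, or the generic model if none matches."""
    if gpu_type is None:
        # No GPU type is known, so only the generic model can be used.
        return generic
    return models.get(gpu_type, generic)
```

A device model could be selected in the same way, with its own generic entry for device types that are not stored.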

As an illustrative example for validating and/or optimizing
application 20 for execution on GPU 14, device 12 may download application 20 from application server device 38. Application 20 may be source code, an intermediate representation, or pre-compiled object code, as described above. Processor 16 may then install application 20 on device 12. If application 20 is in source code or in the intermediate representation, i.e., not pre-compiled object code, part of the installation may be processor 16 executing a compiler to compile the code of application 20.

In some examples, where the downloaded code of
application 20 is source code or the intermediate representation, prior to compiling, processor 16 may cause device 12 to transmit the downloaded code of application 20 to validation server device 24 for validation. In some examples, where the downloaded code of application 20 is pre-compiled object code, processor 16 may cause device 12 to transmit the pre-compiled object code to validation server device 24 for validation before allowing GPU 14 to execute application 20.

For security purposes,
processor 16 may encrypt or otherwise make secure the downloaded code of application 20 that device 12 transmits to validation server device 24. In some examples, processor 16 may require authorization from a user prior to transmitting the downloaded code of application 20 to validation server device 24. Furthermore, in some examples of dynamic analysis, processor 16 may cause device 12 to transmit the GPU type of GPU 14 or both the GPU type of GPU 14 and the device type of device 12 to validation server device 24. In some of these instances, processor 16 may require authorization from the user prior to transmitting the GPU type of GPU 14 or the GPU type of GPU 14 and device type of device 12 to validation server device 24.

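As a rough illustration of what device 12 might assemble before transmission, the Python sketch below builds a validation request from the downloaded code, its form, and the optional GPU and device types, and refuses to build one without user authorization. All names (`CodeForm`, `ValidationRequest`, `build_request`) are hypothetical, the rule that a device type is sent only together with a GPU type is an assumption drawn from the two cases listed above, and encryption of the payload is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CodeForm(Enum):
    SOURCE = "source"
    INTERMEDIATE = "intermediate"
    OBJECT = "object"  # pre-compiled object code


@dataclass(frozen=True)
class ValidationRequest:
    """Hypothetical payload sent from the device to the validation server."""
    code: bytes
    form: CodeForm
    gpu_type: Optional[str] = None
    device_type: Optional[str] = None


def build_request(code: bytes, form: CodeForm, user_authorized: bool,
                  gpu_type: Optional[str] = None,
                  device_type: Optional[str] = None) -> ValidationRequest:
    """Assemble a request; encryption or other securing is omitted in this sketch."""
    if not user_authorized:
        # Assumed policy: nothing leaves the device without user authorization.
        raise PermissionError("user has not authorized sending the application")
    if device_type is not None and gpu_type is None:
        # Assumption: the device type accompanies the GPU type, never replaces it.
        raise ValueError("device type is only sent together with the GPU type")
    return ValidationRequest(code, form, gpu_type, device_type)
```

Leaving both types unset corresponds to the case where only the downloaded code is transmitted.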
Emulator unit 26 may be operable to perform static analysis on application 20 to determine whether application 20 satisfies the performance criteria associated with static analysis. For example, emulator unit 26 may analyze application 20 without executing application 20. As one example, emulator unit 26 may parse through the downloaded code of application 20 to identify code known to be code for a virus. For instance, server memory 28 may store code of known viruses, and emulator unit 26 may compare the downloaded code of application 20 to the code of the known viruses. Determining that the downloaded code of application 20 does not include code of known viruses may be one example of performance criteria that need to be satisfied to validate application 20.

As part of the static analysis,
emulator unit 26 may compile the downloaded code of application 20, in examples where the downloaded code of application 20 is the source code or intermediate representation of application 20, to identify errors in application 20 during compilation. For example, emulator unit 26 may execute compiler 36, as indicated by dashed lines within emulator unit 26. The compilation of application 20, with compiler 36, may identify any infinite loops in application 20 or out-of-bounds access to memory array locations within application 20. In this example, determining that there are no errors in application 20 that can be found during compilation may be another example of performance criteria that need to be satisfied to validate application 20.

Static analysis may be limited in the types of errors, inefficiencies, and malicious code that can be found. For example, if the downloaded code of
application 20 is pre-compiled object code, it may not be possible for emulator unit 26 to identify errors in application 20 during compilation because the code for application 20 is already pre-compiled object code. As another example, if application 20 relies on pointers for storage, it may not be possible to determine if there are any out-of-bounds memory access errors in application 20 based simply on compiling application 20.

To further determine whether
application 20 is problematic (e.g., inefficient, error-prone, or malicious), emulator unit 26 may perform dynamic analysis. As indicated above, dynamic analysis refers to analysis of application 20 during execution. In some examples, to perform dynamic analysis, emulator unit 26 may cause itself to appear as if it is GPU 14. For example, in some instances, in addition to transmitting the downloaded code of application 20, processor 16 may cause device 12 to transmit, via network 22, the GPU type of GPU 14, or both the GPU type of GPU 14 and the device type of device 12, to emulator unit 26 of validation server device 24. Emulator unit 26, in turn, may identify which one of GPU models 30 corresponds to the GPU type of GPU 14, and may execute that one of GPU models 30 to emulate GPU 14 on validation server device 24. In examples where emulator unit 26 also receives the device type, emulator unit 26 may identify which one of device models 34 corresponds to the device type of device 12, and may execute that one of device models 34 to emulate device 12 on validation server device 24.

In examples where
device 12 does not transmit the GPU type of GPU 14 and/or the device type of device 12, emulator unit 26 may execute the generic GPU model and/or the generic device model. Alternatively, if device 12 does transmit the GPU type of GPU 14 and/or the device type of device 12, but none of GPU models 30 and device models 34 correspond to the GPU and device type, emulator unit 26 may execute the generic GPU model and/or generic device model. In examples where emulator unit 26 is or includes a hardware emulation board, such a hardware emulation board may be designed to function, at least in part, as a generic GPU on a generic device.

Once
emulator unit 26 emulates GPU 14, or GPU 14 as part of device 12, emulator unit 26 may execute application 20. For example, if emulator unit 26 received the source code or intermediate code of application 20, emulator unit 26 may compile the received code via compiler 36, and execute the resulting object code. If emulator unit 26 received pre-compiled object code of application 20, emulator unit 26 may execute the pre-compiled object code of application 20.

The techniques of this disclosure may be considered, in some examples, as being performed at least in part by
emulator unit 26 executing a virtual model based on the type of GPU 14 (e.g., one of GPU models 30). Then, when emulator unit 26 executes application 20, application 20 can be considered as executing in the virtual model (e.g., the one of GPU models 30 that is executing on emulator unit 26). For example, both the GPU model, of GPU models 30, that corresponds to GPU 14 and application 20 are executing on emulator unit 26. In the techniques of this disclosure, because emulator unit 26 functions as if it is GPU 14, due to the execution of the GPU model that corresponds to GPU 14, when emulator unit 26 executes application 20, application 20 may execute on the GPU model that corresponds to GPU 14.

As part of the dynamic analysis,
emulator unit 26 may receive hypothetical input values for application 20 that is executing on emulator unit 26. As illustrated, server memory 28 may store one or more GPU inputs 32. These one or more GPU inputs 32 may be values for different graphical images or objects. In some examples, each of these different images may be of different sizes. In examples where application 20 is not related to graphics processing, GPU inputs 32 may be non-graphics inputs. It may be difficult to ensure that emulator unit 26 tests every permutation and combination of possible input values. Accordingly, server memory 28 may store a sufficient number and/or range of GPU inputs 32, e.g., as samples or test inputs, to provide some reasonable level of assurance that application 20 is not a malicious or highly error-prone application (e.g., a problematic application). The GPU inputs 32 may include different types of images or objects to be processed and rendered by GPU 14.

During execution of
application 20, emulator unit 26 may input the values of GPU inputs 32 and may analyze functionality of the executed GPU model of GPU models 30. In examples where emulator unit 26 is a hardware emulation board, emulator unit 26 may analyze the functionality of the hardware emulation board. For example, emulator unit 26 may monitor memory accesses by the executed GPU model of GPU models 30. In this example, emulator unit 26 may determine whether any of the memory accesses by the executed GPU model of GPU models 30 are out-of-bounds memory accesses of server memory 28. As another example, emulator unit 26 may monitor the memory addresses where the executed GPU model of GPU models 30 is writing information in server memory 28. Based on the memory accesses of the GPU model and the memory addresses where the GPU model is writing information, emulator unit 26 may be able to determine whether application 20 is error-prone. Such memory tracking may be particularly useful when application 20 reads or writes to variables using pointers.

For example, if the executed GPU model writes information to or reads information from out-of-bounds memory locations,
emulator unit 26 may determine that application 20 is error-prone, and possibly malicious. For example, if the executed GPU model writes information to or reads information from a non-existent memory location, emulator unit 26 may determine that application 20 is error-prone. If the executed GPU model writes information to a memory location that is not reserved for the GPU model, emulator unit 26 may determine that application 20 is error-prone or possibly malicious. For example, emulator unit 26 may determine that application 20 is attempting to load a virus into memory locations that application 20 should not be able to access. - The limitations of where
application 20 can write information to or read information from (e.g., access) during execution may be an example of performance criteria associated with dynamic analysis. For example, the performance criteria may be a limitation of the memory locations that application 20 is allowed to access. If the GPU model of GPU models 30 accesses a memory location outside of the limited memory locations, due to the execution of application 20, application 20 may be in violation of the performance criteria. For example, there may be a threshold number of accesses outside the limited memory locations that is allowable, in accordance with the performance criteria. The threshold number may be zero to provide the highest level of assurance that application 20 is not attempting to access memory locations outside of the limited memory locations. - In examples where
emulator unit 26 also executes one of device models 34, emulator unit 26 may similarly analyze functionality of the executed device model of device models 34. For example, emulator unit 26 may monitor the functions performed by the executed one of device models 34 while emulator unit 26 executes one of GPU models 30. For example, the execution of one of device models 34 may result in emulator unit 26 emulating device 12, which includes a system bus. Emulator unit 26 may determine whether the execution of application 20 causes the system bus to overload, resulting in device 12 slowing down. - The monitoring of the system bus to determine whether the system bus is being overloaded may be an example of performance criteria associated with dynamic analysis. For example, if the execution of
application 20 causes the system bus to overload, application 20 may be in violation of the performance criteria. In this example, the performance criteria may allow for some level of system bus overload, as it may not be possible to prevent all overloading of the system bus. For example, the performance criteria may establish a percentage threshold of system bus overload. If the system bus overload is below the allowable percentage, the performance criteria are satisfied. Otherwise, the performance criteria are not satisfied. -
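The percentage-threshold criterion for system bus overload described above can be illustrated with a minimal sketch. The disclosure does not define how overload is measured, so the sampling scheme, the `BusSample` type, and both function names are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class BusSample:
    """One hypothetical observation of the emulated system bus."""
    requested_bytes: int  # traffic the application tried to move in the interval
    capacity_bytes: int   # traffic the emulated bus can carry in the interval


def overload_percentage(samples):
    """Percentage of sampled intervals in which demand exceeded bus capacity."""
    if not samples:
        return 0.0
    overloaded = sum(1 for s in samples if s.requested_bytes > s.capacity_bytes)
    return 100.0 * overloaded / len(samples)


def satisfies_bus_criterion(samples, allowed_percent):
    """Satisfied only when overload stays below the allowable percentage."""
    return overload_percentage(samples) < allowed_percent
```

In this sketch an overload exactly at the allowable percentage fails the criterion, mirroring the "below the allowable percentage" wording above.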
Emulator unit 26 may similarly detect malicious applications, such as applications designed for denial of service attacks. For example, emulator unit 26 may monitor the rate at which the GPU model of GPU models 30 is able to execute application 20. If emulator unit 26 detects slow responsiveness, unintended termination, or hanging, emulator unit 26 may determine that application 20 is an application designed for a denial of service attack, or a very poorly designed application. In this example, the performance criteria may be a threshold execution time or execution rate for a particular task of application 20. If application 20 takes longer than the threshold execution time to complete a particular task or executes the task at a rate less than the threshold execution rate, application 20 may be in violation of the performance criteria. - As another example of
emulator unit 26 detecting malicious applications or error-prone applications, emulator unit 26 may monitor instructions issued by application 20. For instance, in some examples, instructions issued by application 20 may be 96-bit words. However, not all combinations of 96 bits represent a valid instruction. In some examples, GPU 14 may be designed to ignore invalid instructions; however, this may not be the case for every example of GPU 14. To prevent GPU 14 from inadvertently executing an invalid instruction, emulator unit 26 may determine whether the instructions issued by application 20 during execution are valid or invalid instructions. If emulator unit 26 determines that application 20 is issuing invalid instructions, emulator unit 26 may determine that application 20 is a malicious application, an error-prone application, or an inefficient application. - As another example, during execution,
application 20 may write data to and read data from registers. A malicious application, error-prone application, or inefficient application may read data from unwritten registers. If application 20 attempts to read data from a register that was not previously written to, the data read by application 20 may be meaningless data (i.e., uninitialized data). Such reading of uninitialized data may result in unpredictable behavior. In some examples, emulator unit 26 may monitor which registers application 20 writes to during execution, and may determine whether application 20 is reading from a register that has not previously been written to. If emulator unit 26 determines that application 20 is reading from unwritten registers, emulator unit 26 may determine that application 20 is a malicious application, an error-prone application, or an inefficient application. - If
emulator unit 26 determines that the performance criteria associated with static analysis and dynamic analysis are met, validation server device 24 may transmit an indication to device 12 indicating that application 20, with some level of assurance, satisfies one or more performance criteria associated with static analysis, dynamic analysis, or both static and dynamic analysis (e.g., validates application 20). In this case, validation server device 24 may provide an indication that application 20 is validated for use by GPU 14. Otherwise, in some examples, validation server device 24 may transmit an indication to device 12 indicating that application 20 is invalidated for use by GPU 14, such that it is inadvisable for GPU 14 to execute application 20. In response, processor 16 may instruct GPU 14 to execute application 20 based on the received indication. - In examples where
validation server device 24 received source code or intermediate code of application 20, emulator unit 26 may also transmit the compiled object code of application 20, as compiled by compiler 36. In this way, the compilation of application 20 may also be offloaded from device 12 to an external device, such as validation server device 24. -
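Returning to the register monitoring described earlier, detecting reads from unwritten registers amounts to tracking the set of registers written so far. The following is a minimal sketch under stated assumptions: the trace format (a list of operation and register-name pairs) and the function name are hypothetical and are not taken from this disclosure.

```python
def find_uninitialized_reads(trace):
    """Return (position, register) for each read of a not-yet-written register.

    `trace` is a hypothetical list of (operation, register) pairs in
    execution order, where operation is either "write" or "read".
    """
    written = set()
    violations = []
    for position, (operation, register) in enumerate(trace):
        if operation == "write":
            written.add(register)
        elif operation == "read" and register not in written:
            # The value read here would be uninitialized data.
            violations.append((position, register))
    return violations
```

A non-empty result would correspond to the case in which the application may be flagged as malicious, error-prone, or inefficient.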
Validation server device 24 may also be tasked with optimizing or tuning application 20. For example, emulator unit 26 may receive the source code or intermediate code of application 20. As part of the static and/or dynamic analysis, emulator unit 26 may determine that application 20 is somewhat error-prone or would inefficiently utilize the capabilities of GPU 14. In these examples, rather than transmitting an indication to device 12 indicating that it is inadvisable for GPU 14 to execute application 20, emulator unit 26 may attempt to correct the errors of application 20 or attempt to tune application 20 for GPU 14 when it is determined that application 20 may execute inefficiently or with errors on GPU 14. - If
emulator unit 26 is able to correct the errors or make application 20 more efficient, emulator unit 26 may compile the modified code of application 20 to generate object code that GPU 14 should execute. Emulator unit 26 may then transmit the resulting object code to device 12 with an indication that GPU 14 should execute the resulting object code. In this case, GPU 14 may execute the object code generated from the modified code, rather than the object code generated from the original code of application 20. Alternatively, emulator unit 26 may transmit the modified code of application 20 without compilation. - In either of these examples, the validation of
application 20 may be considered as being part of the transmission of the modified code of application 20 (e.g., the transmission of the modified code or the resulting object code). For example, when device 12 receives modified code of application 20 from validation server device 24, device 12 may automatically determine that the modified code of application 20 is suitable for execution because device 12 received the modified code of application 20 from validation server device 24. In this sense, the validation that device 12 receives from validation server device 24 may be an explicit validation or an implicit validation. In either case, i.e., explicit or implicit validation, emulator unit 26 may determine with some level of assurance that application 20 or the modified version of application 20 satisfies one or more performance criteria. - If
emulator unit 26 is unable to correct the errors of application 20, emulator unit 26 may transmit the indication indicating that it is inadvisable to execute application 20 on GPU 14. If emulator unit 26 is unable to make application 20 more efficient, emulator unit 26 may still transmit an indication to device 12 indicating that it may be suitable for GPU 14 to execute application 20 because, while application 20 may not be completely efficient, application 20 may not be error-prone or malicious. - To tune or optimize
application 20, emulator unit 26 may insert code (e.g., source code or intermediate code), replace code, or modify code of application 20 in some other manner. In some examples, emulator unit 26 may collect statistics to determine how well the compiled code of application 20 works. For example, application 20 may utilize array indices for storing variable values in an array. Emulator unit 26 may add code into the source code of application 20 that checks that array indices, utilized by application 20, are within range. Emulator unit 26 may add code into the source code of application 20 that causes application 20 to abort when an array index is not within range. Emulator unit 26 then may compile the modified source code to produce object code for execution of application 20 by GPU 14. - Optimization or tuning may be based on the assumption that applications, such as
application 20, are generally developed to exploit the high level of parallelism of GPU 14. If the developer did not intend to exploit the parallelism of GPU 14, the developer would have developed application 20 to not execute on GPU 14, and rather execute on processor 16. - For example, the developer of
application 20 may have developed application 20 to perform image processing on blocks of images in parallel. As described above, the size of the blocks of the images may be based on the amount of available local memory on GPU 14. Because the developer may not know how much memory is available on GPU 14, the developer may develop application 20 to use variable-sized blocks, instead of the more efficient fixed-size blocks. For example, fixed-size blocks may be more efficient because the size of the blocks does not change during execution. - In some examples,
emulator unit 26 may determine the optimal size for the blocks because the GPU model of GPU models 30 that corresponds to GPU 14 may include information that indicates the size of the local memory of GPU 14. In this example, emulator unit 26 may select the optimal size for the blocks based on the amount of available local memory on GPU 14, the amount of data that will need to be written to or read from the local memory of GPU 14, and other such information which may not be available to the developer of application 20. In aspects of this disclosure, emulator unit 26 would know how much local memory is available and how much data needs to be written to or read from local memory because emulator unit 26 may execute application 20 on the GPU model of GPU models 30 that corresponds to GPU 14. - In these examples,
emulator unit 26 may update or otherwise modify the source code or intermediate code of application 20 to fix the block size to the optimally determined size. In other words, emulator unit 26 may determine the optimal size of the blocks to best utilize the parallelism of GPU 14. Emulator unit 26 may then compile this modified code of application 20, and transmit the resulting object code to device 12 for execution on GPU 14. In this way, when GPU 14 executes the modified application 20, the modified application 20 may execute more efficiently on GPU 14, as compared to the original application 20. - In another example for optimization, as described above,
application 20 may perform matrix operations. In this example, emulator unit 26 may determine whether column-based matrix operations or row-based matrix operations are handled more easily by GPU 14. For instance, emulator unit 26 may cause the GPU model of GPU models 30 that corresponds to GPU 14 to execute application 20 using row-based matrix operations and using column-based matrix operations. Emulator unit 26 may compare the efficiency of the column-based and row-based matrix operations (e.g., number of accesses to memory, amount of processing time, and other such efficiency measures). Based on the measured efficiency, emulator unit 26 may modify the code of application 20. For example, if column-based operations are more efficiently executed than row-based operations, emulator unit 26 may modify the code of application 20 so that the matrix operations are performed as column-based operations. Similarly, if row-based operations are more efficiently executed than column-based operations, emulator unit 26 may modify the code of application 20 so that the matrix operations are performed as row-based operations. - In another example for optimization, as described above, the developer of
application 20 may have developed application 20 to be executed on older versions of GPUs. In this case, application 20 may properly execute on a GPU such as GPU 14; however, application 20 may not fully exploit the functionality of GPU 14. For example, application 20 may unnecessarily limit the amount of graphics or non-graphics data that GPU 14 should process in parallel because older versions of GPUs may be limited in processing capabilities. In this example, emulator unit 26 may modify the code of application 20 such that, when application 20 is executed, application 20 causes GPU 14 to process more data in parallel. There may be other examples of ways in which emulator unit 26 may modify application 20 such that application 20 is better suited for execution on newer GPUs, and aspects of this disclosure should not be considered limited to the above examples. - After optimizing
application 20, emulator unit 26 may transmit the modified or updated code of application 20 to device 12. In this example, processor 16 may compile the code of application 20, as received from emulator unit 26, and instruct GPU 14 to execute the resulting object code. In some other examples, emulator unit 26 may compile the modified application 20, via compiler 36, and transmit the resulting object code to device 12. In this example, processor 16 may instruct GPU 14 to execute the received object code for application 20. - In some examples,
emulator unit 26 may validate application 20 and optimize or tune application 20 once. After such validation, GPU 14 may execute application 20 as needed without requiring further validation or optimization. Also, in some examples, after emulator unit 26 validates application 20, emulator unit 26 may store an indication in server memory 28 that indicates that this application 20 has already been validated. In these examples, when emulator unit 26 receives code for validation, emulator unit 26 may first determine whether emulator unit 26 previously validated the code based on the indication stored in server memory 28. If emulator unit 26 previously validated the code, emulator unit 26 may immediately validate that received code. For example, emulator unit 26 may validate application 20, as received from device 12. Subsequently, emulator unit 26 may receive code for application 20 from a device other than device 12. In this case, emulator unit 26 may first determine whether the received code is the same as the code that emulator unit 26 previously validated, and if so, may immediately validate the received code. In this manner, emulator unit 26 may not need to perform the static and/or dynamic analysis again for previously validated code. -
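One way to realize the stored indication of prior validation is to key it on a digest of the received code, so that identical code arriving from a different device is recognized. The sketch below assumes this approach; the use of SHA-256, the `validated_digests` set standing in for server memory 28, and the `analyze` callback are all illustrative assumptions rather than details from this disclosure.

```python
import hashlib


def validate_with_cache(code, validated_digests, analyze):
    """Validate `code` (bytes), skipping analysis for previously validated code.

    `validated_digests` is a set standing in for the indications stored in
    server memory; `analyze` is a hypothetical callable that runs the static
    and/or dynamic analysis and returns True when the criteria are satisfied.
    """
    digest = hashlib.sha256(code).hexdigest()
    if digest in validated_digests:
        return True  # same code was validated before; no re-analysis needed
    if analyze(code):
        validated_digests.add(digest)  # remember the successful validation
        return True
    return False
```

Only successful validations are recorded here, matching the description above in which the stored indication marks code that has already been validated.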
FIG. 2 is a flowchart illustrating an example operation of device 12. For purposes of illustration only, reference is made to FIG. 1. Device 12 may receive application 20 that is to be executed by GPU 14 (40). For example, device 12 may download application 20 from application server device 38. As another example, application 20 may be preloaded on device memory 18. As described above, device 12 may receive the source code, intermediate code (e.g., intermediate representation of application 20), or object code of application 20. -
Device 12 may transmit the code of application 20 to validation server device 24 (42). For example, device 12 may transmit the source code, intermediate code, or object code of application 20 to validation server device 24 for validation of application 20. In some examples, device 12 may transmit the code of application 20 to validation server device 24 once for validation. GPU 14, of device 12, may then execute application 20 as needed without requiring subsequent validation. - In response to transmitting the code of
application 20 to validation server device 24 for validation, device 12 may receive the validation from validation server device 24 (44). Alternatively, device 12 may receive an invalidation; that is, device 12 may receive either a validation or an invalidation. The validation from validation server device 24 may indicate that application 20 satisfies one or more performance criteria. If application 20 does not satisfy the one or more performance criteria, validation server device 24 may indicate that application 20 did not satisfy the performance criteria. For example, the validation may indicate that application 20 satisfies performance criteria associated with static analysis, dynamic analysis, or both static and dynamic analysis. In some examples, validation server device 24 may optimize or tune application 20 to make application 20 more efficient or less error-prone. In this case, the validation may indicate that the modified version of application 20 satisfies one or more performance criteria. - In some examples,
processor 16 of device 12 may instruct GPU 14 of device 12 to execute application 20 based on the validation (48). For example, if validation server device 24 indicates that application 20 satisfies the performance criteria, processor 16 may instruct GPU 14 to execute application 20. Otherwise, processor 16 may not allow GPU 14 to execute application 20. - In some alternate examples, prior to execution,
device 12 may receive a modified version of application 20 (46). In FIG. 2, the dashed lines from block 44 to block 46, and from block 46 to block 48, are used to indicate that the functions of block 46 may not be necessary in every example. For instance, validation server device 24 may be able to optimize or tune application 20, and may transmit the modified version of application 20. As another example, device 12 may transmit the source code or intermediate code of application 20, and receive a compiled version of application 20 from validation server device 24. As yet another example, device 12 may receive a compiled version of the code as modified by validation server device 24 (e.g., modified for optimization or tuning). In these examples, processor 16 may instruct GPU 14 to execute the modified version of application 20 (48). -
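The device-side flow of FIG. 2 can be summarized in a short sketch. The `ValidationResponse` type and the two callbacks are hypothetical stand-ins for the exchange with validation server device 24 and for the instruction from processor 16 to GPU 14; the disclosure does not specify any such interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResponse:
    """Hypothetical reply from the validation server."""
    validated: bool
    modified_code: Optional[bytes] = None  # present only if the server tuned the code


def run_on_device(code, request_validation, execute_on_gpu):
    """Follow blocks 42 to 48: validate remotely, then execute only if validated."""
    response = request_validation(code)   # blocks 42 and 44
    if not response.validated:
        return False                       # execution on the GPU is not allowed
    # Block 46 is optional: prefer the server's modified version when provided.
    to_run = code if response.modified_code is None else response.modified_code
    execute_on_gpu(to_run)                 # block 48
    return True
```

Treating block 46 as an optional field keeps the common path (validation only) and the tuned path (modified code returned) in one function.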
FIG. 3 is a flowchart illustrating an example operation of validation server device 24. For purposes of illustration only, reference is made to FIG. 1. Validation server device 24 may receive application 20, which is to be executed by GPU 14, from device 12 (50). For example, validation server device 24 may receive source code, intermediate code, or object code of application 20 from device 12 via network 22. -
Validation server device 24 may perform at least one of static analysis and dynamic analysis on application 20 (52). For example, as part of static analysis, emulator unit 26 of validation server device 24 may compile the code of application 20, and monitor for any errors during the compilation of application 20. As part of the dynamic analysis, emulator unit 26 of validation server device 24 may execute a virtual model of GPU 14, or the virtual model of GPU 14 and a virtual model of device 12. As described above, GPU models 30 and device models 34 may include a virtual model of GPU 14 and device 12, respectively. In some examples, GPU models 30 and device models 34 may include a generic GPU model and a generic device model. - For example,
emulator unit 26 may receive an identification of GPU 14 and/or device 12 from device 12. Emulator unit 26 may identify which one of GPU models 30 corresponds to GPU 14 and which one of device models 34 corresponds to device 12, and execute the corresponding GPU and device models. If there is no corresponding GPU and/or device model for GPU 14 and device 12, or if emulator unit 26 did not receive an identification of GPU 14 and/or device 12, emulator unit 26 may execute the generic GPU and device models. - As part of the dynamic analysis,
emulator unit 26 may execute application 20 and provide application 20 with GPU inputs 32 for analyzing application 20. In these examples, application 20 may be considered as executing on the corresponding virtual model of GPU 14, which is executing on emulator unit 26. In this way, emulator unit 26 may execute application 20 as if application 20 were executing on GPU 14. Emulator unit 26 may monitor the functions performed by the corresponding virtual model of GPU 14 such as memory accesses, rate of execution, termination instance, and other functions pertinent to the functionality of GPU 14. -
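As an illustration of monitoring memory accesses during dynamic analysis, the sketch below runs a model over a set of test inputs and counts accesses that fall outside an allowed address range. The `run_model` callback, which is assumed to return the addresses touched for one input, and the half-open address range are hypothetical; the default threshold of zero follows the earlier discussion of the allowable number of out-of-bounds accesses.

```python
def count_out_of_bounds(accesses, allowed_start, allowed_end):
    """Count addresses outside the half-open range [allowed_start, allowed_end)."""
    return sum(1 for address in accesses
               if not (allowed_start <= address < allowed_end))


def dynamic_memory_check(run_model, gpu_inputs, allowed_start, allowed_end,
                         threshold=0):
    """Return True when out-of-bounds accesses across all inputs stay within threshold.

    `run_model` is a hypothetical callable that executes the application on
    the virtual GPU model for one input and returns the addresses it accessed.
    """
    total = 0
    for gpu_input in gpu_inputs:
        accesses = run_model(gpu_input)
        total += count_out_of_bounds(accesses, allowed_start, allowed_end)
    return total <= threshold
```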
Emulator unit 26 may determine whether application 20 satisfies one or more performance criteria (54). The one or more performance criteria may be performance criteria associated with static analysis and performance criteria associated with dynamic analysis. For example, the one or more performance criteria may be criteria that there are no errors in the compilation of application 20, as evaluated by compiling application 20 during the static analysis. As another example, the one or more performance criteria may be criteria that application 20 not access out-of-bounds memory locations and not use up resources of GPU 14 such that GPU 14 is not able to perform other tasks in parallel, as evaluated by executing application 20 and providing application 20 with GPU inputs 32 during the dynamic analysis. There may be other examples of performance criteria that emulator unit 26 may determine that application 20 satisfies. -
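The determination of block 54 can be pictured as checking a combined report from the static and dynamic analyses against each criterion. In the sketch below, the `AnalysisReport` fields and the utilization limit are assumptions chosen for illustration; in particular, the disclosure does not quantify what it means for the application to use up the resources of the GPU.

```python
from dataclasses import dataclass


@dataclass
class AnalysisReport:
    """Hypothetical summary of the static and dynamic analyses."""
    compile_errors: int          # from static analysis (compilation)
    out_of_bounds_accesses: int  # from dynamic analysis
    gpu_utilization: float       # assumed fraction of GPU resources used, 0.0 to 1.0


def failed_criteria(report, max_gpu_utilization=0.9):
    """List the criteria the application failed; empty when all are satisfied."""
    failures = []
    if report.compile_errors > 0:
        failures.append("compilation errors")
    if report.out_of_bounds_accesses > 0:
        failures.append("out-of-bounds memory access")
    if report.gpu_utilization > max_gpu_utilization:
        failures.append("GPU resources exhausted")
    return failures


def is_validated(report, max_gpu_utilization=0.9):
    """The application is validated only when no criterion fails."""
    return not failed_criteria(report, max_gpu_utilization)
```

Returning the list of failed criteria, rather than a bare boolean, would let a server report why an application was invalidated.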
Validation server device 24 may transmit a validation of application 20 to device 12 based on the determination (56). For example, validation server device 24 may transmit a validation of application 20 to device 12 if application 20 satisfies the one or more performance criteria. Otherwise, validation server device 24 may transmit an invalidation if application 20 does not satisfy the one or more performance criteria. For example, if emulator unit 26 determines that application 20 satisfies the one or more performance criteria, validation server device 24 may transmit an indication to device 12 indicating as such. Alternatively, if emulator unit 26 determines that application 20 does not satisfy the one or more performance criteria, validation server device 24 may transmit an indication to device 12 indicating as such. -
FIG. 4 is a flowchart illustrating another example operation of validation server device 24. For purposes of illustration only, reference is made to FIGS. 1 and 3. Similar to FIG. 3, validation server device 24 may receive application 20, which is to be executed by GPU 14, from device 12 (58). In this example, emulator unit 26 may modify application 20 (e.g., the source code or intermediate code of application 20) to optimize or tune application 20. For example, emulator unit 26 may modify the code of application 20 so that application 20 executes more efficiently on GPU 14. Validation server device 24 may then transmit modified application 20 to device 12 (62). In some examples, validation server device 24 may transmit the source code or intermediate code of the modified application 20. As another example, validation server device 24 may compile the modified code of application 20, and transmit the resulting object code to device 12. -
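One of the modifications discussed earlier is fixing the image block size based on the local memory of the GPU. A minimal sketch of such a selection follows. It assumes square blocks with power-of-two edges and a fixed number of buffers per block held in local memory; these constraints, the parameter names, and the function name are illustrative assumptions and are not prescribed by this disclosure.

```python
def pick_block_edge(local_memory_bytes, bytes_per_pixel, buffers_per_block=2):
    """Largest power-of-two square block edge whose buffers fit in local memory.

    Assumes each block needs `buffers_per_block` buffers of edge * edge pixels
    (for example, one input and one output buffer).
    """
    def footprint(edge):
        return edge * edge * bytes_per_pixel * buffers_per_block

    if footprint(1) > local_memory_bytes:
        raise ValueError("local memory cannot hold even a 1x1 block")
    edge = 1
    # Double the edge while the next size still fits.
    while footprint(edge * 2) <= local_memory_bytes:
        edge *= 2
    return edge
```

For instance, with 32 KiB of local memory, 4 bytes per pixel, and two buffers per block, a 64 by 64 block uses exactly 32,768 bytes, so the sketch selects an edge of 64.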
FIG. 5 is a block diagram illustrating the example device of FIG. 1 in further detail. For instance, FIG. 5 illustrates device 12 of FIG. 1 in further detail. For example, as indicated above, examples of device 12 include, but are not limited to, mobile wireless telephones, PDAs, video gaming consoles that include video displays, mobile video conferencing units, laptop computers, desktop computers, television set-top boxes, and the like. - As illustrated in
FIG. 5, device 12 may include GPU 14, processor 16, device memory 18, transceiver module 64, user interface 66, display 68, and display processor 70. GPU 14, processor 16, and device memory 18 may be substantially similar or identical to those illustrated in FIG. 1. For purposes of brevity, only the components that are shown in FIG. 5, but not shown in FIG. 1, are described in detail. -
Device 12 may include additional modules or units not shown in FIG. 5 for purposes of clarity. For example, device 12 may include a speaker and a microphone, neither of which is shown in FIG. 5, to effectuate telephonic communications in examples where device 12 is a mobile wireless telephone, or a speaker where device 12 is a media player. Furthermore, the various modules and units shown in device 12 may not be necessary in every example of device 12. For example, user interface 66 and display 68 may be external to device 12 in examples where device 12 is a desktop computer or other device that is equipped to interface with an external user interface or display. - Examples of
user interface 66 include, but are not limited to, a trackball, a mouse, a keyboard, and other types of input devices. User interface 66 may also be a touch screen and may be incorporated as a part of display 68. Transceiver module 64 may include circuitry to allow wireless or wired communication between device 12 and another device or a network. Transceiver module 64 may include one or more modulators, demodulators, amplifiers, antennas and other such circuitry for wired or wireless communication. Display 68 may comprise a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a cathode ray tube (CRT) display, a plasma display, a polarized display, or another type of display device. - In some examples, after
GPU 14 generates the graphics data for display on display 68, GPU 14 may output the resulting graphics data to device memory 18 for temporary storage. Display processor 70 may retrieve the graphics data from device memory 18, perform any post-processing on the graphics data, and output the resulting graphics data to display 68. For example, display processor 70 may perform any further enhancements or scale the graphics data generated by GPU 14. - In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer data storage media. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise random access memory (RAM), read-only memory (ROM), EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- The code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (i.e., a chip set). Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
- Various examples have been described. These and other examples are within the scope of the following claims.
Claims (20)
1. A computer-readable storage medium comprising instructions that, when executed, cause one or more processors to:
receive, with a server device, an application that is to be executed by a graphics processing unit (GPU) that resides on a device external to the server device;
determine, by the server device, that the application would execute inefficiently on the GPU;
modify, by the server device and based on the determination that the application would execute inefficiently on the GPU, code of the application to create a modified version of the application that would execute more efficiently on the GPU than the received application;
perform, with the server device, an analysis of the modified version of the application during execution of the modified version of the application on the server device, wherein the instructions that cause the one or more processors to perform the analysis comprise instructions that cause the one or more processors to:
execute a virtual GPU model;
execute the modified version of the application on the virtual GPU model; and
analyze functionality of the virtual GPU model during the execution of the modified version of the application on the virtual GPU model;
determine whether the modified version of the application satisfies one or more performance criteria based on at least one of the analyses; and
transmit, to the device, the modified code of the application and a validation of the application if the application satisfies the one or more performance criteria.
2. The computer-readable storage medium of claim 1 , wherein the instructions that cause the one or more processors to receive the application further comprise instructions that cause the one or more processors to receive an identification of the GPU that resides on the device external to the server device, and further comprising instructions that cause the one or more processors to:
identify, based on the received identification of the GPU, a particular virtual GPU model of a plurality of virtual GPU models, wherein the instructions that cause the one or more processors to execute the virtual GPU model comprise instructions that cause the one or more processors to execute the identified particular virtual GPU model, and wherein the instructions that cause the one or more processors to execute the modified version of the application on the virtual GPU model comprise instructions that cause the one or more processors to execute the modified version of the application on the identified particular virtual GPU model.
3. A method comprising:
receiving an application that is to be executed by a graphics processing unit (GPU) of a device;
transmitting the application and an identification of the GPU to a server device external to the device for validation of the application on a virtual GPU model associated with the identified GPU of the device;
receiving a modified version of the application from the server device, wherein the modified version of the application would execute more efficiently on the GPU; and
receiving a validation from the server device that indicates that the modified version of the application satisfies one or more criteria for execution on the GPU.
4. The method of claim 3, further comprising:
executing the modified version of the application on the GPU based on the received validation.
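The device-side method of claims 3 and 4 can be sketched as a single round trip followed by a gated execution. This is an illustration under stated assumptions: `send_to_server` and `execute_on_gpu` are hypothetical callables standing in for the network transport and the GPU driver, and the application is represented as a plain list of instruction strings.

```python
from typing import Callable, List, Tuple

# Assumed shape of the server reply: (modified application, validation flag).
ServerReply = Tuple[List[str], bool]


def run_validated(
    app: List[str],
    gpu_id: str,
    send_to_server: Callable[[List[str], str], ServerReply],
    execute_on_gpu: Callable[[List[str]], None],
) -> bool:
    """Send the application and GPU identification, then run only if validated."""
    modified, validated = send_to_server(app, gpu_id)
    if not validated:
        # Without a validation the modified version is never executed.
        return False
    execute_on_gpu(modified)
    return True
```

Passing the transport and the executor in as parameters keeps the sketch self-contained and makes the gating logic easy to exercise without a real server or GPU.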
5. The method of claim 3, wherein receiving the modified version of the application comprises receiving at least one of source code for the modified version of the application, intermediate code of the modified version of the application, and compiled code for the modified version of the application, and wherein transmitting the application comprises transmitting at least one of the source code for the application, intermediate code of the application, and the compiled code of the application.
6. The method of claim 3, further comprising:
executing the modified version of the application on the GPU.
7. The method of claim 3, wherein transmitting the application comprises transmitting at least one of a source code of the application and an intermediate code of the application, wherein receiving the modified version of the application comprises receiving compiled object code of the modified version of the application from the server device, the method further comprising:
executing the compiled object code of the modified version of the application on the GPU.
8. The method of claim 3, wherein transmitting the application to the server device comprises transmitting the application only once to the server device, and wherein receiving the validation from the server device comprises receiving, only once, the validation from the server device.
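One way a device might ensure that the application is transmitted only once and the validation received only once is to keep the server's reply and reuse it on later runs. The claim does not prescribe any mechanism; the sketch below is an assumption, and the `ValidationCache` class, its SHA-256 keying scheme, and the `send_to_server` callable are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

ServerReply = Tuple[List[str], bool]


class ValidationCache:
    """Hypothetical device-side store of server replies, keyed by app and GPU."""

    def __init__(self, send_to_server: Callable[[List[str], str], ServerReply]):
        self._send = send_to_server
        self._results: Dict[str, ServerReply] = {}

    def validated_version(self, app: List[str], gpu_id: str) -> ServerReply:
        # Key on the GPU identification plus the application contents.
        key = hashlib.sha256("\n".join([gpu_id, *app]).encode()).hexdigest()
        if key not in self._results:
            # First request: a single transmission and a single reply.
            self._results[key] = self._send(app, gpu_id)
        return self._results[key]
```

Subsequent calls with the same application and GPU identification return the stored reply without contacting the server again.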
9. An apparatus comprising:
a graphics processing unit (GPU);
a device memory operable to store an application that is to be executed by the GPU; and
a processor configured to:
transmit the application and an identification of the GPU to a server device external to the apparatus for validation of the application on a virtual GPU model associated with the identified GPU of the device;
receive a modified version of the application from the server device, wherein the modified version of the application would execute more efficiently on the GPU; and
receive a validation from the server device that indicates that the modified version of the application satisfies one or more criteria for execution on the GPU.
10. The apparatus of claim 9, wherein the processor is further configured to instruct the GPU to execute the modified version of the application based on the received validation, and wherein the GPU is operable to execute the application in response to the instruction from the processor.
11. The apparatus of claim 9, wherein the processor receives at least one of source code for the modified version of the application, intermediate code of the modified version of the application, and compiled code for the modified version of the application, and wherein the processor transmits at least one of the source code for the application, intermediate code of the application, and the compiled code of the application.
12. The apparatus of claim 9, wherein the GPU is configured to execute the modified version of the application.
13. The apparatus of claim 9, wherein the processor transmits at least one of a source code of the application and an intermediate code of the application, wherein the processor is configured to receive the modified version of the application by at least receiving compiled object code of the modified version of the application from the server device, and wherein the GPU is configured to execute the compiled object code of the modified version of the application.
14. The apparatus of claim 9, wherein the processor transmits the application only once to the server device, and wherein the processor receives the validation from the server device only once.
15. A device comprising:
a graphics processing unit (GPU);
means for receiving an application that is to be executed by the GPU;
means for transmitting the application and an identification of the GPU to a server device external to the device for validation of the application on a virtual GPU model associated with the identified GPU of the device;
means for receiving a modified version of the application from the server device, wherein the modified version of the application would execute more efficiently on the GPU; and
means for receiving a validation from the server device that indicates that the modified version of the application satisfies one or more criteria for execution on the GPU.
16. The device of claim 15, further comprising:
means for executing the modified version of the application on the GPU based on the received validation.
17. The device of claim 15, wherein the means for receiving the modified version of the application comprise means for receiving at least one of source code for the modified version of the application, intermediate code of the modified version of the application, and compiled code for the modified version of the application, and wherein the means for transmitting the application comprise means for transmitting at least one of the source code for the application, intermediate code of the application, and the compiled code of the application.
18. The device of claim 15, further comprising:
means for executing the modified version of the application on the GPU.
19. The device of claim 15, wherein the means for transmitting the application comprise means for transmitting at least one of a source code of the application and an intermediate code of the application, wherein the means for receiving the modified version of the application comprise means for receiving compiled object code of the modified version of the application from the server device, the device further comprising:
means for executing the compiled object code of the modified version of the application on the GPU.
20. The device of claim 15, wherein the means for transmitting the application to the server device comprise means for transmitting the application only once to the server device, and wherein the means for receiving the validation from the server device comprise means for receiving, only once, the validation from the server device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/727,427 US20150261651A1 (en) | 2012-02-27 | 2015-06-01 | Validation of applications for graphics processing unit |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/406,272 US9075913B2 (en) | 2012-02-27 | 2012-02-27 | Validation of applications for graphics processing unit |
US14/727,427 US20150261651A1 (en) | 2012-02-27 | 2015-06-01 | Validation of applications for graphics processing unit |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/406,272 Continuation US9075913B2 (en) | 2012-02-27 | 2012-02-27 | Validation of applications for graphics processing unit |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150261651A1 (en) | 2015-09-17 |
Family
ID=47846123
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/406,272 Expired - Fee Related US9075913B2 (en) | 2012-02-27 | 2012-02-27 | Validation of applications for graphics processing unit |
US14/727,427 Abandoned US20150261651A1 (en) | 2012-02-27 | 2015-06-01 | Validation of applications for graphics processing unit |
Country Status (6)
Country | Link |
---|---|
US (2) | US9075913B2 (en) |
EP (1) | EP2820544A1 (en) |
JP (1) | JP5934392B2 (en) |
KR (1) | KR101569308B1 (en) |
CN (1) | CN104137076B (en) |
WO (1) | WO2013130212A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150365463A1 (en) * | 2012-10-02 | 2015-12-17 | Nextbit Systems, Inc. | Dynamic application deployment |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9424089B2 (en) * | 2012-01-24 | 2016-08-23 | Samsung Electronics Co., Ltd. | Hardware acceleration of web applications |
US9648133B2 (en) * | 2012-03-12 | 2017-05-09 | Telefonaktiebolaget L M Ericsson | Optimizing traffic load in a communications network |
US9104873B1 (en) * | 2012-05-21 | 2015-08-11 | Symantec Corporation | Systems and methods for determining whether graphics processing units are executing potentially malicious processes |
US8990183B2 (en) * | 2012-06-06 | 2015-03-24 | Microsoft Technology Licensing, Llc | Deep application crawling |
EP2862077A4 (en) | 2012-06-15 | 2016-03-02 | Cycle Computing Llc | Method and system for automatically detecting and resolving infrastructure faults in cloud infrastructure |
US9646153B2 (en) * | 2012-08-08 | 2017-05-09 | Intel Corporation | Securing content from malicious instructions |
US9092617B2 (en) * | 2012-11-08 | 2015-07-28 | Intel Corporation | Protecting systems from unauthorized access to system resources using browser independent web page technology |
US9268668B1 (en) * | 2012-12-20 | 2016-02-23 | Google Inc. | System for testing markup language applications |
US10101982B2 (en) * | 2013-01-31 | 2018-10-16 | Htc Corporation | Methods for application management in an electronic device supporting hardware acceleration |
EP2973172B1 (en) * | 2013-03-12 | 2017-07-26 | Intel Corporation | Preventing malicious instruction execution |
US9740886B2 (en) * | 2013-03-15 | 2017-08-22 | Sony Interactive Entertainment Inc. | Enhanced security for hardware decoder accelerator |
US20140289719A1 (en) * | 2013-03-20 | 2014-09-25 | Google Inc. | Automatic version management |
US9819661B2 (en) * | 2013-09-12 | 2017-11-14 | The Boeing Company | Method of authorizing an operation to be performed on a targeted computing device |
US9747084B2 (en) * | 2014-09-09 | 2017-08-29 | Google Inc. | Offline shader compilation |
US9841972B2 (en) * | 2014-12-17 | 2017-12-12 | Cisco Technology, Inc. | Securing secret information in source code verification and at runtime |
US10241761B2 (en) * | 2014-12-29 | 2019-03-26 | Nvidia Corporation | System and method for compiler support for compile time customization of code |
US20160357530A1 (en) * | 2015-06-05 | 2016-12-08 | Apple Inc. | Method and apparatus for intermediate representation of applications |
US20180113794A1 (en) * | 2015-06-10 | 2018-04-26 | Intel Corporation | Webgl application analyzer |
CN107240155B (en) * | 2016-03-29 | 2019-02-19 | 腾讯科技(深圳)有限公司 | A kind of method, server and the 3D application system of model object building |
CN105976305B (en) * | 2016-04-26 | 2019-01-08 | 福州瑞芯微电子股份有限公司 | A kind of graphics accelerator IP verification method and device |
US10423500B2 (en) * | 2016-06-01 | 2019-09-24 | Seagate Technology Llc | Technologies for limiting performance variation in a storage device |
US10445218B2 (en) * | 2017-06-06 | 2019-10-15 | Microsoft Technology Licensing, Llc | Execution of graphic workloads on a simulated hardware environment |
KR102002545B1 (en) * | 2017-12-15 | 2019-07-22 | 슈어소프트테크주식회사 | Code test automatic proceeding method through virtualixation and appratus for the same |
KR20200101496A (en) | 2019-01-29 | 2020-08-28 | 삼성전자주식회사 | Serer and control method for providing accelerated computing environments |
WO2020194000A1 (en) * | 2019-03-28 | 2020-10-01 | Validata Holdings Limited | Method of detecting and removing defects |
US20230104468A1 (en) * | 2021-10-06 | 2023-04-06 | Dell Products L.P. | Ransomware detection in host encrypted data environment |
US20240036842A1 (en) * | 2022-07-29 | 2024-02-01 | Xilinx, Inc. | Method for mitigating memory access conflicts in a multi-core graph compiler |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110072056A1 (en) * | 2003-11-19 | 2011-03-24 | Reuven Bakalash | Internet-based graphics application profile management system for updating graphic application profiles stored within the multi-gpu graphics rendering subsystems of client machines running graphics-based applications |
US20110102443A1 (en) * | 2009-11-04 | 2011-05-05 | Microsoft Corporation | Virtualized GPU in a Virtual Machine Environment |
US7950003B1 (en) * | 2006-12-07 | 2011-05-24 | Sony Computer Entertainment Inc. | Heads-up-display software development tool for analyzing and optimizing computer software |
US8274518B2 (en) * | 2004-12-30 | 2012-09-25 | Microsoft Corporation | Systems and methods for virtualizing graphics subsystems |
US8436870B1 (en) * | 2006-08-01 | 2013-05-07 | Nvidia Corporation | User interface and method for graphical processing analysis |
US20140104287A1 (en) * | 2012-10-11 | 2014-04-17 | Hema C. Nalluri | Hardware assist for privilege access violation checks |
US20150074668A1 (en) * | 2013-09-09 | 2015-03-12 | Apple Inc. | Use of Multi-Thread Hardware For Efficient Sampling |
US9064322B1 (en) * | 2008-04-16 | 2015-06-23 | Nvidia Corporation | Method and system for steering access to display configuration information in a multi-GPU system |
US20150199262A1 (en) * | 2014-01-16 | 2015-07-16 | Vivek Bhavsar | Runtime code visualization |
US20150212815A1 (en) * | 2014-01-24 | 2015-07-30 | Nvidia Corporation | Methods and systems for maintenance and control of applications for performance tuning |
Family Cites Families (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2665089B2 (en) | 1991-09-26 | 1997-10-22 | 三菱電機株式会社 | Compilation method in distributed environment |
JPH05342012A (en) | 1992-06-10 | 1993-12-24 | Sony Corp | Compiling method and compiler |
US5761512A (en) | 1995-12-27 | 1998-06-02 | International Business Machines Corporation | Automatic client-server complier |
US5987256A (en) | 1997-09-03 | 1999-11-16 | Enreach Technology, Inc. | System and process for object rendering on thin client platforms |
JP4027482B2 (en) | 1997-12-24 | 2007-12-26 | 富士通株式会社 | Translation apparatus and method for performing cryptographic restoration |
JP2000122871A (en) | 1998-10-14 | 2000-04-28 | Hitachi Ltd | Application distributing method |
JP2000215181A (en) | 1999-01-21 | 2000-08-04 | Fujitsu Ltd | Network computer system and substitute compiling server device |
JP2001195264A (en) | 2000-01-07 | 2001-07-19 | Toshiba Corp | Compile method and computer system |
KR100441115B1 (en) | 2001-06-27 | 2004-07-19 | 주식회사 인터와이즈 | Java Compile-On-Demand Service System for Accelerating Processing Speed of Java Program on Data Processing System And Method Thereof |
JP2003114813A (en) * | 2001-10-03 | 2003-04-18 | Ibm Japan Ltd | Analysis server, program analysis network system and program analysis method |
JP2003216434A (en) | 2002-01-25 | 2003-07-31 | Hewlett Packard Co <Hp> | Method and system for optimizing downloaded program by activating user profile data in small it device |
US7340730B2 (en) | 2002-03-18 | 2008-03-04 | Sun Microsystems, Inc. | On demand, network accessible, run time compile server |
US7009605B2 (en) | 2002-03-20 | 2006-03-07 | Nvidia Corporation | System, method and computer program product for generating a shader program |
EP1588230A4 (en) | 2003-01-10 | 2008-05-07 | Nexaweb Technologies Inc | System and method for network-based computing |
US7095416B1 (en) * | 2003-09-22 | 2006-08-22 | Microsoft Corporation | Facilitating performance analysis for processing |
US7587712B2 (en) | 2003-12-19 | 2009-09-08 | Marvell International Ltd. | End-to-end architecture for mobile client JIT processing on network infrastructure trusted servers |
US7248265B2 (en) | 2004-04-16 | 2007-07-24 | Apple Inc. | System and method for processing graphics operations with graphics processing unit |
US7548892B2 (en) | 2004-04-30 | 2009-06-16 | Microsoft Corporation | Processing machine learning techniques using a graphics processing unit |
KR100590686B1 (en) | 2004-07-15 | 2006-06-19 | 에스케이 텔레콤주식회사 | Method of controlling the graphic accelerator for displaying 3D data in mobile terminal |
US20060012604A1 (en) | 2004-07-15 | 2006-01-19 | Avinash Seetharamaiah | Legacy processing for pixel shader hardware |
US7733347B2 (en) | 2004-11-05 | 2010-06-08 | Microsoft Corporation | Automated construction of shader programs |
US7650639B2 (en) | 2005-03-31 | 2010-01-19 | Microsoft Corporation | System and method for protecting a limited resource computer from malware |
US8271964B2 (en) | 2005-05-16 | 2012-09-18 | Microsoft Corporation | Extensible software development services |
US7389500B2 (en) * | 2005-07-08 | 2008-06-17 | Microsoft Corporation | Selective pre-compilation of virtual code to enhance boot time emulator performance |
US20070169066A1 (en) | 2005-11-17 | 2007-07-19 | Nielsen Spencer J | System and method for an extensible 3D interface programming framework |
US7797747B1 (en) * | 2006-02-21 | 2010-09-14 | Symantec Corporation | Detection of malicious code in non-paged pool unused pages |
US8764566B2 (en) | 2006-02-24 | 2014-07-01 | Igt | Internet remote game server |
JP4983801B2 (en) * | 2006-09-28 | 2012-07-25 | 富士通株式会社 | Program performance analyzer |
US8453104B2 (en) | 2006-10-27 | 2013-05-28 | Microsoft Corporation | Thin client software development environment |
US7986325B1 (en) | 2006-12-12 | 2011-07-26 | Nvidia Corporation | Loading integer-based data into a graphics processing system |
US8533697B2 (en) * | 2007-02-14 | 2013-09-10 | The Mathworks, Inc. | Graphical processing unit (GPU) arrays providing high computational capabilities in a computing environment |
US9069967B2 (en) | 2007-02-16 | 2015-06-30 | Veracode, Inc. | Assessment and analysis of software security flaws |
US7992137B2 (en) * | 2007-07-30 | 2011-08-02 | Nvidia Corporation | Client server system for analysis and performance tuning of remote graphics devices |
US8365153B2 (en) | 2007-10-26 | 2013-01-29 | Qualcomm Incorporated | Server-based code compilation |
US8984628B2 (en) | 2008-10-21 | 2015-03-17 | Lookout, Inc. | System and method for adverse mobile application identification |
US8108933B2 (en) | 2008-10-21 | 2012-01-31 | Lookout, Inc. | System and method for attack and malware prevention |
US8294723B2 (en) * | 2008-11-07 | 2012-10-23 | Google Inc. | Hardware-accelerated graphics for web applications using native code modules |
US8397241B2 (en) * | 2008-11-13 | 2013-03-12 | Intel Corporation | Language level support for shared virtual memory |
US8479286B2 (en) | 2009-12-15 | 2013-07-02 | Mcafee, Inc. | Systems and methods for behavioral sandboxing |
EP2336882A1 (en) | 2009-12-18 | 2011-06-22 | Telefonaktiebolaget L M Ericsson (PUBL) | Technique for run-time provision of executable code using off-device services |
US8462166B2 (en) * | 2010-10-01 | 2013-06-11 | Apple Inc. | Graphics system which measures CPU and GPU performance |
US8479295B2 (en) * | 2011-03-30 | 2013-07-02 | Intel Corporation | Method and apparatus for transparently instrumenting an application program |
US8584242B2 (en) * | 2011-07-12 | 2013-11-12 | At&T Intellectual Property I, L.P. | Remote-assisted malware detection |
US9514507B2 (en) * | 2011-11-29 | 2016-12-06 | Citrix Systems, Inc. | Methods and systems for maintaining state in a virtual machine when disconnected from graphics hardware |
US20130226535A1 (en) * | 2012-02-24 | 2013-08-29 | Jeh-Fu Tuan | Concurrent simulation system using graphic processing units (gpu) and method thereof |
2012
- 2012-02-27 US US13/406,272 (US9075913B2): not active, Expired - Fee Related
2013
- 2013-01-30 EP EP13708934.8A (EP2820544A1): not active, Withdrawn
- 2013-01-30 KR KR1020147027041A (KR101569308B1): not active, IP Right Cessation
- 2013-01-30 JP JP2014558752A (JP5934392B2): not active, Expired - Fee Related
- 2013-01-30 WO PCT/US2013/023874 (WO2013130212A1): active, Application Filing
- 2013-01-30 CN CN201380010829.3A (CN104137076B): not active, Expired - Fee Related
2015
- 2015-06-01 US US14/727,427 (US20150261651A1): not active, Abandoned
Also Published As
Publication number | Publication date |
---|---|
KR20140139516A (en) | 2014-12-05 |
JP5934392B2 (en) | 2016-06-15 |
KR101569308B1 (en) | 2015-11-13 |
EP2820544A1 (en) | 2015-01-07 |
CN104137076B (en) | 2017-05-24 |
CN104137076A (en) | 2014-11-05 |
WO2013130212A1 (en) | 2013-09-06 |
US9075913B2 (en) | 2015-07-07 |
US20130227521A1 (en) | 2013-08-29 |
JP2015511737A (en) | 2015-04-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9075913B2 (en) | Validation of applications for graphics processing unit | |
US9858057B2 (en) | Methods and apparatus to validate translated guest code in a dynamic binary translator | |
US11188639B2 (en) | System, method and apparatus for automatic program compartmentalization | |
US20170277903A1 (en) | Data Protection Using Virtual Resource Views | |
KR102324336B1 (en) | User device and integrity verification method for the same | |
US10127018B2 (en) | Dynamic addition of code in shared libraries | |
US10218508B2 (en) | Methods and apparatus to provide isolated execution environments | |
US20210365591A1 (en) | Secure debug of fpga design | |
WO2012154606A1 (en) | Efficient conditional flow control compilation | |
US8589657B2 (en) | Operating system management of address-translation-related data structures and hardware lookasides | |
US20150160945A1 (en) | Allocation of load instruction(s) to a queue buffer in a processor system based on prediction of an instruction pipeline hazard | |
JP2017500668A (en) | System and method for detecting malicious multimedia files | |
Siavvas et al. | On the relationship between software security and energy consumption | |
Xu et al. | Framework for State-Aware Virtual Hardware Fuzzing | |
US20100257514A1 (en) | Effective mapping of code sections to the same section of secondary memory to improve the security of computing systems | |
US20240119656A1 (en) | Method of Operating Shared GPU Resource and a Shared GPU Device | |
US20230078985A1 (en) | Checker and checking method for prossor circuit | |
US20240095375A1 (en) | Mechanism To Secure An Execution Environment In Processor Cores | |
US11086985B2 (en) | Binary authorization based on both file and package attributes | |
US11308202B2 (en) | Intrusion detection systems | |
Ivanov et al. | {SAGE}: Software-based Attestation for {GPU} Execution | |
Pustelnik et al. | Whispering Pixels: Exploiting Uninitialized Register Accesses in Modern GPUs | |
Xue et al. | On-line Firmware Updating and Fingerprint Generating for Solid State Disks | |
Payet | Modeling the Android platform |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: QUALCOMM INCORPORATED, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BOURD, ALEXEI V.; YUN, JAY CHUNSUP; REEL/FRAME: 035757/0944. Effective date: 20120222 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |