WO2014051608A1 - Application randomization - Google Patents


Info

Publication number
WO2014051608A1
Authority
WO
WIPO (PCT)
Prior art keywords
application
modification
intermediate representation
processor
instruction block
Prior art date
Application number
PCT/US2012/057819
Other languages
English (en)
French (fr)
Inventor
Brian Quentin Monahan
Keith Harrison
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to EP12885210.0A (patent EP2901348A4, en)
Priority to CN201280077350.7A (patent CN104798075A, zh)
Priority to PCT/US2012/057819 (patent WO2014051608A1, en)
Priority to US14/432,202 (patent US20150294114A1, en)
Publication of WO2014051608A1 (patent WO2014051608A1, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/14 Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/54 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation

Definitions

  • Applications are typically compiled for a particular environment (e.g., operating system and hardware platform) and executed at hosts such as computing systems that realize that environment. Accordingly, one instance of a particular build or version of an application is identical to other instances of that build or version of the application.
  • FIG. 1 is an illustration of operation of an application randomization system, according to an implementation.
  • FIG. 2 is a flowchart of a process to generate an annotated intermediate representation of an application, according to an implementation.
  • FIG. 3 is an illustration of an annotated intermediate representation of an application, according to an implementation.
  • FIG. 4 is an illustration of an annotated intermediate representation of an application, according to another implementation.
  • FIG. 5 is a flowchart of a process to apply random modification to an application, according to an implementation.
  • FIG. 6 is a flowchart of a random modification process, according to an implementation.
  • FIG. 7 is a schematic block diagram of an application randomization system, according to an implementation.
  • FIG. 8 is a schematic block diagram of a computing system hosting an application randomization system, according to an implementation.
  • Attackers often attempt to learn about the internal operation and structure of an application by interacting with the application. That is, an attacker can learn about an application by providing input to the application and observing output. As a specific example, an attacker can research a web-based or network-enabled application by providing random input and/or targeted input (e.g., input including values or symbols to exploit a particular security vulnerability or class of security vulnerabilities) via an interface of the application, and observing the output of the application. Such techniques can be referred to as fuzzing.
  • an attacker can provide input that is crafted to exploit a structured query language (SQL) vulnerability (e.g., an SQL query embedded in the input), a buffer overflow vulnerability (e.g., a large volume of data in the input), or an arbitrary code execution vulnerability (e.g., shell code embedded in the input) to an interface of an application. Based on the response or output corresponding to the input, the attacker can determine whether and where within the application a security vulnerability exists.
  • Attackers also use reverse engineering techniques such as disassembly and assembly code analysis to research applications. For example, an attacker can disassemble a native-code (or object-code) representation of an application and analyze the resulting assembly instructions to learn about the structure and operation of the application.
  • ASLR Address space layout randomization
  • an instance of an application can refer to a group of instructions stored at a memory (e.g., Random Access Memory (RAM)) that define the application and are being executed by a processor.
  • ASLR complicates exploitation of some security vulnerabilities because this technique forces attackers to dynamically identify the memory locations of these application components of an executing instance.
  • ASLR does not, however, change the operation or structure of the application itself. Rather, ASLR moves the in-memory locations of some application
  • Instantiation of an application refers to generating an instance of the application.
  • instantiation can include loading instructions or program code representing the application into a memory (e.g., RAM), and starting execution by a processor at an entry point (e.g., entry address) of the application.
  • instantiation of an application can include repositioning portions of the application within memory to effect ASLR.
  • Implementations discussed herein can be combined with ASLR methodologies.
  • Random modifications discussed herein can be applied to each instance of the application (i.e., each time the application is instantiated or executed) to alter the structure and operation of the application without altering the functionality of the application.
  • the random modifications change how the application performs tasks, but do not change what tasks the application performs.
  • each instance of the application performs the same functionalities, but does so using different internal structure and/or operation. That is, the results of the different structure and/or operation in each instance are equivalent.
  • FIG. 1 is an illustration of operation of an application randomization system, according to an implementation. More specifically, FIG. 1 illustrates the flow of an application (or different representations of an application) through components (e.g., modules) of an application randomization system.
  • As used herein, the term "application" refers to software that can be executed (or hosted) within an environment to perform one or more functionalities.
  • a network service such as a web or Hypertext Transfer Protocol server, a web application server, office
  • productivity e.g., word processing
  • PDF Portable Document Format
  • source code representation 111 of an application is provided to intermediate representation generator 120.
  • source code representation 111 can be a file or group of files that define the application in a programming language such as a native programming language. Examples of programming languages include: C, C++, C#, Objective-C, Java™, Haskell, Erlang, Scala, Lua, and Python. In some implementations, source code representation 111 can reference functionalities or resources external to source code representation 111 such as a library or
  • Intermediate representation generator 120 is a module that generates an intermediate representation 112 of the application based on source code
  • intermediate representation generator 120 can be a compiler or a portion of a compiler such as compiler components to perform lexical, syntactic, semantic, and optimization analysis and to output an intermediate representation of the application.
  • intermediate representation 112 can be a Low-Level Virtual Machine (LLVM) bitcode intermediate representation, source code
  • representation 111 can be a group of C source code files, and intermediate representation generator 120 can include an LLVM compiler such as clang that outputs intermediate representation 112.
  • the LLVM intermediate representation can be described in a variety of forms. Typically, the LLVM intermediate representation is described in a bitcode form or a symbolic textual form, and an LLVM system includes utilities for converting between these forms.
  • implementations discussed herein with reference to an LLVM bitcode intermediate representation are specific example implementations of the invention. The methodologies and systems discussed in relation to such example implementations can be applicable to other implementations such as implementations that utilize other intermediate representations such as LLVM intermediate representations in a symbolic form.
  • intermediate representation refers to a
  • intermediate language is a language of a machine other than the host of the application such as an abstract machine. That is, instructions represented in an intermediate representation are not executable directly by the host of the application (i.e., the machine or virtual machine that will execute the application).
  • RTL Register Transfer Language
  • SSA static single assignment
  • LLVM bitcode a stack-based intermediate language
  • Common Intermediate Language some other intermediate language, or a combination thereof.
  • an intermediate representation of an application is not executable directly by a host of the application.
  • the intermediate representation is not executed by the host without generating a native-code representation of the application using, for example as discussed in more detail herein, a random modification module and a native code generator. Accordingly, a unique or random native-code representation of the application is generated each time the application is instantiated or executed.
  • an intermediate representation simplifies flow analysis of an application.
  • an intermediate representation can represent an application in a form in which each instruction of the intermediate representation defines only one operation (i.e., multi-operation instructions do not exist) and the number of registers available is very large or unlimited.
  • an intermediate representation can be a static single assignment form intermediate representation in which each register or variable is assigned once.
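  • The single-assignment property described above can be illustrated with a minimal Python sketch. The tuple-based instruction format, the `to_ssa` name, and the `name.N` versioning scheme are all assumptions made for illustration; they are not LLVM's representation, and the sketch handles only straight-line code with no branches.

```python
def to_ssa(instructions):
    """Rename registers so that each one is assigned exactly once.

    Hypothetical toy format: (destination, operation, operands).
    Handles straight-line code only (no branches, so no phi nodes).
    """
    version = {}   # register name -> number of assignments seen so far
    result = []
    for dest, op, args in instructions:
        # Operands read the latest version; names never assigned are inputs.
        new_args = tuple(f"{a}.{version[a]}" if a in version else a for a in args)
        version[dest] = version.get(dest, 0) + 1
        result.append((f"{dest}.{version[dest]}", op, new_args))
    return result


program = [
    ("x", "add", ("a", "b")),
    ("x", "mul", ("x", "c")),   # x is reassigned here
    ("y", "sub", ("x", "a")),
]
ssa = to_ssa(program)
```

  • After renaming, the second assignment to `x` targets a fresh register, so every register in `ssa` has exactly one definition.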
  • Intermediate representation 112 is then accessed by flow analysis module 130 to generate annotated intermediate representation 113.
  • Flow analysis module 130 analyzes intermediate representation 112 to identify instruction blocks within intermediate representation 112. For example, flow analysis module 130 can analyze intermediate representation 112 using data flow and/or control flow analysis techniques to identify instruction blocks within intermediate representation 112. Flow analysis module 130 then annotates intermediate representation 112 to identify instruction blocks and, in some implementations, properties or characteristics thereof within annotated intermediate representation 113.
  • instruction block means a group of related instructions within an intermediate representation.
  • subroutines within intermediate representation 112 can be defined as instruction blocks.
  • a group of sequential instructions for which a particular register or value is an operand can be defined as an instruction block.
  • an instruction block can be a group of instructions that are specified sequentially without interruption within an intermediate representation. More specifically, for example, the instructions between jump targets (e.g., instructions to which jump instructions transfer control or execution) and jump (or branch) instructions can be defined as an instruction block. That is, as specified by intermediate representation 112, each instruction in the instruction block is to be executed sequentially.
  • flow analysis module 130 can generate a control flow graph based on intermediate representation 112. Nodes of the control flow graph include (or represent) groups of instructions without any jump instructions or jump targets. That is, a jump target denotes the beginning of a block and a jump instruction denotes the end of a block. The edges of the control flow graph represent jumps (or branches) in the flow of the application. Flow analysis module 130 can then extract or identify the instruction blocks of the application from the nodes of the control flow graph.
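  • The block-identification rule above (a jump target begins a block, a jump instruction ends one) can be sketched in Python as follows. The toy instruction tuples, the `label`/`jump`/`branch`/`ret` opcodes, and both function names are hypothetical and are not taken from the patent or from LLVM.

```python
def find_instruction_blocks(instructions):
    """Split a linear instruction list into instruction blocks.

    Hypothetical toy IR: ("label", name) marks a jump target;
    ("jump", target), ("branch", condition, target) and ("ret",) end a block.
    """
    blocks, current = [], []
    for ins in instructions:
        if ins[0] == "label" and current:
            blocks.append(current)        # a jump target begins a new block
            current = []
        current.append(ins)
        if ins[0] in ("jump", "branch", "ret"):
            blocks.append(current)        # a jump instruction ends the block
            current = []
    if current:
        blocks.append(current)
    return blocks


def control_flow_edges(blocks):
    """Return the set of (source block index, target block index) edges."""
    label_to_block = {b[0][1]: i for i, b in enumerate(blocks) if b[0][0] == "label"}
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        if last[0] == "jump":
            edges.add((i, label_to_block[last[1]]))
        elif last[0] == "branch":
            edges.add((i, label_to_block[last[2]]))   # branch taken
            if i + 1 < len(blocks):
                edges.add((i, i + 1))                 # branch not taken
        elif last[0] != "ret" and i + 1 < len(blocks):
            edges.add((i, i + 1))                     # fall through to next block
    return edges


program = [
    ("label", "entry"),
    ("set", "i", 0),
    ("label", "loop"),
    ("add", "i", "i", 1),
    ("branch", "i<10", "loop"),
    ("label", "exit"),
    ("ret",),
]
blocks = find_instruction_blocks(program)
edges = control_flow_edges(blocks)
```

  • For this toy program the sketch yields three blocks (entry, loop, exit), with the loop block carrying an edge back to itself.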
  • flow analysis module 130 annotates intermediate representation 112 to identify the beginning of the instruction blocks to define annotated intermediate representation 113.
  • flow analysis module 130 includes additional annotations (or information) within annotated intermediate representation 113.
  • annotations can identify the ends of instruction blocks, identify lengths of instruction blocks, describe properties of instruction blocks, identify instruction blocks defined by subroutines, identify jump targets to which instruction blocks jump (i.e., the jump target or potential jump targets of a jump instruction at which an instruction block ends), identify the instruction blocks (or jump instructions) that jump to a jump target within an instruction block, and/or include additional information related to instruction blocks.
  • annotated intermediate representation 113 can be stored at data store 140.
  • Data store 140 is a device or service such as a hard disk drive (HDD), a non-volatile semiconductor based memory device such as a solid-state drive (SSD), a cache at a volatile memory, a file system, or a database at which annotated intermediate representation 113 can be stored for subsequent use.
  • Such storage can be useful for a variety of reasons.
  • the flow analysis performed at flow analysis module 130 can take many seconds, minutes, or even hours for some applications.
  • annotated intermediate representation 113 can be used to generate a randomized intermediate representation of the application each time the application is instantiated (or launched).
  • Performing flow analysis of intermediate representation 112 for each instantiation of the application can significantly increase the time required to instantiate the application.
  • accessing pre-generated annotated intermediate representation 113 at data store 140 rather than performing flow analysis can reduce the time required to instantiate the application.
  • flow analysis module 130 can perform flow analysis on an intermediate representation of the updated application, and generate a new annotated intermediate representation to replace annotated intermediate representation 113.
  • Random modification module 150 accesses annotated intermediate representation 113 at data store 140, for example, in response to an instantiation signal associated with the application. That is, an environment in which the application will be hosted can provide a signal (or indication) to random modification module 150, for example, in response to user input, that indicates the application should be instantiated. Random modification module 150 receives annotated intermediate representation 113, and identifies the instruction blocks using the annotations provided by flow analysis module 130. Thus, random modification module 150 need not perform flow analysis for the application. Rather, random modification module 150 relies on the annotations in annotated intermediate representation 113 to provide the results of the flow analysis performed by flow analysis module 130.
  • Random modification module 150 then randomly modifies the instruction blocks of the application.
  • the modifications performed by random modification module 150 alter the operation and/or structure of the application, but do not alter the functionality of the application. That is, the modifications alter the instruction blocks to, for example, change the number, order, operands, or types, of instructions without altering the results of the instruction blocks.
  • random modification module 150 can disaggregate one instruction block into multiple instruction blocks by adding jump instructions (e.g., the jump instructions chain the multiple instruction blocks together to provide equivalent functionality to the one instruction block); rearrange (or reorder) instructions that operate on different data within an instruction block; aggregate two or more instruction blocks by removing jump instructions and adding instructions from one instruction block to another instruction block; add additional instructions to an instruction block; alter an instruction block that is not a subroutine to be a subroutine and jump instructions for which that instruction block is a jump target to be subroutine calls to that instruction block; unroll a loop within an instruction block; combine loops within an instruction block; disaggregate one subroutine into multiple subroutines and add subroutine calls to the subroutines to chain the subroutines together to provide an equivalent result to the one subroutine; inline a subroutine (e.g., add instructions from the subroutine to each instruction block that calls the subroutine); and/or otherwise modify or obfuscate the intermediate representation of the
  • random modification module 150 randomly chooses whether to modify that instruction block and which modification or modifications to apply to that instruction block.
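  • As a rough illustration of modifications that change structure without changing results, the following Python sketch applies two of the listed modifications (splitting a block with an added jump, and adding extra instructions) to a toy IR and checks the outcome with a small interpreter. The IR format, the interpreter, the 50% skip probability, and the assumption that register names starting with `junk` are unused by the program are all hypothetical choices for this sketch.

```python
import random


def run(blocks, entry, env):
    """Interpret a toy IR; blocks maps a label to a list of instructions."""
    env, label = dict(env), entry

    def val(x):
        return env[x] if isinstance(x, str) else x   # register name or constant

    while True:
        for ins in blocks[label]:
            if ins[0] == "add":
                _, dest, a, b = ins
                env[dest] = val(a) + val(b)
            elif ins[0] == "jump":
                label = ins[1]
                break
            elif ins[0] == "ret":
                return env[ins[1]]
        else:
            raise ValueError(f"block {label!r} has no terminating jump or ret")


def split_block(blocks, label, rng):
    """Disaggregate one block into two blocks chained by a jump."""
    body = blocks[label]
    if len(body) < 3:
        return                                   # too short to split
    cut = rng.randrange(1, len(body) - 1)        # terminator stays in the tail
    new_label = f"{label}_part{len(blocks)}"
    blocks[new_label] = body[cut:]
    blocks[label] = body[:cut] + [("jump", new_label)]


def add_junk(blocks, label, rng):
    """Add an instruction that writes a register the program never reads."""
    body = blocks[label]
    pos = rng.randrange(0, len(body))            # always before the terminator
    junk_reg = f"junk{rng.randrange(10**6)}"     # assumed unused by the program
    body.insert(pos, ("add", junk_reg, rng.randrange(100), rng.randrange(100)))


def randomize(blocks, rng):
    """Randomly choose, per block, whether to modify it and how."""
    blocks = {k: list(v) for k, v in blocks.items()}   # leave the input intact
    for label in list(blocks):
        if rng.random() < 0.5:
            continue                             # leave this block unmodified
        rng.choice([split_block, add_junk])(blocks, label, rng)
    return blocks


original = {
    "entry": [("add", "t", "x", "y"), ("add", "t", "t", 1), ("jump", "done")],
    "done": [("add", "r", "t", "t"), ("ret", "r")],
}
inputs = {"x": 3, "y": 4}
variant = randomize(original, random.Random())
same_result = run(variant, "entry", inputs) == run(original, "entry", inputs)
```

  • Each call to `randomize` may produce a differently structured variant, while `run` returns the same value for the variant and the original.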
  • random refers to both true random processes with truly random results and pseudo-random processes such as seed-based pseudo-random number generators.
  • a random operation or some operation performed randomly can be based on, for example, an output from a Geiger counter, a photon counter, or a pseudo-random number generator provided with a randomization seed (i.e., a value input as an initial state to the pseudo-random number generator).
  • the randomization seed can be provided or selected by a user such as a system administrator.
  • an application randomization system can include an interface such as a graphical user interface via which a system administrator can provide a randomization seed.
  • This interface can be secured, for example, using authentication techniques, credentials (e.g., passwords or security certificates), cryptography, trusted computing mechanisms such as Trusted Platform Modules (TPMs), and/or other methodologies.
  • Such implementations can be useful to allow the system administrator to cause an application randomization system to generate identical native-code representations of an application, for example, to debug the application and/or the application randomization system.
  • When the modifications are randomly selected based on the output of a pseudo-random number generator, providing the same randomization seed to the pseudo-random number generator causes the pseudo-random number generator to output the same sequence of random inputs (or random values) to a random modification module. Because the random modification module selects modifications for instruction blocks based on the random inputs from the pseudo-random number generator, providing a common randomization seed to the pseudo-random number generator causes the random modification module to select the same modifications for the instruction blocks each time the random modification module modifies the intermediate representation of the application.
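  • The effect of a common randomization seed can be sketched in Python with the standard `random` module. The modification names and the `select_modifications` function are hypothetical placeholders; only the seeding behavior is the point of the sketch.

```python
import random

# Hypothetical names standing in for the kinds of modifications described.
MODIFICATIONS = ["split", "merge", "reorder", "add_instructions", "inline", "none"]


def select_modifications(block_ids, seed=None):
    """Choose one modification per instruction block.

    With seed=None the generator is seeded from an operating-system source,
    so each call can differ; a fixed seed reproduces the same selections.
    """
    rng = random.Random(seed)
    return {block_id: rng.choice(MODIFICATIONS) for block_id in block_ids}


block_ids = ["b0", "b1", "b2", "b3"]
first = select_modifications(block_ids, seed=1234)
second = select_modifications(block_ids, seed=1234)
reproducible = first == second
```

  • Supplying the same seed twice yields identical selections, which is the property that makes a fixed seed useful for debugging.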
  • Random modification module 150 outputs randomized intermediate representation 114.
  • Randomized intermediate representation 114 is an intermediate representation of the application that includes the modifications performed by random modification module 150. Typically, randomized intermediate representation 114 does not include the annotations flow analysis module 130 added to intermediate representation 112 to define annotated intermediate representation 113.
  • Native code generator 160 is a module that accesses randomized intermediate representation 114 and generates native-code representation 115 of the application.
  • Native-code representation 115 of the application is a representation of the application in which the application is defined by instructions that can be executed at the host of the application.
  • native code generator 160 can be a just-in-time compiler or translator to generate native-code representation 115 from randomized intermediate representation 114.
  • Because native-code representation 115 is generated based on (or using or from) randomized intermediate representation 114, native-code representation 115 includes (or has) the modifications performed at random modification module 150. In other words, the modifications performed at random modification module 150 are applied to (or at) native-code representation 115.
  • randomized intermediate representation 114 can be specified in LLVM bitcode intermediate representation
  • native code generator 160 can be an LLVM just-in-time compiler for an x86 architecture
  • native-code representation 115 can be defined by x86 object or binary code.
  • native code generator 160 does not perform any optimizations or only performs some types of optimizations on randomized intermediate representation 114 to generate native-code representation 115.
  • native code generator 160 can combine single-operation instructions into multi-operation instructions, but does not remove irrelevant instructions. Such implementations can be particularly beneficial to prevent native code generator 160 from removing or "optimizing out" the random modifications performed by random modification module 150 to generate randomized intermediate representation 114.
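  • To see why removing irrelevant instructions can undo random modifications, consider this minimal Python sketch of a dead-code elimination pass over straight-line code. The instruction format, the `junk` register names, and the function name are assumptions for illustration and do not describe any particular compiler's optimizer.

```python
def eliminate_dead_code(instructions, live_out):
    """Drop instructions whose result is never used (straight-line code only).

    Hypothetical toy format: (destination, operation, operands); operands
    that are strings name registers, anything else is a constant.
    """
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(instructions):
        if dest in live:
            kept.append((dest, op, args))
            live.discard(dest)
            live.update(a for a in args if isinstance(a, str))
        # Otherwise nothing reads dest later, so the instruction is removed.
    return kept[::-1]


original = [
    ("t", "add", ("x", "y")),
    ("r", "mul", ("t", "t")),
]
randomized = [
    ("t", "add", ("x", "y")),
    ("junk1", "add", (1, 2)),           # added by a random modification
    ("r", "mul", ("t", "t")),
    ("junk2", "mul", ("junk1", 3)),     # added by a random modification
]
optimized = eliminate_dead_code(randomized, live_out={"r"})
```

  • Here `optimized` equals `original`: the pass has stripped both added instructions, which is the outcome a native code generator avoids by not performing this kind of optimization.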
  • intermediate representation generator 120 can perform optimizations on source code representation 111 to generate intermediate representation 112. In some implementations, intermediate representation generator 120 can perform optimizations that native code generator 160 does not perform on source code representation 111 to generate intermediate representation 112. To continue the example from above, intermediate representation generator 120 can perform optimizations to remove irrelevant instructions although native code generator 160 does not. Because intermediate representation generator 120 performs optimizations before random modification module 150 randomly modifies the application, these optimizations do not interfere with the modifications performed by random modification module 150.
  • a software vendor can use intermediate representation generator 120 and flow analysis module 130 to distribute an application as annotated intermediate representation 113.
  • the software vendor can distribute the application as annotated intermediate representation 113.
  • Users of the application can then instantiate the application at a host (e.g., a computing system) with an application randomization system including random modification module 150 and native code generator 160. That is, data store 140, random modification module 150, and native code generator 160 can be accessible to the host.
  • representation of the application that differs from other native-code representations of the application is generated and executed at the host.
  • a software vendor can generate a native-code representation of the application for each user or client. That is, data store 140, random modification module 150, and native code generator 160 can be accessible to the software vendor. For example, a potential user of the application can request a native-code representation of the application via, for example, a web page or other interface. The software vendor can then access annotated intermediate representation 113 at data store 140, provide annotated intermediate representation 113 to random modification module 150, and provide a randomized intermediate representation of the application to native code generator 160.
  • Native code generator 160 then generates the native-code representation of the application for that user, and provides the native-code representation of the application to that user.
  • each user of the application can have a unique native-code representation of the application.
  • FIG. 2 is a flowchart of a process to generate an annotated intermediate representation of an application, according to an implementation.
  • Process 200 can be implemented, for example, to distribute an application in an annotated
  • Flow analysis is performed on an intermediate representation of an application at block 210 to identify instruction blocks within the intermediate representation of the application. For example, a control flow graph or data flow graph can be generated to identify instruction blocks of the application.
  • Information related to the instruction blocks of the application is then used at block 220 to generate an annotated intermediate representation of the application.
  • the annotated intermediate representation of the application includes the
  • annotations identify, for example, the beginning and end of instruction blocks, instruction blocks defined by subroutines, jump targets to which instruction blocks jump, registers used within an instruction block, and/or other characteristics or properties of instruction blocks.
  • an annotated intermediate representation can be in any of a variety of formats.
  • FIG. 3 is an illustration of an annotated intermediate representation of an application, according to an implementation.
  • Annotated intermediate representation 300 includes two sections: section 310 including references to instruction blocks (i.e., annotations identifying instruction blocks), and section 320 including an intermediate representation of an application. Sections 310 and 320 can be, for example, separate files. Section 320 can be a file including an intermediate representation of an application.
  • the intermediate representation can be an LLVM bitcode intermediate representation, and references to blocks 311-319 can be bit or byte offsets into the LLVM bitcode intermediate representation at which instruction blocks are encoded.
  • sections 310 and 320 can be different portions of a file or data associated with a file. More specifically, for example, section 310 can be metadata at a particular portion of a file (e.g., at the beginning of a file) or metadata stored within a file system and associated with a file including section 320 (i.e., the intermediate representation of the application).
  • a byte offset to the beginning of each instruction block within the intermediate representation analyzed at block 210 can be determined, and a value representing that byte offset can be stored at a file or as metadata with an identifier (e.g., a unique number or alpha-numeric identifier) of that instruction block.
  • the identifier, byte offset, and any other information stored at the file or as metadata can be referred to as an annotation.
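  • The identifier-plus-byte-offset form of annotation can be sketched in Python as follows. The byte strings stand in for encoded instruction blocks and are not real LLVM bitcode; the `annotate` and `read_block` names and the dictionary layout are likewise assumptions.

```python
def annotate(encoded_blocks):
    """Concatenate encoded blocks and record an annotation for each one.

    Each annotation holds a block identifier and the byte offset at which
    that block begins within the concatenated representation.
    """
    annotations, blob, offset = [], bytearray(), 0
    for block_id, data in enumerate(encoded_blocks):
        annotations.append({"id": block_id, "offset": offset})
        blob += data
        offset += len(data)
    return annotations, bytes(blob)


def read_block(annotations, blob, index):
    """Recover one block using only the stored offsets (no flow analysis)."""
    start = annotations[index]["offset"]
    if index + 1 < len(annotations):
        end = annotations[index + 1]["offset"]
    else:
        end = len(blob)
    return blob[start:end]


encoded = [b"\x01\x02\x03", b"\x10\x11", b"\x20\x21\x22\x23"]
annotations, blob = annotate(encoded)
second_block = read_block(annotations, blob, 1)
```

  • Keeping the annotations separate from the representation, as in this sketch, corresponds to the two-section layout of FIG. 3.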
  • FIG. 4 is an illustration of an annotated intermediate representation of an application, according to another implementation.
  • Annotated intermediate representation 400 includes multiple sections, each of which includes the intermediate representation of an instruction block.
  • each of sections 411-419 includes the intermediate representation of an instruction block represented by that section.
  • annotated intermediate representation 400 can be an Extensible Markup Language (XML) document in which each section is an XML element representing an instruction block that encapsulates the
  • an XML document can be generated, and the intermediate representation of each instruction block copied from the
  • Each XML element can also include attributes or other elements to describe the instruction block.
  • attributes or other elements can include a byte offset of the instruction block, an identifier of the instruction block, jump targets to which that instruction block jumps, and/or identifiers of other instruction blocks that jump to that instruction block.
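  • An XML layout of this kind might look like the following Python sketch, which uses the standard `xml.etree.ElementTree` module. The element name `block`, the root name `annotated-ir`, the attribute names, and the sample IR strings are hypothetical; the source does not specify a schema.

```python
import xml.etree.ElementTree as ET


def build_annotated_ir(blocks):
    """Serialize instruction blocks as XML elements with descriptive attributes."""
    root = ET.Element("annotated-ir")
    for block in blocks:
        elem = ET.SubElement(root, "block", {
            "id": block["id"],
            "offset": str(block["offset"]),
            "jumps-to": " ".join(block["jumps_to"]),
        })
        elem.text = block["ir"]          # the block's intermediate representation
    return ET.tostring(root, encoding="unicode")


def blocks_from_xml(document):
    """Read the blocks back without re-running flow analysis."""
    root = ET.fromstring(document)
    return [
        {
            "id": e.get("id"),
            "offset": int(e.get("offset")),
            "jumps_to": e.get("jumps-to").split(),
            "ir": e.text,
        }
        for e in root.iter("block")
    ]


blocks = [
    {"id": "b0", "offset": 0, "jumps_to": ["b1"], "ir": "%i = add i32 0, 0"},
    {"id": "b1", "offset": 24, "jumps_to": ["b1", "b2"], "ir": "%j = add i32 %i, 1"},
    {"id": "b2", "offset": 57, "jumps_to": [], "ir": "ret i32 %j"},
]
document = build_annotated_ir(blocks)
restored = blocks_from_xml(document)
```

  • A consumer such as a random modification module could iterate over the `block` elements and read each block's attributes directly.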
  • the application randomization system can use various tools or utilities to manipulate the intermediate representation.
  • the application randomization system can use tools or utilities of an LLVM system to read, produce, alter, or otherwise manipulate the intermediate representation.
  • tools and utilities can include mechanisms for accessing groups of instructions within the intermediate
  • the annotated intermediate representation of the application can be distributed to hosts.
  • the annotated intermediate representation of the application can be distributed to hosts as downloads via a communications link such as the Internet.
  • representation of the application can be distributed to hosts on non-transitory processor-readable media such as digital versatile discs (DVDs), FLASH drives, or other media.
  • FIG. 5 is a flowchart of a process to apply random modifications to an application, according to an implementation.
  • Process 500 can be implemented at an application randomization system hosted at a host such as a computing device to generate a new native-code representation of an application from an annotated intermediate representation of the application each time the application is instantiated.
  • an instantiation signal such as a load-time instantiation signal for (or associated with) an application is received.
  • an operating system can provide a signal by calling a subroutine or invoking a method of the application randomization system implementing process 500 to indicate that the application should be instantiated.
  • the application randomization system accesses an annotated intermediate representation of the application at block 520.
  • the application randomization system can access the annotated intermediate representation of the application at a file system, database, or other data store.
  • FIG. 6 illustrates an example process to apply random modification to an application, and is discussed in more detail below.
  • the randomized intermediate representation of the application is used to generate a native-code representation of the application at block 540.
  • the application randomization system can include or access a compiler such as a just-in-time compiler to convert the randomized intermediate representation to a native-code representation.
  • the application randomization system can disable or exclude optimization functionalities of the compiler (e.g., a just-in-time compiler) to prevent the compiler from removing the random modifications applied to the randomized intermediate representation at block 540.
  • the application is then instantiated and the native-code representation of the application executed at block 550 by, for example, loading the native-code representation of the application into a memory of a host and beginning to execute instructions at an entry point of the native-code representation of the application. That instance of the application executes until it terminates or is terminated at block 560, and the native-code representation of the application is discarded at block 570.
  • the native-code representation can be erased from a memory of the host and/or a file storing the native-code representation of the application can be deleted from a file system.
  • the native-code representation of the application is archived at a data store.
  • process 500 can be executed at the application randomization system for each instantiation signal generated for the application.
  • each instance of the application is based on a unique native-code
  • Process 500 illustrated in FIG. 5 is an example of a process to randomize an application.
  • process 500 can include additional and/or fewer blocks or steps than those illustrated in FIG. 5.
  • process 500 does not include blocks 560 and 570.
  • process 500 does not include block 550.
  • the application randomization system implementing process 500 can store the native-code representation of the application at a data store, and provide a signal to an environment such as an operating system to instantiate the application using the native-code representation.
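  • The overall flow of process 500 can be sketched as follows. This is a sketch under stated assumptions, not the described system: the function name `run_process_500` and the five injected callables are hypothetical, and the stand-in stages only manipulate strings so that the example runs without a real compiler.

```python
import random

def run_process_500(load_annotated_ir, randomize, compile_native, execute, discard):
    """One pass of the load-time flow for a single instantiation signal."""
    annotated_ir = load_annotated_ir()        # block 520: access the annotated IR
    randomized_ir = randomize(annotated_ir)   # block 530: apply random modifications
    native = compile_native(randomized_ir)    # block 540: generate native code
    try:
        return execute(native)                # blocks 550-560: run until termination
    finally:
        discard(native)                       # block 570: discard the native code

# Stand-in stages so the sketch runs end to end; none reflect a real toolchain.
discarded = []

def load_annotated_ir():
    return ["entry", "body", "exit"]

def randomize(ir):
    # Insert a placeholder "nop" at a random position in the block list.
    out = list(ir)
    out.insert(random.randrange(len(out) + 1), "nop")
    return out

def compile_native(ir):
    return "|".join(ir)

def execute(native):
    # The placeholder has no effect on the observable result.
    return [part for part in native.split("|") if part != "nop"]

result = run_process_500(load_annotated_ir, randomize, compile_native,
                         execute, discarded.append)
```

  • Passing the stages in as callables mirrors the variations noted above: an implementation that omits blocks 550 through 570 could supply an `execute` that stores the native code and signals the operating system, and a `discard` that does nothing.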
  • FIG. 6 is a flowchart of a random modification process, according to an implementation.
  • Process 600 can be, for example, a sub-process of a process to randomize an application such as process 500. As a specific example, process 600 can be executed at block 530 of process 500.
  • For example, an application randomization system implementing process 600 can parse the annotated
  • an annotation can identify a beginning instruction of the instruction block, can encapsulate an intermediate representation of the instruction block, and/or can describe other features or characteristics of an instruction block.
  • the application randomization system determines a random input at block 620.
  • the random input can be, for example, a random number or value from a pseudo-random number generator or a random source.
  • the random input is then used to select a modification for the instruction block at block 630.
  • a hash function can be applied to the random input, and the output of the hash function is a value that indicates which of a group of modifications should be applied to the instruction block. More specifically, for example, the value from the hash function can be input to a lookup table to select a modification for the instruction block. Thus, the modification for the instruction block is chosen (or selected) at random.
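  • The selection at blocks 620 and 630 can be sketched as below. The modification names in `MODIFICATIONS`, the choice of SHA-256 as the hash function, and the use of `os.urandom` as the random source are all assumptions made for illustration; the description leaves these open.

```python
import hashlib
import os

# Hypothetical modification names; the description does not fix a particular set.
MODIFICATIONS = ["none", "insert_nops", "split_block", "reorder_instructions"]

def select_modification(random_input: bytes, table=MODIFICATIONS) -> str:
    """Hash the random input and use the resulting value to index a lookup table."""
    digest = hashlib.sha256(random_input).digest()
    # Reduce the first 8 bytes of the digest to a table index.
    index = int.from_bytes(digest[:8], "big") % len(table)
    return table[index]

random_input = os.urandom(16)                 # block 620: determine a random input
choice = select_modification(random_input)    # block 630: select a modification
```

  • Because the mapping from random input to modification is deterministic, the same random input always selects the same modification, which is what makes later reconstruction from recorded inputs possible.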
  • the application randomization system can vary the amount of modification performed on an application.
  • the application randomization system can include an interface such as a graphical user interface via which a system administrator can specify a level or amount of modification.
  • the application randomization system can weight or bias, for example, a hash function or lookup table (e.g., include multiple entries for a preferred modification or group thereof) toward no modification, a particular group of modifications, or a particular modification based on this input. In other words, in implementations, some modifications can be preferred over (or be more likely than) other modifications.
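  • One way to bias the lookup table is to repeat the preferred entries, as sketched below. The level names (`low`, `high`), the weights, and the modification names are hypothetical values chosen for the example, and the fixed seed is used only to make the demonstration repeatable.

```python
import random
from collections import Counter

def build_weighted_table(weights):
    """Expand {modification: weight} into a lookup table with repeated entries."""
    table = []
    for name, weight in weights.items():
        table.extend([name] * weight)
    return table

# Hypothetical administrator settings: "low" biases toward no modification.
LEVELS = {
    "low":  {"none": 6, "insert_nops": 2, "split_block": 1, "reorder_instructions": 1},
    "high": {"none": 1, "insert_nops": 3, "split_block": 3, "reorder_instructions": 3},
}

def select(level, rng):
    """Pick one modification from the table built for the given level."""
    table = build_weighted_table(LEVELS[level])
    return table[rng.randrange(len(table))]

rng = random.Random(1234)  # fixed seed only so the demonstration is repeatable
counts = Counter(select("low", rng) for _ in range(10_000))
```

  • With the `low` table, roughly six selections in ten leave the instruction block unmodified, so the level input trades diversity against the run-time cost of the added modifications.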
  • the modification is then performed on the instruction block at block 640.
  • the instruction block identified at block 610 is modified according to the modification randomly selected at block 630. That is, for example, instructions are added to, removed from, modified within, or rearranged within the instruction block.
  • other instruction blocks are modified at block 640.
  • other instruction blocks associated with the instruction block identified at block 610 such as instruction blocks that end in a jump to that instruction block (i.e., instruction blocks for which that instruction block is a jump target) or instruction blocks that are jump targets of that instruction block can also be modified at block 640.
  • the modified instruction block is then stored as a randomized intermediate representation of the application at a memory or data store.
  • the modification or modifications can be, for example, disaggregation of one instruction block into multiple instructions by adding jump instructions,
  • the modification is recorded at block 650.
  • a description or identifier of the modification can be recorded at a modification log for later analysis or auditing.
  • recording the modification includes recording a description of the instruction block to which the modification was applied, a representation of that instruction block before the modification, a representation of that instruction block after the modification, and/or other information related to the modification.
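  • A modification log entry of the kind described at block 650 might look like the following sketch. The record fields, the JSON-lines serialization, and the sample instruction text are assumptions; the description only lists the kinds of information that can be recorded.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ModificationRecord:
    block_id: str       # identifier of the modified instruction block
    modification: str   # description or identifier of the modification
    before: list        # representation of the block before the modification
    after: list         # representation of the block after the modification

def record_modification(log, block_id, modification, before, after):
    """Append one entry to an in-memory modification log (block 650)."""
    log.append(ModificationRecord(block_id, modification, list(before), list(after)))

log = []
before = ["%1 = add i32 %a, %b", "ret i32 %1"]
after = ["%1 = add i32 %a, %b", "nop", "ret i32 %1"]  # "nop" is a placeholder instruction
record_modification(log, "b0", "insert_nops", before, after)

# Serialize one JSON object per line so the log can be stored and audited later.
serialized = "\n".join(json.dumps(asdict(r)) for r in log)
```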
  • Process 600 then proceeds to block 660 to determine whether there are additional instruction blocks within the annotated intermediate representation. If the annotated intermediate representation includes additional instruction blocks, process 600 returns to block 610 at which another instruction block is identified. If the annotated intermediate representation does not include additional instruction blocks, process 600 is complete. In other words, the randomized intermediate representation of the application is complete when all the instruction blocks of the annotated intermediate representation have been processed or considered at blocks 610, 620, 630, 640, and 650.
  • Process 600 illustrated in FIG. 6 is an example of a process to randomize an application.
  • process 600 can include additional, fewer, and/or rearranged blocks or steps than those illustrated in FIG. 6.
  • process 600 does not include block 650. That is, the application randomization system does not record a modification log.
  • process 600 does not include block 650, but includes a block at which a randomization seed used to determine the random input at block 620 is recorded.
  • the random input can be an output of a pseudo-random number generator to which the randomization seed was provided as an initial state.
  • Recording the randomization seed allows, for example, a system administrator to later determine the random inputs used to randomly select the modifications by which the application randomization system randomized the application. Using the random inputs, the system administrator can determine which modifications were performed on which instruction blocks, and reconstruct the randomized intermediate representation of the application based on this information.
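  • The replay that a recorded seed enables can be sketched as follows. The modification names, the block identifiers, and the seed value are hypothetical, and Python's `random.Random` stands in for whatever pseudo-random number generator an implementation would use; it is not a cryptographically secure generator.

```python
import random

# Hypothetical modification names, as in a lookup table of modifications.
MODIFICATIONS = ["none", "insert_nops", "split_block", "reorder_instructions"]

def choose_modifications(seed, block_ids):
    """Select one modification per block from a PRNG initialized with `seed`."""
    rng = random.Random(seed)
    return {block_id: MODIFICATIONS[rng.randrange(len(MODIFICATIONS))]
            for block_id in block_ids}

block_ids = ["b0", "b1", "b2", "b3"]
recorded_seed = 20120928   # the value that would be recorded in place of a log

original = choose_modifications(recorded_seed, block_ids)
# Later, during an audit, the same seed reproduces the same selections.
replayed = choose_modifications(recorded_seed, block_ids)
```

  • The replay only matches if the instruction blocks are visited in the same order and the same generator and selection logic are used, so those would need to be kept stable or recorded alongside the seed.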
  • FIG. 7 is a schematic block diagram of an application randomization system, according to an implementation.
  • Application randomization system 700 illustrated in FIG. 7 includes intermediate representation generator 720, flow analysis module 730, random modification module 750, and native code generator 760.
  • Although these particular modules (i.e., combinations of hardware and software) and various other modules are illustrated and discussed in relation to FIG. 7 and other example implementations, other combinations or sub-combinations of modules can be included within other implementations.
  • the modules illustrated in FIG. 7 and discussed in other example implementations perform specific functionalities in the examples discussed herein, these and other
  • Intermediate representation generator 720, flow analysis module 730, random modification module 750, and native code generator 760 are similar to intermediate representation generator 120, flow analysis module 130, random modification module 150, and native code generator 160, respectively, discussed above in relation to FIG. 1.
  • Intermediate representation generator 720, flow analysis module 730, random modification module 750, and native code generator 760 can be hosted at one host, or can be distributed.
  • intermediate representation generator 720 and flow analysis module 730 can be hosted within an application development environment, and random modification module 750 and native code generator 760 can be hosted at hosts of an application.
  • intermediate representation generator 720 and flow analysis module 730 can be hosted within an application build or compilation system (e.g., a computing system including software to compile a source code representation of an application), and random modification module 750 and native code generator 760 can each be hosted at many computing devices at which instances of an application can be hosted.
  • random modification module 750 and native code generator 760 can be referred to as an application randomization system.
  • FIG. 8 is a schematic block diagram of a computing system hosting an application randomization system, according to an implementation.
  • a computing system hosting an application randomization system is itself referred to as an application randomization system.
  • computing system 800 includes processor 810 and memory 830.
  • Computing system 800 can be, for example, a personal computer such as a desktop computer or a notebook computer, a tablet device, a smartphone, a television, or some other computing system.
  • Processor 810 is any combination of hardware and software that executes or interprets instructions, codes, or signals.
  • processor 810 can be a microprocessor, an application-specific integrated circuit (ASIC), a distributed processor such as a cluster or network of processors or computing systems, a multi-core or multi-processor processor, or a virtual or logical processor of a virtual machine.
  • Memory 830 is a processor-readable medium that stores instructions, codes, data, or other information.
  • a processor-readable medium is any medium that stores instructions, codes, data, or other information non-transitorily and is directly or indirectly accessible to a processor.
  • a processor-readable medium is a non-transitory medium at which a processor can access instructions, codes, data, or other information.
  • memory 830 can be a volatile random access memory (RAM), a persistent data store such as a hard disk drive or a solid-state drive, a compact disc (CD), a digital versatile disc (DVD), a Secure Digital™ (SD) card, a MultiMediaCard (MMC) card, a CompactFlash™ (CF) card, or a combination thereof or other memories.
  • memory 830 can represent multiple processor-readable media.
  • memory 830 can be integrated with processor 810, separate from processor 810, or external to computing system 800.
  • Memory 830 includes instructions or codes that when executed at processor 810 implement operating system 831, random modification module 835, and native code generator 836.
  • random modification module 835 and native code generator 836 can collectively be referred to as an application randomization system.
  • an application randomization system can include additional or fewer modules (or components) than illustrated in FIG. 8.
  • memory 830 is operable to store annotated intermediate representation 839. For example, during run-time of operating system 831, annotated intermediate representation 839 can be received via a
  • computing system 800 can include (not illustrated in FIG. 8) a processor-readable medium access device (e.g., CD, DVD, SD, MMC, or a CF drive or reader), and can access annotated intermediate representation 839 at a processor-readable medium via that processor-readable medium access device.
  • computing system 800 can be a virtualized computing system.
  • computing system 800 can be hosted as a virtual machine at a computing server.
  • computing system 800 can be a computing appliance or virtualized computing appliance, and operating system 831 is a minimal or just-enough operating system to support (e.g., provide services such as a communications protocol stack and access to components of computing system 800 such as a communications interface) random modification module 835 and native code generator 836.
  • the application randomization system including random modification module 835 and native code generator 836 can be accessed or installed at computing system 800 from a variety of memories or processor-readable media.
  • computing system 800 can access an application randomization system at a remote processor-readable medium via a communications interface (not shown).
  • computing system 800 can be a network-boot device that accesses operating system 831, random modification module 835, and native code generator 836 during a boot process (or sequence).
  • computing system 800 can include (not illustrated in FIG. 8) a processor-readable medium access device (e.g., CD, DVD, SD, MMC, or a CF drive or reader), and can access random modification module 835 and native code generator 836 at a processor-readable medium via that processor-readable medium access device.
  • the processor-readable medium access device can be a DVD drive at which a DVD including an installation package for one or more of random modification module 835 and native code generator 836 is accessible.
  • the installation package can be executed or interpreted at processor 810 to install one or more of random modification module
  • computing system 800 can then host or execute one or more of random modification module 835 and native code generator 836 at computing system 800 (e.g., at memory 830).
  • random modification module 835 and native code generator 836 can be accessed at or installed from multiple sources, locations, or resources.
  • some components of random modification module 835 and native code generator 836 can be installed via a communications link (e.g., from a file server accessible via a communication link), and other components of random modification module 835 and native code generator 836 can be installed from a DVD.
  • random modification module 835 and native code generator 836 can be distributed across multiple computing systems. That is, some components of random modification module 835 and native code generator 836 can be hosted at one computing system and other components of random modification module 835 and native code generator 836 can be hosted at another computing system. As a specific example, random modification module 835 and native code generator 836 can be hosted within a cluster of computing systems where
  • the term “module” refers to a combination of hardware (e.g., a processor such as an integrated circuit or other circuitry) and software (e.g., machine- or processor-executable instructions, commands, or code such as firmware, programming, or object code).
  • a combination of hardware and software includes hardware only (i.e., a hardware element with no software elements), software hosted at hardware (e.g., software that is stored at a memory and executed or interpreted at a processor), or hardware and software hosted at hardware.
  • the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.
  • the term “module” is intended to mean one or more modules or a combination of modules.
  • the term “provide” as used herein includes push mechanisms (e.g., sending data to a computing system or agent via a communications path or channel), pull mechanisms (e.g., delivering data to a computing system or agent in response to a request from the computing system or agent), and store mechanisms (e.g., storing data at a data store or service at which a computing system or agent can access the data).
  • the term “based on” means “based at least in part on.” Thus, a feature that is described as based on some cause can be based only on that cause, or based on that cause and on one or more other causes.
PCT/US2012/057819 2012-09-28 2012-09-28 Application randomization WO2014051608A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP12885210.0A EP2901348A4 (en) 2012-09-28 2012-09-28 ANWENDUNGSRANDOMISIERUNG
CN201280077350.7A CN104798075A (zh) 2012-09-28 2012-09-28 应用随机化
PCT/US2012/057819 WO2014051608A1 (en) 2012-09-28 2012-09-28 Application randomization
US14/432,202 US20150294114A1 (en) 2012-09-28 2012-09-28 Application randomization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2012/057819 WO2014051608A1 (en) 2012-09-28 2012-09-28 Application randomization

Publications (1)

Publication Number Publication Date
WO2014051608A1 true WO2014051608A1 (en) 2014-04-03

Family

ID=50388797

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/057819 WO2014051608A1 (en) 2012-09-28 2012-09-28 Application randomization

Country Status (4)

Country Link
US (1) US20150294114A1 (zh)
EP (1) EP2901348A4 (zh)
CN (1) CN104798075A (zh)
WO (1) WO2014051608A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402563B2 (en) 2016-02-11 2019-09-03 Morphisec Information Security Ltd. Automated classification of exploits based on runtime environmental features
US10528735B2 (en) 2014-11-17 2020-01-07 Morphisec Information Security 2014 Ltd. Malicious code protection for computer systems based on process modification

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089089B2 (en) * 2015-06-03 2018-10-02 The Mathworks, Inc. Data type reassignment
US10248434B2 (en) * 2015-10-27 2019-04-02 Blackberry Limited Launching an application
US10795976B2 (en) * 2016-01-11 2020-10-06 Siemens Aktiengesellschaft Program randomization for cyber-attack resilient control in programmable logic controllers
US10268601B2 (en) 2016-06-17 2019-04-23 Massachusetts Institute Of Technology Timely randomized memory protection
US10310991B2 (en) * 2016-08-11 2019-06-04 Massachusetts Institute Of Technology Timely address space randomization
US10133560B2 (en) * 2016-09-22 2018-11-20 Qualcomm Innovation Center, Inc. Link time program optimization in presence of a linker script
US20180275976A1 (en) * 2017-03-22 2018-09-27 Qualcomm Innovation Center, Inc. Link time optimization in presence of a linker script using path based rules
US11022950B2 (en) * 2017-03-24 2021-06-01 Siemens Aktiengesellschaft Resilient failover of industrial programmable logic controllers
US11250123B2 (en) * 2018-02-28 2022-02-15 Red Hat, Inc. Labeled security for control flow inside executable program code
US11763188B2 (en) 2018-05-03 2023-09-19 International Business Machines Corporation Layered stochastic anonymization of data
BR112021018798A2 (pt) * 2019-03-21 2022-02-15 Capzul Ltd Detecção e prevenção de engenharia reversa de programas de computador
US11074055B2 (en) * 2019-06-14 2021-07-27 International Business Machines Corporation Identification of components used in software binaries through approximate concrete execution
JP7335591B2 (ja) 2019-07-22 2023-08-30 コネクトフリー株式会社 コンピューティングシステムおよび情報処理方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006062849A2 (en) * 2004-12-06 2006-06-15 Microsoft Corporation Proactive computer malware protection through dynamic translation
US7171693B2 (en) * 2000-05-12 2007-01-30 Xtreamlok Pty Ltd Information security method and system
US20080016314A1 (en) * 2006-07-12 2008-01-17 Lixin Li Diversity-based security system and method
EP2264635A1 (en) * 2009-06-19 2010-12-22 Thomson Licensing Software resistant against reverse engineering
US20120246484A1 (en) * 2011-03-21 2012-09-27 Mocana Corporation Secure execution of unsecured apps on a device

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6643775B1 (en) * 1997-12-05 2003-11-04 Jamama, Llc Use of code obfuscation to inhibit generation of non-use-restricted versions of copy protected software applications
FR2775370B1 (fr) * 1998-02-20 2001-10-19 Sgs Thomson Microelectronics Procede de gestion d'interruptions dans un microprocesseur
US7092523B2 (en) * 1999-01-11 2006-08-15 Certicom Corp. Method and apparatus for minimizing differential power attacks on processors
US6598166B1 (en) * 1999-08-18 2003-07-22 Sun Microsystems, Inc. Microprocessor in which logic changes during execution
US7065652B1 (en) * 2000-06-21 2006-06-20 Aladdin Knowledge Systems, Ltd. System for obfuscating computer code upon disassembly
US7243340B2 (en) * 2001-11-15 2007-07-10 Pace Anti-Piracy Method and system for obfuscation of computer program execution flow to increase computer program security
JP2003280755A (ja) * 2002-03-25 2003-10-02 Nec Corp 自己復元型プログラム、プログラム生成方法及び装置、情報処理装置並びにプログラム
JP2003280754A (ja) * 2002-03-25 2003-10-02 Nec Corp 隠蔽化ソースプログラム、ソースプログラム変換方法及び装置並びにソース変換プログラム
US7424620B2 (en) * 2003-09-25 2008-09-09 Sun Microsystems, Inc. Interleaved data and instruction streams for application program obfuscation
US7383583B2 (en) * 2004-03-05 2008-06-03 Microsoft Corporation Static and run-time anti-disassembly and anti-debugging
US7587616B2 (en) * 2005-02-25 2009-09-08 Microsoft Corporation System and method of iterative code obfuscation
US7584364B2 (en) * 2005-05-09 2009-09-01 Microsoft Corporation Overlapped code obfuscation
US20090106744A1 (en) * 2005-08-05 2009-04-23 Jianhui Li Compiling and translating method and apparatus
JP4918544B2 (ja) * 2005-10-28 2012-04-18 パナソニック株式会社 難読化評価方法、難読化評価装置、難読化評価プログラム、記憶媒体および集積回路
CN101416197A (zh) * 2006-02-06 2009-04-22 松下电器产业株式会社 程序混淆装置
KR20080113277A (ko) * 2006-04-28 2008-12-29 파나소닉 주식회사 프로그램 난독화시스템, 프로그램 난독화장치 및 프로그램 난독화방법
JP4470982B2 (ja) * 2007-09-19 2010-06-02 富士ゼロックス株式会社 情報処理装置及び情報処理プログラム
US20090094443A1 (en) * 2007-10-05 2009-04-09 Canon Kabushiki Kaisha Information processing apparatus and method thereof, program, and storage medium
EP2235713A4 (en) * 2007-11-29 2012-04-25 Oculis Labs Inc METHOD AND APPARATUS FOR SECURE VISUAL CONTENT DISPLAY
JP4905480B2 (ja) * 2009-02-20 2012-03-28 富士ゼロックス株式会社 プログラム難読化プログラム及びプログラム難読化装置
EP2362314A1 (en) * 2010-02-18 2011-08-31 Thomson Licensing Method and apparatus for verifying the integrity of software code during execution and apparatus for generating such software code
WO2011116446A1 (en) * 2010-03-24 2011-09-29 Irdeto Canada Corporation System and method for random algorithm selection to dynamically conceal the operation of software
US9274976B2 (en) * 2010-11-05 2016-03-01 Apple Inc. Code tampering protection for insecure environments
US20120159193A1 (en) * 2010-12-18 2012-06-21 Microsoft Corporation Security through opcode randomization
US8707053B2 (en) * 2011-02-09 2014-04-22 Apple Inc. Performing boolean logic operations using arithmetic operations by code obfuscation
US8615735B2 (en) * 2011-05-03 2013-12-24 Apple Inc. System and method for blurring instructions and data via binary obfuscation
US8661549B2 (en) * 2012-03-02 2014-02-25 Apple Inc. Method and apparatus for obfuscating program source codes
US9213841B2 (en) * 2012-07-24 2015-12-15 Google Inc. Method, manufacture, and apparatus for secure debug and crash logging of obfuscated libraries
US9569184B2 (en) * 2012-09-05 2017-02-14 Microsoft Technology Licensing, Llc Generating native code from intermediate language code for an application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2901348A4 *

Also Published As

Publication number Publication date
US20150294114A1 (en) 2015-10-15
CN104798075A (zh) 2015-07-22
EP2901348A1 (en) 2015-08-05
EP2901348A4 (en) 2016-12-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12885210

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2012885210

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012885210

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 14432202

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE