CN106462677B

CN106462677B - Method and device for protecting software project

Info

Publication number: CN106462677B
Application number: CN201580028800.7A
Authority: CN
Inventors: Y. Gu; H. Johnson; Y. Eftekhari; B. Sistany; R. Durand
Original assignee: Ai Dide Technology Co Ltd
Current assignee: Ai Dide Technology Co Ltd
Priority date: 2014-03-31
Filing date: 2015-03-31
Publication date: 2020-01-10
Anticipated expiration: 2035-03-31
Also published as: WO2015150376A1; US20170116396A1; EP3127029A1; GB201405755D0; US10409966B2; CN106462677A

Abstract

A method comprises: performing optimization of an item of software in a first intermediate representation; and performing protection of the item of software in a second intermediate representation different from the first intermediate representation.

Description

Method and device for protecting software project

Technical Field

The invention relates to a method and apparatus for providing security protection and performance optimization of software.

Background

In recent years there has been a substantial increase in the number of end-user computer devices for which programmers provide software. Most of this increase has been in mobile phones and mobile computing devices (including smartphones, tablet computers, etc.), but it also extends to more traditional desktop computers and to computers embedded in other manufactured goods such as automobiles and televisions. Most of the software provided to such devices takes the form of application programs, commonly referred to as "apps", and this software is often provided as native code, in scripting languages such as JavaScript, or in other languages such as Java.

Such software, as well as the data or content delivered to the user using the software, is often at risk of compromise if the software is not properly protected using various software protection techniques. For example, such techniques may be used to make it difficult for an attacker to extract encryption keys that may be used to gain unauthorized access to content such as video, audio, or other data types, and may be used to make it difficult for an attacker to duplicate software for unauthorized use on other devices.

However, the use of such software protection techniques may result in reduced software performance, such as reduced execution speed, increased amount of memory required to store the software on the user device, or increased memory required for execution. Such software protection techniques may also be difficult to apply across a wide range of different software types, such as pre-existing software written in different source code languages or existing in a particular native code format.

It would be desirable to be able to provide protection for a software project against attacks and provide such protection across a range of software representations, such as different source code languages and native code types, while also maintaining a good level of performance of the software on the end-user device. It would also be desirable to deliver software properly protected in this manner for use on a number of different platform types.

Disclosure of Invention

The present invention provides a unified security framework that combines the advantages of a first set of software tools, used for translation between representations, optimization, compilation and the like, with the advantages of a second set of software tools, used for software protection. In one example, the software tools in the first set are tools of the LLVM project, which generally operate on the LLVM intermediate representation. However, tools from other sets that operate on other intermediate representations may be used instead, such as tools from the Microsoft Common Language Infrastructure (CLI), which typically uses the Common Intermediate Language (CIL). In the following, the intermediate representation used by the software tools in the first set is referred to as the first intermediate representation. Note that the software tools in the first set may also include tools for software protection, such as a binary rewriting protection tool.

An intermediate representation is a software representation that is intended neither for execution on the end-user device nor for use by a software engineer in writing the original source code, although such uses are of course possible in principle. In the examples of the invention described below, neither the original software input to the unified security framework nor the transformed software output from it is cast in an intermediate representation for use on the end-user device.

The software tools in the second set use a different intermediate representation that is generally better suited to, or originally intended for, tools that apply security protection transformations to items of software processed through the unified security framework. This intermediate representation is referred to below as the second intermediate representation, and it differs from the first intermediate representation. The second intermediate representation may be designed so that source code in languages such as C and C++ can readily be converted into it by appropriate conversion tools, and so that source code in the same or a similar language can readily be reconstructed from it.

More generally, the present invention provides a unified security framework in which software tools for applying security transformations to items of software are provided such that a plurality of security transformation steps can be performed on items of software in a plurality of different intermediate representations, e.g., in succession. The unified security framework may also provide a software tool for applying optimization transformations to the item of software such that multiple optimization transformation steps may be performed on the item of software in multiple different intermediate representations, e.g., in succession.

The present invention can be used to accept an input software item in any input language or native code/binary representation for optimization and protection, and output the protected and optimized software item in various forms, including any desired native code/binary representation, JavaScript or a subset of JavaScript, and the like. In some embodiments, for example, the input representation of a particular binary may be the same as the output representation, thereby performing optimizations and protections on an existing binary software project.

To this end, the invention provides a method comprising performing an optimization of an item of software in a first intermediate representation and performing a protection of the item of software in a second intermediate representation different from the first intermediate representation.

The optimization in the first intermediate representation may be performed both before and after performing the protection in the second intermediate representation, and the method may therefore comprise converting the item of software from the first intermediate representation to the second intermediate representation after performing the optimization a first time and before subsequently performing the protection, and converting the item of software from the second intermediate representation to the first intermediate representation after performing the protection and before subsequently performing the optimization a second time.

Similarly, the protection in the second intermediate representation may be performed both before and after performing the optimization in the first intermediate representation, and the method may therefore comprise converting the item of software from the second intermediate representation to the first intermediate representation after performing the protection for the first time and before subsequently performing the optimization, and converting the item of software from the first intermediate representation to the second intermediate representation after performing the optimization and before subsequently performing the protection for the second time.

The steps of protection and optimization in the relevant intermediate representation may be performed alternately any number of times, starting with protection or optimization and continuing with one or more other steps in an alternating manner.
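The alternating scheme just described can be pictured as a small driver that inserts a conversion whenever the next stage needs a different intermediate representation. The following Python sketch is illustrative only: the representation tags `IR1`/`IR2`, the `SoftwareItem` class and the stage functions are hypothetical placeholders (real stages would be, for example, LLVM optimization passes and the protector's transformations), and each stage here merely records that it ran.

```python
from dataclasses import dataclass, field

# Hypothetical representation tags: IR1 is used by the optimizer (e.g. LLVM IR),
# IR2 by the protector.
IR1, IR2 = "IR1", "IR2"


@dataclass
class SoftwareItem:
    rep: str                                     # current representation
    history: list = field(default_factory=list)  # stages applied so far


def to_ir2(item: SoftwareItem) -> SoftwareItem:
    """Placeholder for the IR1 -> IR2 converter."""
    assert item.rep == IR1
    item.rep = IR2
    item.history.append("IR1->IR2")
    return item


def to_ir1(item: SoftwareItem) -> SoftwareItem:
    """Placeholder for the IR2 -> IR1 converter."""
    assert item.rep == IR2
    item.rep = IR1
    item.history.append("IR2->IR1")
    return item


def optimize(item: SoftwareItem) -> SoftwareItem:
    """Placeholder optimizer: only accepts items in IR1."""
    if item.rep != IR1:
        raise ValueError("optimizer requires IR1")
    item.history.append("optimize")
    return item


def protect(item: SoftwareItem) -> SoftwareItem:
    """Placeholder protector: only accepts items in IR2."""
    if item.rep != IR2:
        raise ValueError("protector requires IR2")
    item.history.append("protect")
    return item


CONVERTERS = {(IR1, IR2): to_ir2, (IR2, IR1): to_ir1}
REQUIRED_REP = {optimize: IR1, protect: IR2}


def run_pipeline(item: SoftwareItem, stages) -> SoftwareItem:
    """Apply stages in order, converting between IR1 and IR2 as needed."""
    for stage in stages:
        needed = REQUIRED_REP[stage]
        if item.rep != needed:
            item = CONVERTERS[(item.rep, needed)](item)
        item = stage(item)
    return item
```

For example, `run_pipeline(SoftwareItem(rep=IR1), [optimize, protect, optimize]).history` yields `['optimize', 'IR1->IR2', 'protect', 'IR2->IR1', 'optimize']`: conversions appear only where the required representation changes, so the stages can be alternated any number of times starting with either one.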

As mentioned above, the first intermediate representation may be the LLVM intermediate representation (LLVM IR), but other intermediate representations, such as Microsoft CIL, may be used.

More generally, the invention may provide for performing optimization of an item of software using optimization steps performed in one or more intermediate representations, and performing protection of the item of software using protection steps in one or more intermediate representations, some or all of which may be the same as or different from the intermediate representations used to perform the optimization.

The optimization of the item of software may target one or more of its size, runtime speed and runtime memory requirements. Techniques for implementing such optimizations may include vectorization, idle time, constant propagation, dead assignment elimination, inline expansion, reachability analysis, protection break normal (protection normal) and other optimizations.
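To make two of the listed techniques concrete, here is a Python sketch of constant propagation followed by dead assignment elimination over a toy, straight-line three-address code. The instruction format `(dest, op, lhs, rhs)` and the `"const"` opcode are assumptions made for illustration; production optimizers such as the LLVM passes work on SSA form with control flow, which this sketch ignores.

```python
from typing import Union

Operand = Union[int, str]                  # integer literal or variable name
Instr = tuple[str, str, Operand, Operand]  # (dest, op, lhs, rhs)

# Hypothetical opcode table; "const" simply yields its left operand.
OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "const": lambda x, _y: x,
}


def constant_propagate(code: list[Instr]) -> list[Instr]:
    """Substitute known constant values and fold fully constant operations."""
    known: dict[str, int] = {}
    out: list[Instr] = []
    for dest, op, a, b in code:
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)
            known[dest] = value
            out.append((dest, "const", value, 0))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out


def eliminate_dead_assignments(code: list[Instr], live_out: set[str]) -> list[Instr]:
    """Drop assignments whose results are never used (backward liveness pass)."""
    live = set(live_out)
    kept: list[Instr] = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)  # this assignment defines dest
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    return list(reversed(kept))
```

With `code = [("a", "+", 2, 3), ("b", "*", "a", 4), ("t", "+", "a", "b"), ("r", "+", "x", "b")]` and only `r` live at the end, propagation folds `a`, `b` and `t` to constants, and elimination then removes them, leaving `[("r", "+", "x", 20)]`.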

Protecting the item of software in the second intermediate representation comprises applying one or more protection techniques to it, in particular security protection techniques that protect program and/or data aspects of the software from attack. Such techniques may include, for example, white-box protection techniques, node locking techniques, data flow obfuscation, control flow obfuscation and transformation, homomorphic data transformation, key hiding, program interlocks, boundary blending and others. These techniques may be combined in various ways to form one or more tools, such as a cloaking engine, as part of a set of optimization and protection tools.
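As one illustration of data flow obfuscation with a homomorphic property, a value can be kept only in an encoded form such as e(x) = (a·x + b) mod 2^32 with odd a, and additions can then be carried out directly on encoded values, because e(x) + e(y) - b = e(x + y) (mod 2^32). The Python sketch below is a textbook linear encoding chosen for illustration, not the specific transformation used by the protector; the constants are arbitrary, and practical schemes combine many such encodings to resist analysis.

```python
M = 2**32  # arithmetic modulo the machine word size


class LinearEncoding:
    """Encode x as (a*x + b) mod 2**32; a must be odd so it is invertible."""

    def __init__(self, a: int, b: int):
        if a % 2 == 0:
            raise ValueError("a must be odd to be invertible modulo 2**32")
        self.a, self.b = a % M, b % M
        self.a_inv = pow(self.a, -1, M)  # modular inverse (Python 3.8+)

    def encode(self, x: int) -> int:
        return (self.a * x + self.b) % M

    def decode(self, y: int) -> int:
        return (self.a_inv * (y - self.b)) % M

    def add_encoded(self, ex: int, ey: int) -> int:
        """Return e(x + y) from e(x) and e(y) without decoding either."""
        # e(x) + e(y) = a*(x + y) + 2b, so subtracting b yields e(x + y).
        return (ex + ey - self.b) % M
```

For instance, with `enc = LinearEncoding(a=0x9E3779B1, b=12345)`, `enc.decode(enc.add_encoded(enc.encode(1000), enc.encode(234)))` returns `1234`, while the plain operands and their sum never appear in memory during the addition.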

The item of software is provided in an input representation that is generally different from both the first and second intermediate representations. The method may thus involve converting the item of software from the input representation to the first intermediate representation before performing the optimization, and generally also before performing the protection mentioned above. In some embodiments, the item of software in the input representation is converted into the second intermediate representation, and then from the second intermediate representation into the first intermediate representation, before the first optimization and optionally also before the protection is performed.

The input representation may be a source code representation such as C, C++, Objective-C, Java, JavaScript, C#, Ada, Fortran, ActionScript, GLSL, Haskell, Julia, Python, Ruby or Rust. Alternatively, the input representation may be a native code (i.e., binary code) representation for a particular processor family, such as any of the x86, x86-64, ARM, SPARC, PowerPC, MIPS and m68k processor families. The input representation may also be a Hardware Description Language (HDL). As is well known, an HDL is a computer language used to describe the structure, design and operation of electronic circuits. The HDL may be, for example, VHDL or Verilog, but many other HDLs exist and may alternatively be used in embodiments of the present invention. Since HDLs and their uses and implementations are well known, they are not described in further detail herein; more details can be found, for example, at http://en.wikipedia.org/wiki/Hardware_description_language, the entire disclosure of which is incorporated herein by reference.

When the above optimization and protection processes have been performed, the item of software may be converted into an output representation. This processing phase may also include further optimization and/or protection phases. In some embodiments, converting the item of software to an output representation includes compiling (and typically also linking) the item of software into an output representation, such as a native code representation. Further binary protection techniques may then be applied to the software project after compilation and linking.

For compilation, the item of software may either be converted from the first intermediate representation to the second intermediate representation and then to a source code representation that is passed to the compiler, or be passed directly to the compiler in the first intermediate representation. In the first case, a compiler operating on the source code representation, such as a C/C++ compiler, may be used. In the second case, if the first intermediate representation is LLVM IR, an LLVM compiler may be used. In either case, the compiler may be an optimizing compiler that provides a further level of optimization to the protected item of software.

Converting the item of software to the output representation may also include applying a binary rewriting protection tool to the item of software in the first intermediate representation prior to compilation, and/or applying such a tool at other stages of the process.

Instead of being compiled into a native code representation, the item of software may alternatively be converted into a script representation, in particular one that can be executed on the end-user device. Conveniently, a JavaScript representation may be used for this purpose, since such scripts can be executed directly by a web browser on the end-user device. More particularly, the asm.js representation, which is a subset of JavaScript, may be used, as asm.js is suited to particularly efficient execution on end-user devices. For example, if the first intermediate representation is LLVM IR, the Emscripten tool can be used to convert the item of software from the first intermediate representation to an asm.js representation.

If the input representation is a hardware description language, the output representation may generally describe a corresponding representation of the electronic circuit at a more hardware-oriented level, such as in a netlist. Where processing aspects such as compilation and linking are described herein, skilled artisans will appreciate that when the present invention is used with HDL input representations, equivalent steps such as synthesis using appropriate tools may be used, and appropriate software tools suitable for HDL work may be used for the protection and optimization aspects of the present invention. The output software item is then a description of the electronic system with appropriate obfuscation/protection and optimization steps applied.

The items of software may be any of a variety of items of software, such as applications for execution on user devices, libraries, modules, agents, and the like. In particular, the item of software may be a secure item of software, such as a library, module or agent, containing software for implementing secure functions such as encryption/decryption and digital rights management functions. The method may be applied to two such items of software, and one of these items of software may use functionality in the other, for example by way of a procedure call or other reference. Similarly, a software item optimized and protected according to the present invention may utilize or invoke security-related or protected functionality in a lower layer, such as the system layer or the hardware layer. Similarly, items of software may describe electronic systems and are provided in HDL for input to embodiments of the present invention.

The present invention also provides a method of protecting a software project, comprising applying one or more protection techniques to the software project and optimizing the software project using one or more LLVM tools, and this aspect of the invention can be combined with various options mentioned elsewhere herein. For example, one or more protection techniques can be applied to the item of software using a protection component arranged to operate using an intermediate representation different from the LLVM intermediate representation, and the method can further include using the LLVM tool to transition the item of software between the one or more representations and the LLVM intermediate representation. The method may be used to output a protected and optimized item of software in one of asm.

After processing of the item of software as discussed above, the item of software may be delivered to one or more user devices for execution. The software items may be delivered to the user device in various ways, such as over a wired, optical, or wireless network, using a computer-readable medium, and in other ways.

Software for providing the methods and apparatus in question may be provided over a network or otherwise on one or more computer-readable media for execution on suitable computer apparatus (e.g., a computer device or devices including memory and one or more processors) in combination with suitable input and output facilities to enable an operator to control the apparatus such as a keyboard, mouse, and screen, along with persistent storage for storing computer program code to cause the invention to be practiced on the apparatus.

The invention may thus also provide a computer apparatus for protecting an item of software, comprising an optimizer component arranged to perform optimization of the item of software in a first intermediate representation, such as LLVM IR, and a protector component arranged to perform protection of the item of software in a second intermediate representation.

The apparatus may be arranged such that the optimizer component performs optimization of the item of software in the first intermediate representation both before and after the protector component performs protection of the item of software in the second intermediate representation.

The optimization component can include one or more LLVM optimization tools.

The protection component may be arranged to apply one or more protection techniques to the item of software, including one or more of white-box protection techniques, node locking techniques, data flow obfuscation, control flow obfuscation and transformation, homomorphic data transformation, key hiding, program interlocks, and boundary blending.
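Control flow obfuscation can take many forms; one widely known form is control-flow flattening, in which the basic blocks of a function are placed inside a single dispatcher loop and the next block is selected through a state variable, hiding the original loop and branch structure. The Python sketch below hand-flattens a small function to show the idea; the function and its state numbering are hypothetical, and a protector would perform such a transformation automatically in its intermediate representation and would typically also encode the state values.

```python
def original(n: int) -> int:
    """Sum of the even numbers below n, with ordinary structured control flow."""
    total = 0
    i = 0
    while i < n:
        if i % 2 == 0:
            total += i
        i += 1
    return total


def flattened(n: int) -> int:
    """Same computation after control-flow flattening.

    Each basic block of `original` becomes one case of a dispatcher loop,
    and successors are chosen by assigning `state`.
    """
    state = 0
    while True:
        if state == 0:        # entry block
            total, i = 0, 0
            state = 1
        elif state == 1:      # loop header: continue while i < n
            state = 2 if i < n else 5
        elif state == 2:      # branch on parity of i
            state = 3 if i % 2 == 0 else 4
        elif state == 3:      # accumulate even i
            total += i
            state = 4
        elif state == 4:      # increment and return to the header
            i += 1
            state = 1
        elif state == 5:      # exit block
            return total
```

Both functions return the same results (for example, `20` for `n = 10`), but in the flattened form the loop and the `if` are no longer visible as nested structure; an analyst must reconstruct them from the state transitions.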

The apparatus may further comprise an input converter arranged to convert the item of software from an input representation to LLVM IR, where the input representation may be one of a binary or native code representation, a byte code representation and a source code representation. The apparatus may further comprise a compiler and linker arranged to output the optimized and protected item of software as binary code, and an output converter arranged to output the optimized and protected item of software as asm.js.

The present invention also provides a unified masquerading toolset that includes a protection component, an optimizer component, and one or more converters for converting between intermediate representations used by the protection component and the optimizer component. The optimizer component can include one or more LLVM optimizer tools, and the unified masquerading toolset can include one or more LLVM front-end tools for converting from the input representation to the LLVM intermediate representation. In some embodiments of the unified masquerading toolset, a protection component and/or optimizer component may be provided to apply transformations to the software project in more than one intermediate representation.

The unified masquerading toolset may also implement various other aspects of the embodiments set forth herein. For example, its protection component may implement one or more of the following techniques: white-box protection techniques, node locking techniques, data flow obfuscation, control flow obfuscation and transformation, homomorphic data transformation, key hiding, program interlocking and boundary blending; the toolset may further comprise a compiler and a linker arranged to compile and link the item of software into a native code representation; and the toolset may further comprise an output converter for converting to an output representation that is a subset of JavaScript.

The present invention also provides one or more items of software that have been optimized and protected using the methods and/or apparatus, and such items of software may be provided, stored or transmitted in computer memory, on computer-readable media, over telecommunications or computer networks, and in other ways.

Drawings

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an example of a computer system including an optimization and protection toolset 40 in accordance with the present invention;

FIG. 2 illustrates an embodiment of the optimization and protection toolset 40 of FIG. 1 in more detail;

FIG. 3 provides a flow chart of a method embodiment of the present invention;

FIG. 4 illustrates a workflow that may be implemented by the optimization and protection toolset 40 of FIG. 2;

FIG. 5 illustrates a workflow similar to that of FIG. 4, but within which an input software item represented in source code is converted to LLVM IR using LLVM front-end tools;

FIG. 6 is similar to FIG. 5, but with the input items of software represented in binary or native code;

FIG. 7 illustrates a workflow similar to that of FIGS. 4 to 6, but in which binary rewriting protection of the item of software in the first intermediate representation is implemented using an LLVM compiler middle layer tool;

FIG. 8 illustrates a workflow that may be implemented using the optimization and protection toolset of FIG. 2, where the output representation is asm.js or another executable script representation;

FIG. 9 schematically illustrates the optimization and protection toolset of FIG. 2 with some further variations and details;

FIG. 10 illustrates how the arrangement of FIG. 2 can be extended to use a larger number of intermediate representations and to apply optimization and/or protection in different ones of these intermediate representations;

FIG. 11 illustrates the processing of software items such as security libraries, modules, and agents through an optimization and protection toolset;

FIG. 12 is a flow chart that schematically illustrates a method of structure protection, in accordance with an embodiment of the present invention;

FIG. 13 schematically illustrates an example dictionary tree (trie); and

FIG. 14 schematically illustrates a protected structure in the form of a dictionary tree.

Detailed Description

In the following description and in the drawings, certain embodiments of the invention are described. However, it will be appreciated that the invention is not limited to the described embodiments and that certain embodiments may not include all of the features described below. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims.

Referring now to FIG. 1, there is illustrated an exemplary computer system 10 in which the present invention may be put into practice. An item of software 12 is provided, for example, by a server 14 at which it has previously been stored. The item of software 12 may be intended for a variety of purposes, but in the system of FIG. 1 it is an application program (sometimes referred to as an app, depending on aspects such as how it is delivered and how it is used in the context of the user device and the broader operating environment) intended for execution and use on one or more of a plurality of user computers 20. A user computer 20 may be a personal computer, a smartphone, a tablet computer or any other suitable user device. Typically, such a user device 20 includes an operating system 24 that provides services to other software entities running on the device, such as a web browser 22. The item of software 12 may be delivered to the user device in various forms, typically as native executable code, as generic underlying code such as Java byte code, or in a scripting language such as JavaScript. An item of software 12 in generic underlying code or in a scripting language is typically executed within, or under the direct control of, the web browser 22. An item of software 12 in native executable code is more likely to execute under the direct control of the operating system 24, but certain types of native code, such as Google NaCl and PNaCl, are executed within a web browser environment.

The item of software 12 of FIG. 1 may typically be delivered to one or more user devices by a remote web server 30 over a data network 28 (such as the Internet), although other delivery and installation arrangements may be used. The illustrated network server or one or more other servers may also provide data, support, digital rights management, and/or other services 32 to user device 20, and in particular to item of software 12 executing on user device 20.

The item of software 12 may be vulnerable to attack and compromise on the user devices 20, whether before, during or after execution on those devices. For example, an item of software may implement a digital rights management technique that an attacker may try to compromise, for example by extracting an encryption key or details of an algorithm, which may enable future circumvention of the digital rights management technique for that particular item of software, for particular digital content, etc.

The system 10 therefore also provides an optimization and protection toolset 40, which is used to optimize and protect the item of software 12 before it is delivered to the user devices 20. In FIG. 1, the optimization and protection toolset 40 acts on the item of software 12 before it is delivered to the web server 30, but the toolset may instead be implemented in the server 14, in the web server 30, in a development environment (not shown) or elsewhere. The optimization and protection toolset 40 in FIG. 1 is shown executing on a suitable computer device 42 under the control of an operating system 43. The computer device 42 typically includes one or more processors 44 that execute the software code of the optimization and protection toolset 40 using memory 46, under the control of a user through an input/output facility 50. The functionality of the computer device 42 and the optimization and protection toolset 40 may be distributed across multiple computer units connected by appropriate data network connections. Some or all of the software used to provide the optimization and protection toolset 40 may be stored in non-volatile storage 48 and/or on one or more computer-readable media, and/or transmitted to the computer device 42 over a data network.

Note that the item of software 12 to be optimized and protected by aspects of the present invention can also be a component for use with or by another item of software, such as an application program. To this end, the item of software 12 may be, for example, a library, a module, an agent, or the like.

An exemplary embodiment of the optimization and protection toolset 40 is schematically illustrated in FIG. 2. The optimization and protection toolset 40 includes an optimizer component 100 and a protector component 110. The optimizer component 100 is adapted to apply optimization techniques to the item of software 12. It implements these techniques in the first intermediate representation IR1, so the item of software 12 must be presented in IR1 before the optimizer component 100 can optimize it. The protector component 110 is adapted to apply protection techniques to the item of software 12. It implements these techniques in the second intermediate representation IR2, so the item of software 12 must be rendered into IR2 before the protector component 110 can protect it. The first and second intermediate representations are different from each other. Typically, the protector component 110 cannot operate on the item of software while it is in the first intermediate representation, and the optimizer component 100 cannot operate on it while it is in the second intermediate representation.

Each of the optimizer component 100 and the protector component 110 is implemented as a plurality of subcomponents 102, 112 in the optimization and protection toolset 40. Subcomponents of a particular component may provide functionality that is different from and/or duplicates that of the others, e.g., such that the overall role of the component may be distributed in various ways within the software of the optimization and protection toolset 40.

The optimization and protection toolset 40 also provides a plurality of converters adapted to convert the item of software 12 from one representation to another. These include a first converter component 120 arranged to convert the item of software from the first intermediate representation IR1 used by the optimizer component 100 to the second intermediate representation IR2 used by the protector component 110, and a second converter component 122 arranged to convert the item of software from the second intermediate representation IR2 used by the protector component 110 to the first intermediate representation IR1 used by the optimizer component 100. Of course, the first and second converter components 120, 122 may be combined in a single functional software unit (such as a single module, executable or object-oriented method), if desired.

The item of software 12 is provided to the optimization and protection toolset 40 in an input representation Ri. This input representation may be any of a number of different representations, e.g. either of the first and second intermediate representations IR1, IR2, or another representation such as a source code representation, a binary code representation or the like.

Similarly, the item of software 12 is output from the optimization and protection toolset 40 in an output representation Ro. The output representation may also be one of any number of different representations, e.g. any of the first and second intermediate representations IR1, IR2, or another representation such as a source code representation, a binary code representation, etc.

The optimization and protection toolset 40 may also include one or more other components, each arranged to operate on the item of software 12 in a particular representation. Such components may include, for example, a binary protection component 130 that provides a binary protection tool arranged to operate on the item of software 12 in the binary representation Rb, and a binary rewriting protection component 135 that provides a binary rewriting protection tool arranged to operate on the item of software 12 in the binary representation or in some other representation, such as the first intermediate representation.

In addition to the first and second translator components 120, 122, the optimization and protection toolset 40 is provided with further translator components (e.g. 124, 126), also shown in FIG. 2 as X3 … Xn, which are used to convert the item of software 12 between various representations as required. For example, one such converter component, the X3 converter 124, may convert from the C/C++ source code representation to the second intermediate representation IR2, and another, the X4 converter, may convert from the second intermediate representation IR2 back to the C/C++ source code representation.
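The converter components can be viewed as the edges of a graph whose nodes are representations, so that converting an item of software between two representations, either directly or via other representations, becomes a path search. The following is a minimal sketch under that assumption only: the registry, the labels Rc, IR1, IR2 and X1 to X4, and the helper names `find_chain` and `convert` are hypothetical, and each "converter" merely records its own name instead of transforming code.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

Edge = Tuple[str, str]          # (source representation, target representation)
Converter = Callable[[str], str]


def _tagging_converter(name: str) -> Converter:
    # Stand-in for a real converter: it only records which converter ran.
    return lambda item: f"{name}({item})"


# Hypothetical registry; labels follow FIG. 2 but the mapping is illustrative.
CONVERTERS: Dict[Edge, Converter] = {
    ("IR1", "IR2"): _tagging_converter("X1"),
    ("IR2", "IR1"): _tagging_converter("X2"),
    ("Rc", "IR2"): _tagging_converter("X3"),
    ("IR2", "Rc"): _tagging_converter("X4"),
}


def find_chain(src: str, dst: str) -> Optional[List[Edge]]:
    """Breadth-first search for the shortest converter chain from src to dst."""
    queue: Deque[Tuple[str, List[Edge]]] = deque([(src, [])])
    seen = {src}
    while queue:
        rep, path = queue.popleft()
        if rep == dst:
            return path
        for edge in CONVERTERS:
            if edge[0] == rep and edge[1] not in seen:
                seen.add(edge[1])
                queue.append((edge[1], path + [edge]))
    return None


def convert(item: str, src: str, dst: str) -> str:
    """Apply every converter on the chain from src to dst, in order."""
    chain = find_chain(src, dst)
    if chain is None:
        raise ValueError(f"no converter chain from {src} to {dst}")
    for edge in chain:
        item = CONVERTERS[edge](item)
    return item
```

For example, `convert("app", "Rc", "IR1")` reaches the first intermediate representation via the second one and returns `"X2(X3(app))"`. Breadth-first search is used here only because it yields the shortest chain; a real toolset might instead prefer chains that preserve the most program information.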

FIG. 2 also illustrates one or more compilers, or compiler and linker components 140, forming part of the optimization and protection toolset 40, which may be used, for example, to compile and link the item of software 12, typically converting it into a native or binary code representation or another suitable target representation.

Examples of source code representations that may be used for the input representation Ri and for other representations within the optimization and protection toolset 40 include C, C++, Objective-C, C#, Java, JavaScript, Ada, Fortran, ActionScript, GLSL, Haskell, Julia, Python, Ruby and Rust, although many others will be known to the skilled person. The input representation Ri may alternatively be native or binary code, bytecode, etc., or possibly one of the first and second intermediate representations.

Examples of representations that may be used for the output representation Ro include native code representations for direct execution on a user device, including native code representations such as PNaCl and NaCl suitable for execution under the control of a web browser; bytecode representations such as Java bytecode; representations such as Java source code suitable for interpreted execution or runtime compilation; and script representations such as JavaScript and subsets of JavaScript such as asm.js.

The first intermediate representation IR1 may generally be selected as an intermediate representation that is convenient or suitable for performing optimization techniques. In particular, the first intermediate representation may be LLVM IR (the LLVM intermediate representation). The LLVM project, known to the skilled person and discussed for example at the LLVM website (http://llvm. …), provides the following:

(i) a well-specified, generic intermediate representation (LLVM IR) that supports a language-independent instruction set and type system;

(ii) a complete compiler middle layer and infrastructure that takes items of software in LLVM IR and emits highly optimized versions of them, also in LLVM IR, supporting compile-time, link-time, run-time and "idle-time" optimization of programs written in a wide range of source code representations;

(iii) a rich set of LLVM front-end tools for source code and other representations, covering not only C and C++ but also other popular programming languages, such as the source code languages mentioned above, as well as Java bytecode, etc.;

(iv) a set of LLVM back-end tools that currently supports many other popular platforms and systems, with more mobile platforms expected to be supported in the near future; and

(v) support for OpenGL and for low-end and high-end GPUs.

Other representations suitable for use as the first intermediate representation include Microsoft Common Intermediate Language (CIL).

The second intermediate representation IR2 may generally be selected as an intermediate representation that is convenient or suitable for performing protection techniques. It may, for example, be designed and implemented such that source code in a particular language, such as C or C++, can easily be converted into the second intermediate representation, and such that source code in the same or a similar language can easily be regenerated from it.

The optimization techniques performed by the optimizer component 100 may include techniques to increase the execution speed of the item of software 12, reduce execution idle time, reduce the memory required to store and/or execute the item of software 12, increase the use of processor cores or GPUs, and the like. These and other optimization functions are conveniently provided by the LLVM project. Techniques for implementing such optimizations may include vectorization, idle-time optimization, constant propagation, garbage allocation elimination, inline expansion, reachability analysis, normal protection breaches, and other optimizations.
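To make one of the listed optimizations concrete, the sketch below performs constant propagation and folding over a toy straight-line instruction list. It is an illustration only, not the LLVM pass: the tuple-based instruction format, the `"const"` pseudo-operator (whose second operand is an unused placeholder) and the function name `propagate_constants` are all assumptions made for this example, and every destination is assumed to be assigned exactly once.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[int, str]                 # an integer literal or a variable name
Instr = Tuple[str, str, Operand, Operand] # (destination, operator, left, right)

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def propagate_constants(code: List[Instr]) -> List[Instr]:
    """Replace variables with known constant values and fold constant operations.

    Straight-line code only; each destination is assumed to be assigned once.
    Folded instructions become ("dest", "const", value, 0), where 0 is unused.
    """
    known: Dict[str, int] = {}
    out: List[Instr] = []
    for dest, op, lhs, rhs in code:
        # Substitute operands whose values are already known.
        if isinstance(lhs, str):
            lhs = known.get(lhs, lhs)
        if isinstance(rhs, str):
            rhs = known.get(rhs, rhs)
        if isinstance(lhs, int) and isinstance(rhs, int):
            known[dest] = OPS[op](lhs, rhs)   # fold at "compile time"
            out.append((dest, "const", known[dest], 0))
        else:
            out.append((dest, op, lhs, rhs))  # depends on an unknown value
    return out
```

For instance, `[("a", "add", 2, 3), ("b", "mul", "a", 4), ("c", "add", "b", "x")]` becomes `[("a", "const", 5, 0), ("b", "const", 20, 0), ("c", "add", 20, "x")]`: `a` and `b` are folded, while `c` still depends on the unknown input `x`.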

The purpose of the protector component 110 is to protect the functionality or data processing of the item of software 12 and/or to protect the data used or processed by the item of software 12. This may be achieved by applying obfuscation techniques such as homomorphic data transformations, control flow transformations, white-box cryptography, key hiding, program interlocking and boundary blending.
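As an illustration of a homomorphic data transformation, the sketch below stores integers in an affine encoding e(x) = (a·x + b) mod 2^32 and performs addition directly on encoded values, since e(y) + e(z) − b = e(y + z). This is one simple textbook-style instance chosen for clarity and is not presented as the transformation used by the protector component 110; the class name and the constants a and b are hypothetical, and a real scheme would also hide the encoding parameters themselves.

```python
from dataclasses import dataclass

MOD = 2**32  # model 32-bit unsigned arithmetic


@dataclass(frozen=True)
class AffineEncoding:
    a: int  # must be odd so that it is invertible modulo 2**32
    b: int

    def encode(self, x: int) -> int:
        return (self.a * x + self.b) % MOD

    def decode(self, y: int) -> int:
        a_inv = pow(self.a, -1, MOD)  # modular inverse (Python 3.8+)
        return (a_inv * (y - self.b)) % MOD

    def add_encoded(self, ey: int, ez: int) -> int:
        # e(y) + e(z) = a*(y + z) + 2b, so subtracting b yields e(y + z)
        # without ever decoding y or z.
        return (ey + ez - self.b) % MOD
```

With, say, `enc = AffineEncoding(a=0x9E3779B1, b=12345)`, the value `enc.decode(enc.add_encoded(enc.encode(1000), enc.encode(234)))` is 1234, even though the plain operands 1000 and 234 never appear in the addition itself.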

In particular, after processing by the protector component 110, the item of software 12 will provide the same functionality or data processing as before. However, in the protected item of software 12 this functionality or data processing is typically implemented in such a way that an operator of the user device 20 cannot access or use it in an unintended or unauthorized manner (whereas, if the user device 20 were provided with an unprotected form of the item of software 12, its operator might be able to do so). Similarly, after processing by the protector component 110, the item of software 12 can store secret information (such as cryptographic keys) in a protected or obfuscated manner, making it more difficult, if not impossible, for an attacker to infer or access that secret information (whereas, if the user device 20 were provided with the item of software 12 in an unprotected form, the operator of the user device 20 might be able to infer or access it).

For example:

The item of software 12 may include a decision (e.g. a decision block or branch point) based at least in part on one or more data items to be processed by the item of software 12. If the item of software 12 is provided to the user device 20 in an unprotected form, an attacker may be able to force the item of software 12 to follow a particular execution path after the decision is processed, even if that execution path is not intended to be followed. For example, the decision may comprise testing whether a program variable B is true or false, and the item of software 12 may be arranged such that, if the decision determines that B is true, execution path P_T is followed/executed, and if the decision determines that B is false, execution path P_F is followed/executed. In this case, the attacker may (e.g. by using a debugger) force the item of software 12 to follow path P_F when B is determined to be true, and/or force it to follow path P_T when B is determined to be false. Thus, in some embodiments, the protector component 110 aims to prevent an attacker from doing so (or at least to make this more difficult) by applying one or more software protection techniques to the decisions within the item of software 12.

The item of software 12 may include one or more of security-related functions, access control functions, cryptographic functions and rights management functions. Such functions often involve the use of secret data, such as one or more cryptographic keys, and their processing may involve using, operating on, or operating with one or more cryptographic keys. If an attacker is able to identify or determine the secret data, a security breach has occurred, and control or management of the data (such as audio and/or video content) protected by the secret data may be circumvented. Thus, in some embodiments, the protector component 110 aims to prevent an attacker from identifying or determining one or more items of secret data (or at least to make this more difficult) by applying one or more software protection techniques to such functions within the item of software 12.

A "white-box" environment is an execution environment for an item of software in which it is assumed that an attacker has full access to, and visibility of, the data being manipulated (including intermediate values), the memory contents and the execution/process flow of the item of software. Furthermore, in a white-box environment, it is assumed that the attacker can modify the data being manipulated, the memory contents and the execution/process flow of the item of software, for example by using a debugger. In this way, the attacker can experiment on the item of software and try to manipulate its operation with the aim of circumventing its originally intended functions, identifying secret information, and/or for other purposes.

In fact, it may even be assumed that the attacker knows the underlying algorithm executed by the item of software. However, the item of software may need to use secret information (e.g. one or more cryptographic keys) that must remain hidden from the attacker. Similarly, it is desirable to prevent an attacker from modifying the execution/control flow of an item of software, for example to prevent the attacker from forcing the item of software to take one execution path after a decision block instead of the legitimate execution path. There are many techniques, referred to herein as "white-box obfuscation techniques", for transforming the item of software 12 so that it is resistant to white-box attacks. Examples of such white-box obfuscation techniques can be found in S. Chow et al., "White-Box Cryptography and an AES Implementation", in Selected Areas in Cryptography, 9th Annual International Workshop, SAC 2002, Lecture Notes in Computer Science 2595 (2003), pp. 250-270, and in S. Chow et al., "A White-Box DES Implementation for DRM Applications", in Digital Rights Management, ACM CCS-9 Workshop, DRM 2002, Lecture Notes in Computer Science 2696 (2003), pp. 1-15, the entire disclosures of which are incorporated herein by reference. Additional examples can be found in US 61/055,694 and WO 2009/140774, the entire disclosures of which are incorporated herein by reference. Certain white-box obfuscation techniques implement data-flow obfuscation; see, e.g., US7,350,085, US7,397,916, US6,594,761 and US6,842,862, the entire disclosures of which are incorporated herein by reference. Certain white-box obfuscation techniques implement control-flow obfuscation; see, e.g., US6,779,114, US6,594,761 and US6,842,862, the entire disclosures of which are incorporated herein by reference. However, it will be appreciated that other white-box obfuscation techniques exist, and that embodiments of the invention may use any white-box obfuscation technique.
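One widely known form of control-flow obfuscation is control-flow flattening, sketched below. It is not claimed to be the technique of the patents cited above; the Collatz step-counting function is an arbitrary example chosen only because it has a loop and a branch. After flattening, every basic block becomes one case of a single dispatcher loop and the successor block is chosen by assigning to `state`, so the original loop and branch structure is no longer visible in the code layout. In practice the state values would typically be encoded or randomized rather than numbered sequentially as here.

```python
def collatz_steps(n: int) -> int:
    """Original function with structured control flow (requires n >= 1)."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


def collatz_steps_flattened(n: int) -> int:
    """The same function after control-flow flattening.

    Each basic block of the original becomes one case of a single dispatcher
    loop; the next block is selected through `state` instead of nested
    while/if structure.
    """
    steps = 0
    state = 0
    while True:
        if state == 0:      # loop header: evaluate the loop condition
            state = 1 if n != 1 else 5
        elif state == 1:    # branch on parity
            state = 2 if n % 2 == 0 else 3
        elif state == 2:    # even block
            n //= 2
            state = 4
        elif state == 3:    # odd block
            n = 3 * n + 1
            state = 4
        elif state == 4:    # loop latch: count the step
            steps += 1
            state = 0
        else:               # state == 5: exit block
            return steps
```

Both functions return the same result for every input, e.g. 8 for n = 6; only the shape of the control flow differs.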

As another example, it may be that the item of software 12 may be intended to be provided to (or distributed to) and used by a particular user device 20 (or a particular group of user devices 20), and thus it is desirable to "lock" the item of software 12 to that particular user device 20, i.e., prevent the item of software 12 from executing on another user device 20. Accordingly, there are many techniques, referred to herein as "node-lock" protection techniques, for transforming the item of software 12 such that the protected item of software 12 may be executed on (or by) one or more predetermined/particular user devices 20, but will not be executed on other user devices. An example of such node locking techniques can be found in WO2012/126077, the entire disclosure of which is incorporated herein by reference. However, it will be appreciated that other node locking techniques exist, and that embodiments of the present invention may use any node locking technique.
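A minimal sketch of the node-locking idea, assuming that each user device can present a stable fingerprint string. The secret embedded in the protected item of software is masked with a key derived from the intended device's fingerprint and authenticated with an HMAC, so that on any other device the unmasking step fails. The fingerprint format, the `b"node-lock:"` label and the function names are hypothetical, and this is not the technique of WO2012/126077.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def _device_key(device_id: str) -> bytes:
    # Hypothetical: derive a 32-byte key from a device fingerprint string.
    return hashlib.sha256(b"node-lock:" + device_id.encode("utf-8")).digest()


def lock_secret(secret: bytes, device_id: str) -> Tuple[bytes, bytes]:
    """At protection time: bind a secret (at most 32 bytes) to one device."""
    if len(secret) > 32:
        raise ValueError("this sketch masks at most 32 bytes")
    key = _device_key(device_id)
    masked = bytes(s ^ k for s, k in zip(secret, key))
    tag = hmac.new(key, masked, hashlib.sha256).digest()
    return masked, tag


def unlock_secret(masked: bytes, tag: bytes, device_id: str) -> Optional[bytes]:
    """At run time: recover the secret, or return None on any other device."""
    key = _device_key(device_id)
    expected = hmac.new(key, masked, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    return bytes(m ^ k for m, k in zip(masked, key))
```

The masked secret and tag would be embedded in the copy of the item of software prepared for device A; calling `unlock_secret` with device A's fingerprint recovers the secret, while any other fingerprint yields `None`.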

Digital watermarking is a well-known technique. In particular, digital watermarking involves modifying an original digital object to produce a watermarked digital object. The modifications are made so as to embed or hide certain data, referred to as payload data, in the original digital object. The payload data may, for example, include data identifying ownership or other rights information for the digital object. The payload data may identify the (intended) recipient of the watermarked digital object, in which case the payload data is referred to as a digital fingerprint; such a digital watermark may be used to help trace the origin of unauthorized copies of the digital object. Digital watermarks may also be applied to items of software. An example of such software watermarking techniques can be found in US7,395,433, the entire disclosure of which is incorporated herein by reference. However, it should be appreciated that other software watermarking techniques exist, and that embodiments of the present invention may use any software watermarking technique.
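As a toy illustration of software watermarking, the sketch below hides payload bits in the order of pairs of independent statements: a pair in sorted order carries a 0 and a swapped pair carries a 1, and because the statements are independent either order computes the same result. This is not the technique of US7,395,433; the statement pairs are hypothetical, and a real tool would establish independence through dependency analysis and use far more robust carriers.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # two statements assumed to be independent of each other


def embed_watermark(pairs: List[Pair], bits: List[int]) -> List[Pair]:
    """Encode one payload bit per pair: sorted order encodes 0, swapped order 1."""
    if len(bits) > len(pairs):
        raise ValueError("payload longer than the number of carrier pairs")
    marked: List[Pair] = []
    for i, (s1, s2) in enumerate(pairs):
        if s1 == s2:
            raise ValueError("carrier statements must differ")
        first, second = sorted((s1, s2))
        bit = bits[i] if i < len(bits) else 0  # unused carriers encode 0
        marked.append((second, first) if bit else (first, second))
    return marked


def extract_watermark(pairs: List[Pair], n_bits: int) -> List[int]:
    """Read the bits back: a pair that is out of sorted order carries a 1."""
    return [1 if s1 > s2 else 0 for s1, s2 in pairs[:n_bits]]
```

Embedding a per-recipient bit string in this way turns the watermark into a digital fingerprint of the kind described above.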

It may be desirable to provide different versions of the item of software 12 to different user devices 20. The different versions provide the same functionality to the different user devices 20; however, the different versions of the protected item of software 12 are programmed or implemented differently. This helps to limit the impact of an attacker who successfully attacks the protected item of software 12. In particular, if an attacker successfully attacks one version of the protected item of software 12, the attack (or data discovered or accessed by the attack, such as a cryptographic key) may not be usable against a different version of the protected item of software 12. Accordingly, there are many techniques, referred to herein as "diversity" techniques, for transforming the item of software 12 so that different protected versions of it are generated (i.e. so that diversity is introduced). Examples of such diversity techniques can be found in WO2011/120123, the entire disclosure of which is incorporated herein by reference. However, it should be appreciated that other diversity techniques exist, and that embodiments of the present invention may use any diversity technique.
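The sketch below illustrates diversity on a deliberately tiny scale: for each seed (for instance, one seed per user device, which is an assumption here) it emits differently written source code for the same function, x + k, by splitting the constant k into random parts and shuffling the order of the terms. It is a toy randomised transformation, not the technique of WO2011/120123, and `make_variant` and `load_variant` are hypothetical names.

```python
import random
from typing import Callable


def make_variant(seed: int, k: int) -> str:
    """Emit source code for f(x) = x + k, written differently for each seed.

    k is split into random parts whose sum is still k, and the terms are
    emitted in a random order, so variants differ textually but not in
    behaviour.
    """
    rng = random.Random(seed)
    parts = [rng.randrange(-1000, 1000) for _ in range(rng.randrange(2, 5))]
    parts.append(k - sum(parts))          # the parts still sum to k
    terms = ["x"] + [f"({p})" for p in parts]
    rng.shuffle(terms)
    return "def f(x):\n    return " + " + ".join(terms) + "\n"


def load_variant(source: str) -> Callable[[int], int]:
    """Compile a generated variant (only ever run on code generated above)."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["f"]
```

Each variant computes exactly x + k, but an attack tuned to the literal constants or term order of one variant does not transfer directly to another.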

The white-box obfuscation techniques, node-locking techniques, software watermarking techniques and diversity techniques mentioned above are examples of software protection techniques. It will be appreciated that there are other ways of applying protection to the item of software 12. Thus, the term "software protection technique" as used herein should be understood to mean any method of applying protection to the item of software 12 (with the aim of thwarting an attacker, or at least making it more difficult for an attacker to mount a successful attack), such as any of the above-mentioned white-box obfuscation techniques and/or node-locking techniques and/or software watermarking techniques and/or diversity techniques.

There are many ways in which the protector component 110 can implement the software protection techniques mentioned above within the item of software 12. For example, to protect the item of software 12, the protector component 110 can modify one or more portions of code within the item of software 12 and/or can add or introduce one or more new portions of code into the item of software 12. The actual manner in which these modifications are made, or in which the new code portions are written, may of course vary; after all, there are many ways in which software can be written to achieve the same functionality.

The binary protection component 130 accepts the item of software 12 in native code, binary code or bytecode form after compilation by the compiler and linker 140, and applies binary protection techniques such as integrity verification, anti-debugging, code encryption, secure loading and secure storage. It then typically repackages the item of software 12 into a fully protected binary containing the necessary security data, which can be accessed and used during loading and execution on the user device 20.
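Integrity verification, the first binary protection technique listed above, can be reduced to its core comparison as follows. This minimal sketch models a code region as a Python `bytes` object (an assumption); a real binary protection tool would check the mapped code at run time, hide the reference digest, and spread and interlock several such checks rather than performing one visible comparison.

```python
import hashlib
import hmac


def digest_of(code: bytes) -> bytes:
    """Reference digest, computed once at protection time."""
    return hashlib.sha256(code).digest()


def verify_integrity(code: bytes, expected_digest: bytes) -> bool:
    """Return True only if the code bytes are unchanged since protection time."""
    return hmac.compare_digest(digest_of(code), expected_digest)
```

For example, flipping a single byte of a protected region makes `verify_integrity` return `False`, at which point the protected software could refuse to continue.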

Thus, for an item of software 12 for which a developer has access to all of the source code, the optimization and protection toolset 40 may be used first to apply source code protection techniques to the application in the second intermediate representation, using the protection component 112, and then to apply binary protection to the resulting binaries, which have already been protected using the source code protection techniques. Applying protection to the item of software 12 in both the source code and binary code domains results in a more effectively protected item of software 12.

FIG. 3 illustrates some of the workflows 200 that can be implemented using the optimization and protection toolset 40. The item of software 12 is provided to the toolset in the input representation Ri, which will typically be a source code or binary code representation as discussed above. At step 205, the item of software is converted into the first intermediate representation. This may involve the use of a single converter component 120-128 or of two or more converter components. In general, the item of software may be converted from the input representation Ri to the first intermediate representation either directly or via another representation, such as the second intermediate representation.

The optimizer component 100 of FIG. 2 is then used at step 210 to optimize the item of software 12 in the first intermediate representation IR1, after which the item of software is converted to the second intermediate representation IR2 at step 215 using the first converter 120 of FIG. 2. The protector component 110 of FIG. 2 is then used at step 220 to protect the item of software 12 in the second intermediate representation IR2, after which it is converted back to the first intermediate representation IR1 at step 225 using the second converter 122 of FIG. 2.

The optimizer component 100 of FIG. 2 is then used again at step 230 to optimize the item of software 12 in the first intermediate representation IR1. The item of software may then undergo various aspects of further processing at step 235 before being output in the output representation Ro. The further processing may include one or more of compilation and linking, binary protection, conversion to other representations, and the like.

The dashed flow arrows in the figure indicate that, after the second optimization step 230, the workflow 200 may return to step 215 for conversion back to the second intermediate representation and one or more further rounds of protection and optimization.
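The ordering constraints of workflow 200 (optimization only in IR1, protection only in IR2, and an optional loop back to step 215) can be summarized in a small driver. This is a minimal sketch under stated assumptions: the `Item` class, the representation labels and the functions below are hypothetical stand-ins that only record which stage ran, not the toolset's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    rep: str                                     # current representation label
    history: List[str] = field(default_factory=list)


def _require(item: Item, rep: str, stage: str) -> None:
    if item.rep != rep:
        raise ValueError(f"{stage} expects {rep}, got {item.rep}")


def convert(item: Item, target: str) -> Item:
    item.history.append(f"convert {item.rep}->{target}")
    item.rep = target
    return item


def optimize(item: Item) -> Item:
    _require(item, "IR1", "optimizer")           # optimizer works on IR1
    item.history.append("optimize")
    return item


def protect(item: Item) -> Item:
    _require(item, "IR2", "protector")           # protector works on IR2
    item.history.append("protect")
    return item


def workflow_200(item: Item, rounds: int = 1) -> Item:
    item = optimize(convert(item, "IR1"))        # steps 205 and 210
    for _ in range(rounds):                      # dashed arrow: repeat from 215
        item = protect(convert(item, "IR2"))     # steps 215 and 220
        item = optimize(convert(item, "IR1"))    # steps 225 and 230
    item.history.append("further processing")    # step 235
    item.rep = "Ro"
    return item
```

With `rounds=2`, the recorded history shows two protection passes and three optimization passes, and calling `protect` on an item still in IR1 raises an error, mirroring the rule that each tool only operates in its own representation.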

The workflow 200 of FIG. 3 may be altered in various ways. For example, the item of software 12 may be optimized only once, either before or after the protecting step 220, and the further processing step 235 may be omitted or may include multiple steps. Either protection or optimization may be performed first, and any number of further optimization and protection steps may be performed. The conversion from the input representation Ri to the representation IR1 used for optimization may comprise several conversion steps, for example a conversion from Ri to IR2 followed by a conversion from IR2 to IR1. The further processing step 235 may include other optimization and/or protection steps, such as a binary overwrite protection step.

More specific examples of how the optimization and protection toolset 40 of FIG. 2 and workflows such as that of FIG. 3 may be implemented will now be described. In these particular examples, the first intermediate representation is the LLVM IR discussed above. This enables the scope of native application protection to be extended for better performance and security, and also opens up new possibilities for optimizing and protecting software across a much larger operating scope of the toolset 40.

The inventors have recognized that there is a conflict between security and performance when preparing an item of software 12 for distribution to a plurality of user devices 20. Generally, protection introduces necessary redundancy and overhead, which reduce the performance of the protected, and especially the obfuscated, form of the software. The more protection techniques are applied to an item of software, the more significant the impact on performance. A balance between performance and security is therefore required.

Typical protection techniques may transform static program dependencies into partially static and partially dynamic dependencies. This effectively prevents static attacks, which are usually easier to perform than dynamic attacks. However, it also means that these protection techniques may undermine optimizations whose analyses rely on static dependency properties. Because of this limitation, a protection and optimization strategy has to choose between less security/protection with better optimization (e.g. faster execution and/or smaller program size) and more security/protection with less optimization.
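The following minimal sketch shows how a protection transformation can turn a static dependency into one that appears dynamic, using a simple opaque expression chosen for illustration. The product x·(x+1) of two consecutive integers is always even, so `(x * (x + 1)) % 2` is always 0 and `factor` is always 7. A compiler that does not recognise this identity must treat `factor` as depending on the run-time value `x`, and so can no longer fold it into a constant as it could in `scale_plain`. Whether a particular optimizer sees through a given opaque expression depends on that optimizer; the function names are hypothetical.

```python
def scale_plain(v: int) -> int:
    factor = 7                        # static constant: trivially propagated/folded
    return v * factor


def scale_obfuscated(v: int, x: int) -> int:
    # x * (x + 1) is a product of consecutive integers, hence always even,
    # so this expression always equals 7 but appears to depend on x.
    factor = (x * (x + 1)) % 2 + 7
    return v * factor
```

The two functions agree for all integer inputs, so the protection preserves behaviour while costing an extra multiplication and modulo at run time, which is a small instance of the performance overhead discussed above.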

FIG. 4 illustrates a workflow that may be implemented using the optimization and protection toolset 40. The item of software 12 is provided to the optimization and protection toolset 40 in an input representation Ri, which here is the C/C++ source code representation Rc. This is passed to the toolset component group 300, which consists of the converter X3 from the representation Rc to the second intermediate representation IR2, the protector component 110, and the converter X4 from the second intermediate representation IR2 back to the source code representation Rc. If LLVM optimization in the first intermediate representation is not to take place, the item of software 12 can be passed successively through each of these components to protect it before it is passed to the compiler, optimizer and linker 140, and then on to the binary protection component 130, which outputs the item of software 12 in the output representation, here the native/binary code representation Rb. A set of security libraries and agents 145 is also provided for use in compiling/linking the item of software 12 and, if required, for use by the binary protection component 130.

The toolset component group 300 is supplemented by an optimizer component 100, which is shown here, for purposes of illustration, as a single sub-component 102 implementing one or more LLVM optimization tools, although multiple sub-components 102 can be used, e.g. different sub-components or different combinations of sub-components at each stage of optimization. Using the X1 and X2 converters of FIG. 2, the item of software 12 in the second intermediate representation (as formed by the X3 converter 124 and/or as output by the protector component 110 in the toolset component group 300) is converted into the first intermediate representation for use by the LLVM optimization tools, and, after optimization, it is converted back for protection by the protector component 110 and/or for conversion back to the Rc representation by the X4 converter.

Some alternative workflow paths are illustrated in fig. 4 using dashed lines. For example, after processing by the protector component 110 and conversion to the IR1 representation, the item of software 12 can be sent directly to the compiler, optimizer and linker 140 without a second processing step by the optimizer component 100. Similarly, after the second processing step by the optimizer component 100, the item of software 12 can be sent directly to the compiler, optimizer and linker 140 without conversion by the X1 and X4 converters if the compiler, optimizer and linker 140 is capable of processing the input in the first intermediate representation.

The X1 and X2 converters thus provide a bridge between the domain of protection techniques provided by the protector component 110 in the second intermediate representation and the domain of optimization techniques provided by the LLVM optimization tools in the first intermediate representation, thereby integrating these two operating domains of the optimization and protection toolset 40. The approach also helps to resolve the conflict between protection and optimization discussed above, since the optimization and protection toolset 40 can leverage the available LLVM optimization tools and techniques to optimize both before and after the protection techniques are applied by the protector component 110. By enabling optimization at multiple stages, the trade-off between security and performance can be relaxed, so that both better security and improved performance can be achieved for the same item of software 12.

FIG. 5 illustrates another workflow that may be implemented using the optimization and protection toolset 40. In this figure, the item of software 12 is provided to the optimization and protection toolset 40 in an input representation that is a source code representation Rs. The source code representation Rs may be, for example, Objective-C, Java, JavaScript, C#, Ada, Fortran, ActionScript, GLSL, Haskell, Julia, Python, Ruby or Rust. The item of software 12 is passed to a converter X5, which converts the source code representation Rs into the first intermediate representation. The converter X5 can be provided as part of a set of LLVM front-end tools 320 that provide conversion from a variety of source code representations to LLVM IR. The item of software 12, now in LLVM IR, can be passed to the optimizer component 100 for a first optimization step by the LLVM optimizer tools, or directly to the X1 converter (as shown in dashed lines) for conversion to the second intermediate representation before being passed to the protector component 110. The rest of FIG. 5 corresponds to FIG. 4. Note that the toolset component group 300 of FIG. 5 is not shown as including an X3 converter, as one is not needed in the workflow of FIG. 5, but it may nonetheless be included in this group if desired.

Since a very rich set of LLVM front-end tools 320 is available to convert many different languages into LLVM IR, and thus to leverage the LLVM compilation facilities for sophisticated analysis and better performance, these LLVM front-end tools can be used to extend the front-end capabilities of the optimization and protection toolset 40, as shown in FIG. 5: program source code in a large set of programming languages can be converted, via the first intermediate representation, into the second intermediate representation, in which the protection techniques of the protector component 110 can be applied.

FIG. 6 illustrates another workflow that may be implemented using the optimization and protection toolset 40. In this figure, the software project 12 is provided to the optimization and protection toolset 40 with an input representation Ri as a native/binary representation Rb for execution on a particular platform or class of user equipment 20. The binary representation Rb may be any of the x86, x86-64, ARM, SPARC, PowerPC, MIPS, and m68k binary representations, for example. The item of software 12 is passed to a converter X6 which converts the binary representation Rb into a first intermediate representation. The converter X6 can be provided as part of a set of LLVM binary tools 330 that provide conversion from a variety of binary representations to LLVM IR. The rest of fig. 6 corresponds to fig. 4 and 5.

By using the LLVM binary tools in this manner, the item of software 12 in native/binary code can be converted to LLVM IR and then into the second intermediate representation for input to the protector component 110, which applies protection techniques such as obfuscation. If the output representation Ro is binary code for a different target platform from that of the input binary code, the optimization and protection toolset 40 can readily achieve this, given an appropriate configuration of the compiler, optimizer and linker 140, while also applying the protection techniques required by the application.

The LLVM compiler middle-layer tools include sophisticated program analysis capabilities, such as more accurate alias analysis, pointer overflow analysis and dependency analysis, which can provide rich program properties and dependencies that can be used to transform a program for different purposes. The binary overwrite protection component 135 shown in FIG. 2 provides one or more binary overwrite protection tools that accept the item of software 12 in LLVM IR, perform obfuscation transformations by exploiting the program analysis functionality of LLVM, and produce a more secure version of the item of software 12 in LLVM IR.

The binary overwrite protection component 135 can enhance protection of the item of software 12 in a number of different ways, including independent binary overwrite protection, binary overwrite protection with a binary protection tool, and binary overwrite protection with both a source obfuscation tool and a binary protection tool:

Independent binary overwrite protection. Generally, binary protection protects binary code in binary form, and some such protection techniques, such as integrity verification, secure loading and dynamic code encryption, require working on the binary representation. Binary protection may also apply certain transformations if the required program information becomes available.

However, existing binary protection tools tend to have limited support for program analysis, so that only very limited transformations can be performed directly in binary form. By contrast, the binary overwrite protection tool may be adapted to act on the item of software 12 in an intermediate representation such as LLVM IR, where more sophisticated program analysis support can be used, allowing many transformation techniques to be applied that cannot readily be applied directly to software in a binary representation.

In a standalone mode, the item of software 12 in the unprotected binary code representation is converted to LLVM IR using one or more LLVM binary tools 330, and then certain program transformations are applied to the item of software 12 by interacting with LLVM program analysis tools using binary overwrite protection component 135. The rewritten software items 12 in the LLVM IR are then converted into a protected binary code representation by using LLVM IR to binary converters, compilers, optimizers, and linkers or otherwise.

Binary overwrite protection with a binary protection tool. In this mode, an item of software 12 provided to the optimization and protection toolset 40 in a binary code representation can be obfuscated into a protected binary representation using the binary overwrite protection component 135. The item of software 12 may then be further protected using a general binary protection tool, such as that provided by the binary protection component 130 of FIG. 2.

Combining different protection layers in this manner by using both binary overwrite protection and binary protection results in a more secure item of software 12.

Binary overwrite protection with both source-level protection and binary protection. Generally, protection applied to a source-code-like representation, such as the second intermediate representation discussed above, can provide more extensive and in-depth data-flow and control-flow protection. FIG. 7 illustrates this using a workflow similar to that of FIG. 6, in which an LLVM binary tool is used to convert the item of software 12, provided to the optimization and protection toolset 40 in a binary representation, into the first intermediate representation. Also in FIG. 7, after the action of the protector component 112, the item of software 12 output from the optimizer component 100, or alternatively directly from the converter X2, is directed to the binary overwrite protection tool 135. After the binary overwrite protection tool 135 has operated, the item of software 12 is passed on to the compiler, optimizer and linker 140, as previously described. The binary overwrite protection tool 135 is an example of an LLVM compiler middle-layer tool 345 that can be used in this arrangement. As shown by the dashed lines in FIG. 7, after the first optimization the item of software 12 may alternatively be directed straight to the binary overwrite protection tool, without processing by the protector component 112 or the second optimization stage, or may be processed in such a way that the first or second optimization step is omitted.

A web application is an application that uses a web browser as its client environment. Web applications are typically coded in a programming language supported by the browser (such as JavaScript), combined with a browser-rendered markup language such as HTML, and rely on their host web browser to render them executable. asm.js supports C-like computation, but since it is a subset of JavaScript, it runs correctly in any JavaScript-enabled web browser without requiring any further special support. The subset used by asm.js makes it easy to identify low-level operations using ordinary type inference. asm.js relies on the extensions needed to support WebGL (buffers and typed arrays, such as Uint32 and Int16 arrays) in order to support low-level structures, arrays, etc., but these are typically available in host web browsers. An asm.js representation can be marked in a JavaScript file using the "use asm" directive. A host web browser without explicit support for asm.js simply ignores this directive. If support is available in a web browser, typically through compilation of the asm.js code into a native binary code representation, asm.js code can run with greatly increased speed and efficiency compared to ordinary JavaScript.

Tools exist in the prior art for converting source code representations such as C and C++ into asm.js. One such tool chain consists of the Clang tool (see http://clang.llvm.org), which converts C and C++ representations to LLVM IR, and the Emscripten tool (see https://github.com/kripken/emscripten), which converts LLVM IR to an asm.js representation. Optimization can be implemented by applying the LLVM optimization tool as part of this tool chain before the Emscripten tool is applied.

FIG. 8 illustrates how the optimization and protection toolset 40 may be used to optimize and protect a software item 12 provided in the C/C++ source representation Rc and to output the software item 12 in the asm.js representation Ra. The workflow of FIG. 8 follows a scheme similar to those of FIGS. 4 to 7.

According to a first workflow route, illustrated with a thick dashed line, the software item 12 input in the C/C++ representation Rc is passed to the toolset component group 300, where it is converted by the converter X3 into the second intermediate representation, then protected by the protector component 112, and then converted back to the C/C++ representation Rc. The protected item of software 12 is then passed to a Clang component 350, denoted X7, which converts the C/C++ source code representation Rc into the first intermediate representation IR1, typically LLVM IR. This representation is passed to the LLVM optimizer 310, which forms part of the optimizer component 102, and then to the Emscripten component 360, denoted X8, which converts the first intermediate representation to the asm.js representation Ra.

According to a second workflow route, shown generally with solid lines, the software item 12 input in the C/C++ representation Rc is first passed to the Clang component 350, denoted X7, which converts the C/C++ source code representation Rc into the first intermediate representation IR1, typically LLVM IR. This representation is then passed to the LLVM optimizer 310, which forms part of the optimizer component 102, and then to the first converter 122, denoted X1, for conversion to the second intermediate representation, in which it is passed to the protector component 112. After being processed by the protector component 112, the item of software 12 is passed to the second converter 120, denoted X2, for conversion back to the first intermediate representation, and then to the optimizer component 102 for a second optimization phase. Finally, the item of software 12 is passed to the Emscripten component 360, denoted X8, which converts the first intermediate representation to the asm.js representation Ra. Some alternatives within this workflow are shown with thin dashed lines, whereby the first or second optimization step can be omitted.
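The two FIG. 8 routes differ only in the order in which the same kinds of stage are applied. As an illustrative sketch (a hypothetical data model, not part of the toolset), each stage can be written as a (component, input representation, output representation) triple, and a route is well formed only if every stage consumes the representation its predecessor produced. The component labels follow the text; the short representation names and the function names are assumptions.

```python
from typing import List, Tuple

# (component label, input representation, output representation).
# Labels follow FIG. 8 as described above; the data model is illustrative.
Stage = Tuple[str, str, str]

RC, IR1, IR2, RA = "Rc", "IR1", "IR2", "Ra"


def first_route() -> List[Stage]:
    """Thick-dashed route: protect in IR2 first, then optimize in IR1."""
    return [
        ("converter X3", RC, IR2),
        ("protector 112", IR2, IR2),
        ("back-conversion to C/C++", IR2, RC),
        ("Clang X7", RC, IR1),
        ("LLVM optimizer 310", IR1, IR1),
        ("Emscripten X8", IR1, RA),
    ]


def second_route(first_opt: bool = True, second_opt: bool = True) -> List[Stage]:
    """Solid-line route; the flags model the thin-dashed variants that skip an optimization."""
    stages: List[Stage] = [("Clang X7", RC, IR1)]
    if first_opt:
        stages.append(("LLVM optimizer 310", IR1, IR1))
    stages += [
        ("converter X1", IR1, IR2),
        ("protector 112", IR2, IR2),
        ("converter X2", IR2, IR1),
    ]
    if second_opt:
        stages.append(("optimizer 102", IR1, IR1))
    stages.append(("Emscripten X8", IR1, RA))
    return stages


def is_consistent(stages: List[Stage], start: str = RC, end: str = RA) -> bool:
    """Check that every stage consumes the representation its predecessor produced."""
    current = start
    for _label, src, dst in stages:
        if src != current:
            return False
        current = dst
    return current == end
```

Both routes, and the variants with either optimization omitted, pass `is_consistent`; dropping the first converter from the first route does not, because the protector would then receive C/C++ rather than the second intermediate representation.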

By implementing a C/C++ to asm.js conversion that includes protection and optimization using the optimization and protection toolset 40, a new software item 12, such as a web application, can be developed in C/C++ for delivery to user devices in asm.js, and existing software items 12 in C/C++ can also be migrated to a protected and optimized asm.js representation. Since asm.js-enabled browsers can perform much stronger runtime optimization than is possible for general JavaScript, the optimized and protected asm.js software item 12 can run at high speed. In fact, tests conducted in connection with the present invention have shown that a software item 12 written in C/C++ and processed using the optimization and protection toolset 40 as discussed above to form optimized and protected asm.js code can perform better than the corresponding software item 12 originally written in native code. This indicates good performance of the optimization used in the optimization and protection toolset 40.

While FIG. 8 illustrates the use of the optimization and protection toolset 40 to accept software items 12 input in C or C++, with subsequent steps of optimization and protection and final conversion to the asm.js representation Ra as already discussed, other source code representations such as Objective-C, Java, JavaScript, C#, etc. can be used for the input representation Ri by using a different LLVM front-end tool instead of the Clang tool 350 shown in FIG. 8. This opens up many new opportunities to migrate existing applications in languages other than C/C++ to web applications, or to develop new web applications in those languages for use in a browser environment.

Similarly, the workflow shown in FIG. 8 can be altered by replacing the Clang tool 350 with one or more LLVM binary tools 330 (as discussed in connection with FIG. 7) to accept input software items 12 in the native/binary representation Rb. A significant advantage of such a workflow is that existing items of software 12 in a native code representation can be migrated to web applications running in a browser environment (e.g., HTML5), with enhanced security provided by the protection component 112, while maintaining performance, e.g., in terms of execution speed.

FIG. 9 again illustrates the optimization and protection toolset 40 already shown in FIG. 2, but now, in place of the workflows discussed in connection with FIGS. 3-8, with certain other specific details and aspects. For example, the optimization and protection toolset 40 shown in FIG. 9 makes specific reference to using LLVM IR as the first intermediate representation. Employing a technical framework such as LLVM may make it easier to apply software protection capabilities oriented to, or originally written for, C/C++ source code structures or the like to the protection of software items 12 provided in other source code representations, binary code representations, or the like.

FIG. 9 thus shows that the software item 12 input to the optimization and protection toolset 40 may be C/C++ source code (representation Rc), other source code (representation Rs), or native/binary code (representation Rb). If the input software item 12 is in a C/C++ source code representation, it can be converted using the X3 converter to the second intermediate representation used by the protector component 112. All of the different representations of the input item of software 12 can be converted into the first intermediate representation, which is LLVM IR, using the LLVM front-end/binary tools 320, 330.

The input software items 12 may then be processed in various ways by the elements of the unified toolset group 400. These components include a protector component 110 that operates on the item of software 12 in the second intermediate representation, a binary rewriting protection component 135 that operates on the item of software 12 in the LLVM intermediate representation, and an optimization component 102 that operates on the item of software 12 in the LLVM intermediate representation. The unified toolset group 400 also includes at least the first and second converters X1 122 and X2 120, which convert between the LLVM intermediate representation and the second intermediate representation so that any component of the unified toolset group 400 can act on the software item 12.

After processing by the components of the unified toolset group 400, the item of software 12 may be passed to various components for further processing to form the item of software 12 in a relevant output representation. If passed from the unified toolset group 400 in the second intermediate representation, the software item 12 may be converted back to the C/C++ source code representation Rc using converter X4 126 for compilation and linking by the C/C++ compiler and linker component 140-1. If passed from the unified toolset group 400 in the LLVM intermediate representation, the software item 12 can be compiled and linked by the LLVM compiler and linker 140-2. In both cases, the output from the optimization and protection toolset 40 is then a software item 12 in the native/binary code representation Rb. Alternatively, the software item 12 may be passed from the unified toolset group 400 in the LLVM intermediate representation to the converter X8 provided by the Emscripten tool 360, so that the output from the optimization and protection toolset 40 is then the software item 12 in the asm.js representation Ra.
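The input and output routes of FIG. 9 can be viewed as a small directed graph whose nodes are representations and whose edges are converters, front ends, compilers and linkers. The following is a minimal sketch under that assumption: a breadth-first search finds the shortest chain of components from an input representation to an output representation, and `route_via` forces the chain through the second intermediate representation, where the protector component operates. The graph encoding, node names and function names are hypothetical; only the component labels come from the text.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# FIG. 9 modelled as: representation -> [(component, next representation)].
# Component labels come from the text; the graph itself is an assumption.
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "Rc": [("converter X3", "IR2"), ("LLVM front-end 320", "LLVM IR"),
           ("C/C++ compiler/linker 140-1", "native")],
    "Rs": [("LLVM front-end 320", "LLVM IR")],
    "Rb": [("LLVM binary tool 330", "LLVM IR")],
    "LLVM IR": [("converter X1", "IR2"), ("LLVM compiler/linker 140-2", "native"),
                ("Emscripten X8", "asm.js")],
    "IR2": [("converter X2", "LLVM IR"), ("converter X4", "Rc")],
}


def find_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of components from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for component, nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [component]))
    return None


def route_via(start: str, via: str, goal: str) -> Optional[List[str]]:
    """Shortest chain that is forced through an intermediate representation."""
    first, second = find_path(start, via), find_path(via, goal)
    if first is None or second is None:
        return None
    return first + second
```

For example, `route_via("Rb", "IR2", "asm.js")` yields the binary tool, X1, X2 and Emscripten in that order, which corresponds to protecting a native binary and delivering it as asm.js.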

Using the optimization and protection toolset 40 of FIG. 9, a software item 12, such as an application, software module or library, can be protected using the same protector component 110, and the same set of obfuscation and other techniques that the component 110 can implement, regardless of the language used to implement the software item. The software item 12 can then run in a native execution environment (including PNaCl) if it is output from the optimization and protection toolset 40 in native/binary code, or in a web browser environment if it is output in JavaScript or asm.js. This is achieved in the optimization and protection toolset 40 of FIG. 9 by operating the components of the unified toolset group 400 in two different intermediate representations: the protector component operates on the item of software 12 in the second intermediate representation, and at least the optimization component 100 operates on the item of software 12 in the LLVM intermediate representation.

The arrangements shown in FIGS. 2-9 mostly use a first intermediate representation for performing optimization of the item of software and a second intermediate representation for performing protection of the item of software. However, with reference to FIG. 10, embodiments of the invention may more generally also use the first intermediate representation for performing protection of the item of software and/or the second intermediate representation for performing optimization of the item of software. In addition, while the arrangements of FIGS. 2-9 use two intermediate representations, embodiments of the invention may use three or more intermediate representations, each of which is used for one or both of optimization and protection of the item of software.

FIG. 10 is similar to FIG. 2, but illustrates how any number of intermediate representations IR1 … IRN may be used by the optimization and protection toolset 40, each intermediate representation being used for one or both of protection and optimization. For example, in the arrangement of FIG. 10, the first intermediate representation IR1 is used by both the optimizer component 100-1 and the protector component 110-1, the second intermediate representation is used by the optimizer component 100-2 but not by any protector component, and the third intermediate representation is used by the protector component 110-3 but not by any optimizer component. As with FIG. 2, each optimizer component may include one or more optimizer sub-components (not shown in FIG. 10), and each protector component may include one or more protector sub-components (also not shown in FIG. 10). These sub-components may perform any of the optimization and protection functions discussed above, but within the scope of the appropriate intermediate representation.

Note that while fig. 10 shows different functional protector and/or optimizer components for use with each different intermediate representation, it is also possible to have one or more of the protector and/or optimizer components operate within multiple different ones of the intermediate representations. While the components shown in FIG. 10 with respect to each intermediate representation are optimizer and/or protector components, components for performing other tasks and transformations on the software project may be provided for use in one or more of the intermediate representations.

The various intermediate representations IR1 … IRN may include LLVM IR as well as various other representations such as those discussed above. To convert the item of software between the various intermediate representations IR1 … IRN, in various states of protection and/or optimization, when using the toolset, an appropriate converter function 125 is provided. The converter function 125 may be implemented, for example, as a single library, class, tool or other element, or as a plurality of such elements, each of which performs one or more of the required types of conversion. Not all of the possible conversions between the various intermediate representations need to be provided, and some conversions may be provided as a combination of two or more other conversions, e.g., via a commonly used intermediate representation such as LLVM IR.
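The last point, providing a missing conversion as a combination of others via a commonly used representation, can be sketched as a registry lookup with a hub fallback. This is a minimal illustration assuming a hypothetical registry keyed by (source, target) and LLVM IR as the hub; the "conversions" here merely wrap text so that the composition order is visible, and no real converter API is implied.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[str], str]

# Hypothetical registry: (source IR, target IR) -> conversion function.
# Each stand-in conversion only wraps its input so the composition is visible.
REGISTRY: Dict[Tuple[str, str], Converter] = {
    ("IR2", "LLVM IR"): lambda s: f"llvm({s})",
    ("LLVM IR", "IR2"): lambda s: f"ir2({s})",
    ("IR3", "LLVM IR"): lambda s: f"llvm({s})",
    ("LLVM IR", "IR3"): lambda s: f"ir3({s})",
}

HUB = "LLVM IR"  # the commonly used intermediate representation


def get_converter(src: str, dst: str) -> Converter:
    """Return a direct converter if one is registered, else compose two via the hub."""
    if src == dst:
        return lambda s: s
    direct = REGISTRY.get((src, dst))
    if direct is not None:
        return direct
    to_hub = REGISTRY.get((src, HUB))
    from_hub = REGISTRY.get((HUB, dst))
    if to_hub is None or from_hub is None:
        raise KeyError(f"no conversion path from {src} to {dst}")
    return lambda s: from_hub(to_hub(s))
```

With only four registered conversions, IR2 to IR3 is still available as `get_converter("IR2", "IR3")`, which goes through LLVM IR; this is the saving that a hub representation offers over registering every pair.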

Also shown as part of the optimization and protection toolset 40 in FIG. 10 are one or more binary rewrite tools 135, one or more binary protection tools 130, and one or more compiler and/or linker tools 140. Each of these may operate using one or more of the intermediate representations IR1 … IRN or other representations, as required by the toolset 40.

The optimization and protection toolset 40 discussed above and shown in FIGS. 2, 9 and 10 may be used to protect software components, such as libraries, modules and agents, as well as applications, and all such software components fall within the scope of the software item 12. This is illustrated in FIG. 11, where various items of software 12, which may be security libraries, modules, agents, etc., are input to the optimization and protection toolset 40, which outputs these items of software 12 in a protected and optimized form. Any such item of software 12 may be output on demand in the native/binary code representation Rb and/or the asm.js representation Ra. The arrows 420 connect one or more of the optimized and protected items of software 12 in the asm.js representation with one or more of the optimized and protected items of software 12 in the native/binary code representation, and connect each of these with the underlying system layer 430 and the underlying hardware layer 440. They represent that each of the asm.js, native and system layers can access and use features, such as security features, in each of the levels beneath it in the hierarchy.

Generally, software components such as security libraries, modules and agents have their own security capabilities and features, and the robustness and security of these software components is critical to ensuring the security of the applications that use them internally or that reference or call them.

The optimization and protection toolset 40 and workflows described herein may thus be used to improve the security of such software components and thus applications that use such components internally.

Using aspects of the present invention, the user device 20 may be provided with multiple security layers, including hardware-level security features, system or operating-system-level security features, native-layer security features, and web-layer security features. Software components such as libraries, modules, and agents protected using the optimization and protection toolset 40 may provide access to hardware- and system-level security features that should not be made available to the web application layer. Since the optimization and protection toolset 40 can be used to create protected software components both in native code and in JavaScript (including asm.js), it can be used to construct and support dependencies in which protected software components in JavaScript/asm.js call protected software components in native code.

Exemplary protection techniques

Described below is one exemplary method/technique for applying protection to a software project (although as discussed above, it will be appreciated that many different protection techniques are available and may be used with embodiments of the present invention). This method shall be referred to herein as the "structure protection method". In certain embodiments of the present invention, the structural protection method is implemented/applied by the protector component 110 (or one of its subcomponents 112) of the toolset 40 mentioned above.

However, it will be appreciated that certain embodiments of the present invention do not utilize the toolset 40, and thus the present structural protection method may be implemented/applied by a different software protection system (executed by one or more processors of one or more data processing apparatuses).

FIG. 12 is a flow chart that schematically illustrates a method of structure protection, in accordance with an embodiment of the present invention.

The present structure protection method operates on source code (i.e., the item of software that the method modifies to apply protection is in source code format). Of course, as mentioned above, an original item of software that is not in source code format may be converted to source code format so that the structure protection method can be applied. The method is specifically contemplated to operate on JavaScript code, but it will be appreciated that it may be implemented to operate on software written in other languages, such as C/C++ source code, Visual Basic source code, Java source code, and the like. Thus, in general, the structure protection method involves receiving an input item of source code, applying a protection technique (described below) to that input item, and outputting the protected item of source code.

More particularly, the present structure protection method targets the protection of a structured data item in source code, where the structured data item has independently modifiable components or fields. Examples of such structured data items are objects, classes, structures, etc. (whose independently modifiable components are referred to as properties or elements) and arrays and lists (whose independently modifiable components are the indexed elements of the array or list). In the following, any such structured data item shall be referred to simply as a "structure" (although this should of course not be understood to mean that the embodiments are limited to protecting only structures such as C/C++ structs), and the independently modifiable components (or fields or elements or properties) of the structure shall be referred to simply as "elements" of the structure. An element of a structure may itself be another structure.

As will become apparent, the structure is protected by modifying how it is represented (in terms of its format/layout). The representation of the actual element itself (now stored within the modified structure) may also be modified. That is, the information in the structure is preserved, but its form/layout and representation is modified to make its analysis more challenging for an attacker.

Note that a structure may be used in a number of different ways. For example, a structure that is an array of two elements may be used to (a) represent the x and y coordinates of a point on a display or (b) represent the upper and lower bounds of a range for a variable/setting. It may therefore be desirable to be able to apply different protections (or levels and/or types of protection) to the same structure if that structure is used in different ways in a software item.

Similarly, structures, when used for the same purpose, may require different protection (or levels and/or types of protection) depending on where the structure is located within the item of software or source code to be processed.

The structure protection method will be described below with reference to the following exemplary structure (but it will of course be appreciated that the structure protection method is applicable to other types of structures, and embodiments of the invention are not limited by this particular exemplary structure). This exemplary structure (shown below in pseudo-code) represents a record for data about employees of a company:
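The pseudo-code listing itself does not survive in this text. As an assumed reconstruction only, the elements and types named in the DataTemplates discussion below (strings empName, empID and managerID; the number hourlyRate; the integer yearsAtCo; and the arrays regHours, ovtmHours and directReports) suggest a record along the lines of the following Python sketch. The field names are kept in camelCase to match the identifiers used in the text; the size limits in the comments come from that discussion, and the use of a dataclass is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmployeeRecord:
    """Assumed reconstruction of the exemplary (unprotected) employee record."""
    empName: str          # string element
    empID: str            # string element
    managerID: str        # string element
    hourlyRate: float     # number element
    yearsAtCo: int        # integer element (32-bit two's complement)
    regHours: List[float] = field(default_factory=list)     # 0 to 52 numbers
    ovtmHours: List[float] = field(default_factory=list)    # 0 to 52 numbers
    directReports: List[str] = field(default_factory=list)  # 0 to 40 strings
```

Each list field uses `default_factory`, so two records never share the same array object; this matters because each element of a structure is independently modifiable.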

A user of the protection system may identify one or more structures within the source code to be protected. This may include, for example, identifying the employee record as a structure to be protected (since an instance of the employee record may contain data that is attractive to an attacker attempting unauthorized access or modification). This may involve, for example, the user examining the source code and determining/discovering one or more such structures, or the user being notified that protection is required for any structure that involves, represents or contains certain data, etc.

Having identified one or more structures within the source code to be protected, a user of the protection system generates protection description information. This may be performed manually, or in a fully or partially automated manner once the one or more structures to be protected have been identified. In the following, two files or objects, referred to as KeyTemplates and DataTemplates (which may be provided in JSON form, for example), are used to represent the protection description information, but it will be appreciated that other ways of providing this information may be used, so embodiments of the invention are not limited to the use of such KeyTemplates and DataTemplates objects/files, nor to the particular formats of the KeyTemplates and DataTemplates objects/files discussed below.

In summary, the DataTemplates specify the initial/actual structure/format of the (unprotected) structures to be protected, and potentially also what type and/or level of protection will apply to one or more elements of those structures, while the KeyTemplates specify what type and/or level of protection applies to the structure/format/layout of the structures defined in the DataTemplates. Thus, the protection description information specifies or includes data identifying/indicating: (a) the initial/actual structure/format of the (unprotected) structures to be protected; (b) potentially, what type and/or level of protection applies to the structure/format/layout of those structures; and (c) potentially, what type and/or level of protection applies to one or more elements of those structures.

In some embodiments, the protection description information may already be available to the user (it may have been generated previously, or may be provided by a third party, etc.), so the user need not go through the above-mentioned steps of identifying the structures and generating the protection description information. Instead, the user may simply provide the protection description information to the protection system/component implementing the structure protection method.

Thus, generally, at step 1200, a system/component implementing the structure protection method receives the protection description information.

Turning first to the KeyTemplates object/file (or specification): two instances of a structure are disguised/protected in the same way if they have the same "key". Here, a "key" may specify a family or type of protection or obfuscation (which may be represented by a string that is an identifier in the source code language, such as a JavaScript identifier, as appropriate). Additionally or alternatively, the key may specify a level of protection. For example, the key may be specified as the string 'boundaryProtection5', which indicates that the name of the protection family is 'boundaryProtection' and the protection level is level 5. In other words, the key may identify or specify or indicate (a) a type of protection or obfuscation technique or a kind of encoding (in the above example, the type is 'boundaryProtection') and/or (b) a level of protection or strength of the particular protection or obfuscation technique or kind of encoding (in the above example, level 5). For example, level 1 of a given type of encoding may provide a linear encoding for the characters in a string, while level 10 may use a third-degree polynomial encoding for the characters in the string, which is slower to manipulate but makes the string more difficult for an attacker to analyze. The skilled person will appreciate that there are many different types of protection for protecting data, and that these protection types may be implemented with varying degrees of strength or complexity; such protection types and levels of protection are therefore not described in greater detail herein. It will be appreciated that, in some embodiments, a protection type may have only one "level", in which case the key may specify only the protection type and not the level. Similarly, some embodiments may use only one type of protection for which multiple levels are available, in which case the key may specify only the level of protection and not the type. In the following, a key is represented as a string of the form "<protection type><protection level>" (such as 'boundaryProtection5'), but it will be appreciated that other ways of representing the key are possible.
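With keys written as "<protection type><protection level>", a protection system has to split a key back into its two parts, either of which may be absent. A minimal sketch, assuming the level is always the trailing run of decimal digits (the function name and the identifier check are assumptions; Python's identifier rule stands in for JavaScript's):

```python
from typing import Optional, Tuple

DIGITS = "0123456789"


def parse_key(key: str) -> Tuple[Optional[str], Optional[int]]:
    """Split a key such as 'boundaryProtection5' into ('boundaryProtection', 5).

    Either part may be absent: 'Basic' -> ('Basic', None), '7' -> (None, 7).
    """
    ptype = key.rstrip(DIGITS)       # protection type: everything before trailing digits
    digits = key[len(ptype):]        # protection level: the trailing digits, if any
    if not ptype and not digits:
        raise ValueError("empty key")
    if ptype and not ptype.isidentifier():
        # Python's identifier rule is used here as a stand-in for JavaScript's.
        raise ValueError(f"protection type {ptype!r} is not an identifier")
    return (ptype or None, int(digits) if digits else None)
```

One consequence of this convention is that a protection type name cannot itself end in a digit, since the digit would be read as part of the level.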

The KeyTemplates object/file has one or more fields (or entries/properties). In the KeyTemplates object/file, each field has:

(a) a value, which is a key as set forth above, or

(b) a value having two components (e.g., an array of two elements), one of which identifies another field/entry/property in the KeyTemplates object/file, and the other of which is a key as set forth above.

As an example, the KeyTemplates object/file may be in the form of

Thus, with this particular KeyTemplates object/file:

there is a field of the KeyTemplates object/file called EmployeeRecord, indicating that an instance of the EmployeeRecord structure can be protected or obfuscated or encoded by applying protection of type HRPriv at level 10;

there is a field of the KeyTemplates object/file called EmployeeRecordHigh, indicating that an instance of the EmployeeRecord structure can be protected or obfuscated or encoded by applying protection of type HRPriv, but at level 15;

there is a field of the KeyTemplates object/file called EmployeeRecordLow, indicating that an instance of the EmployeeRecord structure can be protected or obfuscated or encoded by applying protection of type Basic at level 3.
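The example object itself is not reproduced above, but the three bullet points imply entries equivalent to the Python dict below (an assumed reconstruction, with HRPriv taken as the type name used elsewhere in this description). The sketch also shows how each entry can be resolved: a form (a) entry supplies its own key and names its own structure, while a form (b) entry borrows the structure of the field it references and supplies a different key. The function name and the single-level reference rule are assumptions.

```python
from typing import Dict, List, Tuple, Union

Entry = Union[str, List[str]]

# Assumed reconstruction of the example KeyTemplates object, inferred from the
# three bullet points above; the original listing is not reproduced in this text.
KEY_TEMPLATES: Dict[str, Entry] = {
    "EmployeeRecord": "HRPriv10",                          # form (a): a key
    "EmployeeRecordHigh": ["EmployeeRecord", "HRPriv15"],  # form (b): [field, key]
    "EmployeeRecordLow": ["EmployeeRecord", "Basic3"],     # form (b): [field, key]
}


def resolve(name: str, templates: Dict[str, Entry] = KEY_TEMPLATES) -> Tuple[str, str]:
    """Return (field whose structure definition applies, key to apply)."""
    value = templates[name]
    if isinstance(value, str):          # form (a): own structure, own key
        return name, value
    ref, key = value                    # form (b): borrowed structure, own key
    if not isinstance(templates.get(ref), str):
        # Assumption: a form (b) entry must point at a form (a) entry.
        raise ValueError(f"{name!r} must reference a form (a) entry")
    return ref, key
```

All three entries resolve to the same structure definition, EmployeeRecord, with keys HRPriv10, HRPriv15 and Basic3 respectively, so the layout needs to be stated only once.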

Turning next to the DataTemplates object/file (or specification), as described, the purpose of the DataTemplates object/file is to specify the actual/initial (unprotected) structure or format/layout for the structure to be protected. These structures to be protected correspond to the structures for which the keys (i.e., protection types and/or protection levels) are specified in the KeyTemplates specification. The DataTemplates object/file may also specify, for one or more of the elements of the structure to be protected, a key for the element to specify or identify the type and/or level of protection to be applied to the element.

The DataTemplates object/file has one or more fields (or entries/properties). Since the DataTemplates object/file specifies structure/layout/format, there is a field in the DataTemplates object/file corresponding to each KeyTemplates field of type (a) as set forth above, i.e., each KeyTemplates field whose value is a key. For example, for each KeyTemplates field of type (a), there may be a corresponding field in the DataTemplates object/file that has the same name as the KeyTemplates field. Note that if the KeyTemplates file/object has a field of type (b) as set forth above (i.e., a value having two components, one of which identifies another field/entry in the KeyTemplates object/file and the other of which is a key), then there need not be a corresponding field in the DataTemplates object/file, as the structure that such a field would have specified has already been specified for another field in the KeyTemplates object/file. This allows a given structure to have multiple entries in the KeyTemplates object/file, so that different levels and schemes of protection can be associated with the structure depending on the context in which it is used. This also provides efficient storage/referencing and efficient updating/maintenance of the DataTemplates and KeyTemplates objects/files.

Thus, continuing with the example mentioned above, the DataTemplates object/file may have a field called EmployeeRecord, which may be as follows:

In particular, the value of each field in the DataTemplates object/file is a template, where the template for the structure being defined/specified itself has one or more fields specifying the elements of the structure (i.e., its layout/format), possibly together with, for one or more of these elements, a key for that element specifying or identifying the type and/or level of protection to be applied to it. In the above example, the layout/format of the EmployeeRecord structure is defined. In addition:

the elements empName, empID and managerID of the EmployeeRecord structure are described in the DataTemplates object/file with string values beginning with 'C'. This indicates that the elements empName, empID and managerID of the EmployeeRecord structure are strings. The elements empName and empID also have an indication of the corresponding key, namely HRPriv10. The element managerID has an indication of a different corresponding key, Basic3.

The element hourlyRate of the EmployeeRecord structure is described in the DataTemplates object/file with a string value beginning with 'N'. This indicates that the element hourlyRate of the EmployeeRecord structure is a number. The element hourlyRate also has an indication of the corresponding key, i.e., HRPriv10. Elements of the structure that are numbers (as specified in the DataTemplates object/file with a string value beginning with 'N') may be converted into strings when protection is applied, and those strings may then be encoded in the same way as elements of the structure that are strings (as specified in the DataTemplates object/file with a string value beginning with 'C').
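To make the "convert the number to a string, then encode it like a string" rule concrete, here is a minimal sketch of one possible low-level string encoding: a linear byte-wise map b -> (a*b + c) mod 256 over the UTF-8 bytes of the string, which is invertible whenever the multiplier a is odd. This particular encoding and its constants are illustrative assumptions chosen for clarity, not the encoding that any given key such as HRPriv10 would actually select.

```python
# Hypothetical per-key constants; BYTE_MUL must be odd so the byte map is invertible.
BYTE_MUL, BYTE_ADD = 167, 59


def encode_string(text: str, a: int = BYTE_MUL, c: int = BYTE_ADD) -> bytes:
    """Linear byte-wise encoding: b -> (a*b + c) mod 256 over the UTF-8 bytes."""
    if a % 2 == 0:
        raise ValueError("multiplier must be odd to be invertible mod 256")
    return bytes((a * b + c) % 256 for b in text.encode("utf-8"))


def decode_string(data: bytes, a: int = BYTE_MUL, c: int = BYTE_ADD) -> str:
    a_inv = pow(a, -1, 256)  # modular inverse of a; requires Python 3.8+
    return bytes((a_inv * (b - c)) % 256 for b in data).decode("utf-8")


def encode_number(value: float) -> bytes:
    """An 'N' element: convert the number to a string, then encode it like a 'C' element."""
    return encode_string(repr(value))


def decode_number(data: bytes) -> float:
    return float(decode_string(data))
```

Because the number is handled purely as text, the same string encoder (and the same key) serves both 'C' and 'N' elements; the cost is that arithmetic on the protected number requires decoding it first.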

The element yearsAtCo of the EmployeeRecord structure is described in the DataTemplates object/file with a string value beginning with 'K'. This indicates that the element yearsAtCo of the EmployeeRecord structure is an integer (e.g., one that fits in 32 bits in two's complement form). The element yearsAtCo also has an indication of the corresponding key, HRPriv10. An element of the structure that is an integer (as specified in the DataTemplates object/file with a string value beginning with 'K') may, when protection is applied, be encoded using a lossless homomorphic encoding specified by the key for that element.
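One well-known example of a lossless encoding that is homomorphic with respect to addition is an affine map on 32-bit words, E(x) = (a*x + b) mod 2^32 with a odd. It is a bijection, so nothing is lost, and E(x) + E(z) - b = E(x + z) (mod 2^32), so encoded values can be added without being decoded. The sketch below is that textbook scheme under assumed constants; it is offered only as an illustration of the idea, not as the encoding that a key such as HRPriv10 specifies.

```python
MASK = 0xFFFFFFFF
# Hypothetical per-key constants; INT_MUL must be odd to be invertible mod 2**32.
INT_MUL, INT_ADD = 0x9E3779B1, 0x7F4A7C15
INT_MUL_INV = pow(INT_MUL, -1, 1 << 32)  # modular inverse; requires Python 3.8+


def to_i32(u: int) -> int:
    """Reinterpret a 32-bit unsigned word as a two's-complement signed integer."""
    return u - (1 << 32) if u & 0x80000000 else u


def encode_int(x: int) -> int:
    """E(x) = (INT_MUL*x + INT_ADD) mod 2**32, a bijection on 32-bit words (lossless)."""
    return (INT_MUL * (x & MASK) + INT_ADD) & MASK


def decode_int(y: int) -> int:
    return to_i32((INT_MUL_INV * (y - INT_ADD)) & MASK)


def add_encoded(ex: int, ez: int) -> int:
    """Add two encoded values without decoding: E(x) + E(z) - INT_ADD == E(x + z)."""
    return (ex + ez - INT_ADD) & MASK
```

Addition of encoded values wraps around exactly as 32-bit two's complement arithmetic does, so adding 1 to the largest 32-bit integer decodes to the smallest one.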

The elements regHours, ovtmHours and directReports of the EmployeeRecord structure are described in the DataTemplates object/file with values of the form [template, lowerSizeLimit, upperSizeLimit], representing an array of elements, where each element of the array is described by template, the size of the array is between lowerSizeLimit and upperSizeLimit elements, and lowerSizeLimit and upperSizeLimit are integers. If lowerSizeLimit is 0, the array may be empty; if upperSizeLimit is 0, the array may be arbitrarily large. Thus, for example, regHours and ovtmHours are both described in the DataTemplates object/file with the value ['NHRPriv5', 0, 52], indicating that each is a (possibly empty) array of up to 52 elements, and that each element of those arrays is of the 'NHRPriv5' type, which, as mentioned above, indicates that they are numbers and that each number is to be protected with the protection type HRPriv at protection level 5. Similarly, directReports is described in the DataTemplates object/file with the value ['CBasic3', 0, 40], indicating that it is a (possibly empty) array of up to 40 elements, and that each element of the array is of the 'CBasic3' type, which, as mentioned above, indicates that they are strings and that each string is to be protected with the protection type Basic at protection level 3.
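Putting the element descriptions together, the EmployeeRecord entry of the DataTemplates object can be reconstructed (as an assumption, since the original listing is not reproduced here) and interpreted mechanically. The sketch below splits each template into its element kind, key and size limits, and checks an unprotected instance against the template before protection is applied. The function names, the Python representation and the treatment of booleans are assumptions; the 'C'/'N'/'K' prefixes and the [template, lowerSizeLimit, upperSizeLimit] form follow the text.

```python
from typing import Any, Dict, List, Tuple, Union

Template = Union[str, List[Any]]

# Assumed reconstruction of the EmployeeRecord entry, pieced together from the
# element descriptions above (the original listing is not reproduced here).
DATA_TEMPLATES: Dict[str, Dict[str, Template]] = {
    "EmployeeRecord": {
        "empName": "CHRPriv10",
        "empID": "CHRPriv10",
        "managerID": "CBasic3",
        "hourlyRate": "NHRPriv10",
        "yearsAtCo": "KHRPriv10",
        "regHours": ["NHRPriv5", 0, 52],
        "ovtmHours": ["NHRPriv5", 0, 52],
        "directReports": ["CBasic3", 0, 40],
    }
}

KINDS = {"C": "string", "N": "number", "K": "integer"}


def describe(template: Template) -> Tuple[str, str, int, int]:
    """Return (element kind, key, lowerSizeLimit, upperSizeLimit); scalars report (1, 1)."""
    if isinstance(template, list):
        inner, lower, upper = template
        kind, key, _, _ = describe(inner)
        return kind, key, lower, upper
    return KINDS[template[0]], template[1:], 1, 1


def check_value(template: Template, value: Any) -> bool:
    """Check an unprotected value against its template before protection is applied."""
    if isinstance(template, list):
        _, _, lower, upper = describe(template)
        if not isinstance(value, list) or len(value) < lower:
            return False
        if upper != 0 and len(value) > upper:  # an upper limit of 0 means "arbitrarily large"
            return False
        return all(check_value(template[0], v) for v in value)
    kind = KINDS[template[0]]
    if isinstance(value, bool):  # bool is a subclass of int in Python; reject it
        return False
    if kind == "string":
        return isinstance(value, str)
    if kind == "integer":  # assumed to fit in 32-bit two's complement
        return isinstance(value, int) and -2**31 <= value < 2**31
    return isinstance(value, (int, float))


def check_record(name: str, record: Dict[str, Any]) -> bool:
    """True if the record has exactly the template's elements, each of the right kind."""
    layout = DATA_TEMPLATES[name]
    return set(record) == set(layout) and all(
        check_value(t, record[f]) for f, t in layout.items())
```

Because `describe` recurses on the inner template, an array element could itself be described by an array template, which matches the statement that an element of a structure may be another structure.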

Other ways of specifying the type of element of the structure to be protected in DataTemplates may be used, and it will be appreciated that other element types may be used (depending of course on the source code language under consideration).

As mentioned above, embodiments of the present invention may utilize other ways to specify the initial/unprotected form (or format/layout/structure or particular element) of each structure to be protected, as well as the level and/or type of protection that is applied to the structure and element to be protected.

We turn next to how the format/layout of a structure is modified or, more precisely, how the unprotected structure is represented in a different format/layout according to the structure protection method.

Data structures known as tries (also called dictionary trees) are well known; see, for example, http://en.wikipedia.org/wiki/Trie. A trie is a data structure used to quickly access a set of records based on keys (here, the term "key" has a different meaning from the "key" described above) or indices that can naturally be divided into parts (such as numbers, which divide into digits, or words, which divide into letters). The trie may be represented as a tree of nodes in which the root node has no content and each link from a parent node to one of its child nodes represents a "selection" (i.e., the choice, made using the next part of the key, of which child node to move to from the parent node). A node may also indicate whether it is final (i.e., a leaf node), in which case it indicates (identifies, represents or stores) the record selected by the key/index, or not final, in which case it has one or more child nodes representing further selections.

FIG. 13 schematically illustrates an exemplary trie. In this trie, the keys/indices are words (in this case, the words a, it, in, map, mat and me). Each link (arrow) is labeled with the selection it represents from among the alternatives, in this case a letter. Each node (circle) is labeled with the cumulative part of the key/index selected along the path from the root node. The numbers are the records/data stored by the trie (or may be addresses of separately stored data, or indices into a separately stored array of records holding the data). Thus, the trie represents the mappings a → 7, it → 4, in → 8, map → 11, mat → 17 and me → 5.
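A minimal Python sketch of the FIG. 13 trie follows. Each node holds a map from a selection (here a letter) to a child node, and only final nodes carry a record. The class names Trie and TrieNode are hypothetical and serve only to illustrate the lookup procedure.

```python
from typing import Optional

class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}  # one child per selection (letter)
        self.record: Optional[int] = None           # set only on final nodes

class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()  # the root node has no content

    def insert(self, key: str, record: int) -> None:
        node = self.root
        for letter in key:  # each letter selects (or creates) a child node
            node = node.children.setdefault(letter, TrieNode())
        node.record = record

    def lookup(self, key: str) -> Optional[int]:
        node = self.root
        for letter in key:
            node = node.children.get(letter)
            if node is None:  # no such selection: key not present
                return None
        return node.record    # None if the node reached is not final

# The mappings represented by FIG. 13.
fig13 = Trie()
for word, record in {"a": 7, "it": 4, "in": 8, "map": 11, "mat": 17, "me": 5}.items():
    fig13.insert(word, record)
```

A lookup of "mat" walks m, a, t and returns 17, while "ma" stops at a non-final node and returns None.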

Some (optional) optimizations may be used in representing the trie:

the label in each node can be removed, because all the information it provides is already provided by the labels on the arrows.

The label on each arrow may then be moved into the (now unlabeled) node to which the arrow points, so that the arrow no longer needs a label separate from the label on that node.

These variations make the representation of nodes and arrows more compact without affecting the lookups that can be performed using the trie.

The format/layout of the structure to be protected may be represented as a dictionary tree. In particular, the root node of the trie may represent the structure itself, while other nodes of the trie may represent elements of the structure (i.e., non-root nodes may be or may represent properties, fields, elements, array indices, etc.).

The structure to be protected is represented as a trie, and the structure protection method can then protect the structure by adapting/modifying the trie. In particular, the nodes of the trie may be relabeled (so that an attacker examining the trie cannot learn the semantics of what the nodes represent).

Further, the trie may be modified/adjusted to include additional nodes. For example, one or more additional nodes (and potentially branches from them) may be inserted into one or more paths within the trie, so that the number of links from the root node to the leaf nodes representing the elements of the structure increases. Additionally or alternatively, the kinds of the nodes may be changed (e.g., a node that represents a property of a structure may instead be made to represent an array index). For example, a structure may have an element called "price", so that the trie has a path of one or more nodes for selecting the element price. The structure protection method may then adjust the trie so that, to select the element price, the node path consists of the root node, followed by a node representing a property called Q790A (which may correspond to an array), then a node representing the element of that array with index 7 (which may correspond to a structure), and then a node representing a property of that structure called fT9_x40k.
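The price example can be sketched in Python as follows. The obfuscated path for price is a list of steps, each either a property name or an array index, and reading or writing the element means walking that path. Modelling array steps as integer dictionary keys (rather than real arrays) is a simplification made here, and put/get are hypothetical helpers.

```python
from typing import Any, Union

Step = Union[str, int]  # a property name or an array index

# Obfuscated path for the element "price", taken from the example above.
PRICE_PATH: list[Step] = ["Q790A", 7, "fT9_x40k"]

def put(root: dict, path: list[Step], value: Any) -> None:
    """Store value at path, creating intermediate nodes as needed.
    For brevity, array-index steps are plain int dictionary keys here."""
    node = root
    for step in path[:-1]:
        node = node.setdefault(step, {})
    node[path[-1]] = value

def get(root: dict, path: list[Step]) -> Any:
    """Follow path from the root node and return the value at its end."""
    node = root
    for step in path:
        node = node[step]
    return node

y: dict = {}
put(y, PRICE_PATH, 1999)  # the obfuscated counterpart of setting x.price = 1999
```

The single access step x.price has become three steps, y["Q790A"][7]["fT9_x40k"], none of whose names reveals the meaning of the element.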

Thus, for a structure to be protected, a protected/obfuscated trie may be generated. (It will be appreciated that the trie representing the unprotected structure need not actually be generated and then modified; i.e., a "protected" trie may be generated directly from the protection description information.) In particular, starting from the root node, for each element of the structure to be protected, the structure protection method may randomly select the steps/nodes to insert into the trie (which may form a single path within the trie or may include one or more branches/paths within the trie). The minimum (randomly chosen) number of nodes to insert is one, because a single access step in the original unprotected structure cannot correspond to fewer than one access step in the obfuscated trie. The random choice of the number of nodes may be based at least in part on the level of protection indicated in the key for the structure as defined/specified in the KeyTemplates object/file. The higher the level of protection, the more likely it is that a larger number is chosen, giving more steps/nodes and thus making the obfuscation/protection harder for an attacker to analyze; i.e., the random number may be biased according to the level of protection. Similarly, the property names and array indices may be randomly selected. In some embodiments, for efficiency reasons, the number of distinct array indices should not greatly exceed the number of selections to be made (e.g., if there are N different elements to select, then no more than pN different indices are used in the obfuscated trie, where p is a predetermined value, e.g., p = 2).
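The random choices described above might be sketched as follows. This is an illustrative Python sketch rather than the method itself. The names obfuscated_paths and _name, the parameter max_extra and the specific biasing rule (up to min(level, max_extra) extra steps) are assumptions. The tag seeds a private random.Random instance so that the same tag reproduces the same layout, every path has at least one step, and array indices are drawn from a pool of at most pN values.

```python
import random
import string

def _name(rng: random.Random) -> str:
    """Random identifier-like property name (cf. 'Q790A', 'fT9_x40k')."""
    first = rng.choice(string.ascii_letters)
    rest = "".join(rng.choices(string.ascii_letters + string.digits + "_", k=6))
    return first + rest

def _conflicts(a: tuple, b: tuple) -> bool:
    """Two leaf paths clash if one equals, or is a prefix of, the other."""
    k = min(len(a), len(b))
    return a[:k] == b[:k]

def obfuscated_paths(elements, level, tag, p=2, max_extra=3):
    """Hypothetical: choose an obfuscated access path for each element."""
    rng = random.Random(tag)             # the tag seeds every random choice
    indices = range(p * len(elements))   # at most p*N distinct array indices
    chosen = {}
    for name in elements:
        while True:
            # Extra steps: 0..min(level, max_extra), so the expected number of
            # steps grows with the protection level (up to max_extra).
            extra = rng.randint(0, min(level, max_extra))
            steps = [rng.choice(indices) if rng.random() < 0.5 else _name(rng)
                     for _ in range(extra)]
            path = tuple(steps + [_name(rng)])  # always >= 1 step, ends on a name
            if not any(_conflicts(path, other) for other in chosen.values()):
                chosen[name] = path
                break
    return chosen

paths = obfuscated_paths(["name", "price", "product"], level=5, tag=1352)
```

Calling obfuscated_paths again with the same elements, level and tag returns the same paths, which is what allows later calls that use the same tag to locate the elements.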

In some embodiments, for each element of the structure to be protected, the access path to that element's node is selected within the trie independently of the other elements of the structure (i.e., nodes/steps are added to the trie independently of the nodes/steps already present in it). However, this may make the trie large compared with the unprotected structure, which may be undesirable. Thus, in certain embodiments, the random selection of intermediate steps/nodes in the trie favors reusing the steps/nodes already selected for neighboring elements, i.e., elements having the same parent node in the original (unprotected) trie. For example, suppose (using the example mentioned above) that the property access x.price, for structure x with element "price", is expanded into y.Q790A[7].fT9_x40k, where x represents the original structure and y represents the obfuscated structure. Then, for another element "product" of structure x, the structure protection method will tend to select an obfuscation for the x.product node that partially shares that path or, with less sharing, one that shares fewer of its steps. The amount of "sharing" of nodes, and the decision as to whether to share at all, may be selected randomly, again with the random decision based on/biased by the level of protection (e.g., with a higher level of protection for the structure there may be less sharing, and with a lower level of protection there may be more sharing).
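Level-biased sharing between sibling elements could look like the following Python sketch. The rule used here (each further shared step is taken with probability 1 − level/max_level) is an assumed example of "biased by the level of protection", not the rule from the source. The names shared_prefix_length and derive_path, and the fresh step "pR0d_9", are hypothetical.

```python
import random

def shared_prefix_length(sibling_path, level, rng, max_level=10):
    """How many leading steps of a sibling's obfuscated path to reuse.
    Hypothetical rule: the chance of extending the shared prefix by one more
    step is 1 - level/max_level, so higher levels share less."""
    share_prob = 1.0 - min(level, max_level) / max_level
    k = 0
    # Stop before the final step, so the sibling's leaf itself is never reused.
    while k < len(sibling_path) - 1 and rng.random() < share_prob:
        k += 1
    return k

def derive_path(sibling_path, level, rng, fresh_steps):
    """Reuse a random-length prefix of the sibling path, then append fresh steps."""
    k = shared_prefix_length(sibling_path, level, rng)
    return tuple(sibling_path[:k]) + tuple(fresh_steps)

rng = random.Random(1352)
price_path = ("Q790A", 7, "fT9_x40k")
product_path = derive_path(price_path, 3, rng, ("pR0d_9",))  # hypothetical fresh step
```

At level 0 the product path becomes ("Q790A", 7, "pR0d_9"), sharing everything but the leaf. At the maximum level it shares nothing, which keeps the two elements' paths visibly unrelated at the cost of a larger trie.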

The manner in which nodes/steps are included in the trie can vary. For example, some methods may not include array indices (such as [3], [7] and [8] as described above), some may prefer a flatter but wider form for the trie, others may prefer a deeper form, and so on. These options may correspond to the type of protection indicated in the key for the structure as defined/specified in the KeyTemplates object/file.

To enable obfuscated structures to be generated and used within the source code, various functions (or programs or routines) may be provided and made available. The structure protection method may thus include, at step 1202, inserting calls to one or more of these functions into the source code based on the received protection description information, i.e., modifying the source code to use one or more of these functions. Examples of these functions are set out below. It will be appreciated, however, that other functions may be provided, that not all of these functions need be provided, and that other formulations/representations of these functions may alternatively be used. Then, at step 1204, the function calls may be converted into obfuscated code (based on the protection description information), as described below. Exemplary functions include:

a function to create an obfuscated/protected structure from an unprotected structure (referred to herein as templateEncode), such as:

here, struct identifies the program variable (in the source code) for the instance of the unprotected structure to be protected;

the obfStruct identifies the program variable (in the source code) for the instance of the protected structure generated by the function templateEncode;

a templateName identifies or indicates an entry in the KeyTemplates object/file (to thereby identify/specify the type of encoding/protection and/or level of encoding/protection to be applied to the struct instance to obtain the obfStruct instance); and

tag is a value used to seed a random selection for generating the obfuscated structure and for performing protection on individual elements of the obfuscated structure.

Thus, for example, if x is a variable in the source code for an unprotected instance of the employee record structure, a corresponding protected/obfuscated version may be generated and represented in the source code by the variable y by including the following lines of code in the software project:

If a higher level of protection is desired, the following lines of code may be included instead:

The templateEncode function will generate the obfuscated/encoded structure instance using the trie approach mentioned above.

The templateEncode function will not require conversion at step 1204.
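The effect of templateEncode on the layout can be sketched in Python. This sketch assumes that the choices derived from templateName and tag have already been turned into a layout that maps each element name to its obfuscated step sequence. It leaves out the encoding of the element values themselves. The parameter order of the real templateEncode is not given here, so template_encode and its arguments are hypothetical.

```python
from typing import Any

def template_encode(struct: dict[str, Any], layout: dict[str, tuple]) -> dict:
    """Hypothetical templateEncode: place each element at its obfuscated path.
    `layout` stands in for the random choices derived from templateName and tag;
    array-index steps are modelled as int dictionary keys."""
    obf: dict = {}
    for element, value in struct.items():
        *inner, last = layout[element]
        node = obf
        for step in inner:
            node = node.setdefault(step, {})
        node[last] = value
    return obf

# Illustrative layout; "Zx1" and "k9" are made-up names.
layout = {"price": ("Q790A", 7, "fT9_x40k"), "yearsAtCo": ("Zx1", "k9")}
y = template_encode({"price": 10, "yearsAtCo": 3}, layout)
```

Because the layout is fixed once templateName and tag are known, nothing in the encoding call itself needs rewriting at step 1204. The accesses and updates that use the result are what must be converted.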

A function to create an unprotected structure from an obfuscated/protected structure (referred to herein as templateDecode), such as:

here, obfStruct identifies the program variable (in the source code) for the instance of the protected structure to be unprotected/decoded;

struct identifies the program variable (in the source code) for the instance of the unprotected structure generated by the function templateDecode;

a templateName identifies or indicates an entry in the KeyTemplates object/file (to thereby identify/specify the type of encoding/protection and/or level of encoding/protection that was applied when the protected obfStruct structure was initially generated); and

tag is the value used to seed the random selection for generating the obfuscated structure and for performing protection on individual elements of the obfuscated structure.

Thus, for example, if y is the variable in the source code for the protected/obfuscated instance of the employee record structure generated above via a call to templateEncode, the corresponding unprotected structure x can be (re)generated by including the following lines of code in the item of software:

The templateDecode function will not require conversion at step 1204.
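Decoding is the inverse walk: each element is read back from its obfuscated path. This Python sketch makes the same assumption as an encoding sketch would, namely a precomputed layout standing in for templateName and tag. It ignores value-level encodings. The name template_decode and the layout literal are hypothetical.

```python
from typing import Any

def template_decode(obf: dict, layout: dict[str, tuple]) -> dict[str, Any]:
    """Hypothetical templateDecode: read each element back from its obfuscated path."""
    struct = {}
    for element, path in layout.items():
        node = obf
        for step in path:
            node = node[step]
        struct[element] = node
    return struct

# Illustrative layout and obfuscated instance ("Zx1" and "k9" are made-up names).
layout = {"price": ("Q790A", 7, "fT9_x40k"), "yearsAtCo": ("Zx1", "k9")}
y = {"Q790A": {7: {"fT9_x40k": 10}}, "Zx1": {"k9": 3}}
x = template_decode(y, layout)  # {'price': 10, 'yearsAtCo': 3}
```

Decoding only works with the same layout, and therefore the same templateName and tag, that was used when the obfuscated instance was created.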

A function to access elements of the protected structure (referred to herein as templateAccess), such as:

here, obfStruct identifies the program variable (in the source code) for the instance of the protected structure to be accessed;

tag is the value used to seed the random selection for generating the obfuscated structure and for performing protection on individual elements of the obfuscated structure. The function provides access to the element specified by the plain string path. It does so by translating the steps in that path into the more complex steps selected, according to the particular random selections made for the given tag, by the encoding type and protection level specified for the templateName entry in KeyTemplates.

For example, if access to the yearsAtCo element of the protected employee record structure is desired, the following lines of code may be included in the software project:

The returned element z may be in encoded form (if protection has been applied to the value of the element in the protected structure);

alternatively, the function templateAccess may remove the protection applied to the element and thereby return the unprotected value of the accessed element.

A description of optindexArray is given below.

The templateAccess function requires conversion at step 1204. For example, the original source code may originally have contained code that reads the element price of instance x (i.e., x.price). Instance x may have been converted to the protected structure instance y via a call to the function templateEncode, as described above, so that the data for element price is no longer accessible as x.price. At step 1202, the code will therefore be replaced with a call to the function templateAccess.

At step 1204, that call will then be replaced with code based on the protection description information. As mentioned above, the nodes/steps in the trie (Q790A, [7] and fT9_x40k) were randomly inserted and named by the call to the templateEncode function, based on the type and/or level of protection specified by the key corresponding to templateName and on the seed provided by the tag.
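Step 1204 for an access can be pictured as a source-to-source rewrite. The Python sketch below assumes that the obfuscated layout is available to the protection tool as a map from each element name to its step sequence. Under that assumption it emits the direct access expression that would replace a templateAccess call. The function lower_access and the layout format are hypothetical, and a real tool would also insert the decoding of the value where the element is encoded.

```python
def lower_access(obf_var: str, element: str, layout: dict[str, tuple]) -> str:
    """Hypothetical step-1204 rewrite: turn templateAccess(obfVar, element, ...)
    into a direct access expression along the element's obfuscated path."""
    expr = obf_var
    for step in layout[element]:
        # int steps become array subscripts, str steps become property accesses
        expr += f"[{step}]" if isinstance(step, int) else f".{step}"
    return expr

layout = {"price": ("Q790A", 7, "fT9_x40k")}
print(lower_access("y", "price", layout))  # y.Q790A[7].fT9_x40k
```

The generated expression no longer mentions "price" or templateAccess, so the deployed code carries no hint of which element it reads.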

A function to update/set the value of an element of the protected structure (referred to herein as templateUpdate), such as:

here, obfStruct identifies the program variable (in the source code) for the instance of the protected structure to be accessed/updated;

tag is the value used to seed the random selection for generating the obfuscated structure and for performing protection on individual elements of the obfuscated structure. The function provides access to the element specified by the plain string path. It does so by translating the steps in that path into the more complex steps selected, according to the particular random selections made for the given tag, by the encoding type and protection level specified for the templateName entry in KeyTemplates. The element is then updated by assigning it the value exp.

For example, if an update of the yearsAtCo element of the protected employee record structure is desired (e.g., its value is set to 13), the following lines of code may be included in the software project:


When an obfuscated element is overwritten, it must be overwritten with a value having the same obfuscation. This means that the value exp must be correspondingly transformed, based on the corresponding key specified in DataTemplates, before the obfuscated value is stored in the obfuscated trie.

A description of optindexArray is given below.

The templateUpdate function requires conversion at step 1204. For example, the original source code may originally have contained code that assigns a value z to an element of instance x. Once x has been converted to the protected structure instance y, that element is no longer directly accessible. At step 1202, the code will therefore be replaced with a call to the function templateUpdate.

At step 1204, that call will then be replaced with code based on the protection description information. (Here, for simplicity, the stored value is not a protected value, so that the actual value z is stored in the protected structure y.)
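Likewise, lowering a templateUpdate call at step 1204 can be sketched as emitting an assignment along the obfuscated path. The optional encode callback stands in for the transformation of the value required by the element's key in DataTemplates. Passing None reproduces the simplified case above, in which the actual value z is stored. The function lower_update and its parameters are hypothetical.

```python
def lower_update(obf_var, element, layout, value_expr, encode=None):
    """Hypothetical step-1204 rewrite of templateUpdate(obfVar, element, ..., exp).

    encode: optional function mapping a value expression to the expression that
    applies the element's data encoding; None stores the plain value."""
    target = obf_var
    for step in layout[element]:
        target += f"[{step}]" if isinstance(step, int) else f".{step}"
    stored = value_expr if encode is None else encode(value_expr)
    return f"{target} = {stored};"

layout = {"price": ("Q790A", 7, "fT9_x40k")}
print(lower_update("y", "price", layout, "z"))  # y.Q790A[7].fT9_x40k = z;
```

Supplying encode is how the rule that an overwritten element must keep the same obfuscation would be honored: the value is transformed before it is stored.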

The purpose of the optional argument optindexArray is to provide indices that are computed at run time. For example, a path such as 'x[3][15]' does not require optindexArray, because the literal constant indices 3 and 15 are given in the path. However, this no longer works if an index has to be computed at run time. If the source code contains an access such as 'x[i][j]', where the values of i and j are not known when the path is written in the source code, a mechanism for passing the index values is required. Such symbolic indices are ignored by the manipulation routines mentioned above; instead, the function expects an array argument following 'path'. For example, the array argument optindexArray may be [7 × a + k, -2 × b], which causes i and j to take the values 7 × a + k and -2 × b respectively, evaluated whenever the function call is made. In this example, the symbolic indices in the path are replaced with the elements of the array from left to right.
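A minimal Python sketch of how symbolic indices might be bound from optindexArray follows. Literal bracketed indices are kept, and each symbolic index is replaced, left to right, by the next runtime value. The path grammar (dot-separated names and bracketed indices) and the function name resolve_path are assumptions for illustration.

```python
import re
from typing import Sequence, Union

# A property name (optionally dot-prefixed) or a bracketed index.
_TOKEN = re.compile(r"\.?([A-Za-z_]\w*)|\[\s*(\w+)\s*\]")

def resolve_path(path: str, opt_index_array: Sequence[int] = ()) -> list[Union[str, int]]:
    """Split path into steps, binding symbolic indices from opt_index_array."""
    steps: list[Union[str, int]] = []
    supplied = iter(opt_index_array)
    pos = 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"bad path syntax at {pos}: {path!r}")
        name, index = m.group(1), m.group(2)
        if name is not None:
            steps.append(name)
        elif index.isdigit():
            steps.append(int(index))          # literal constant index, e.g. [15]
        else:
            try:
                steps.append(next(supplied))  # symbolic index: next runtime value
            except StopIteration:
                raise ValueError(f"no runtime value for index {index!r}") from None
        pos = m.end()
    return steps

a, k, b = 1, 2, 3
steps = resolve_path("x[i][j]", [7 * a + k, -2 * b])  # ['x', 9, -6]
```

A path with only literal indices, such as "x[3][15]", needs no array argument and resolves to ['x', 3, 15].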

Thus, as an example, assume x is an unprotected instance of the employee record structure. It can be protected by including the following lines of code in the source code to form a protected instance y:

Here, a high protection level (specified by the EmployeeRecordHigh entry in KeyTemplates) is being used, with a tag/seed value of 1352.

The result (i.e., protected structure y) may be represented by a trie as shown in fig. 14.

The element yearsAtCo may be accessed at step 1202 by including the following lines of code in the source code:

Step 1204 will translate it into something quite different, such as:

Here, the protected value for the element yearsAtCo is accessed (as shown in FIG. 14), and the protection that has been applied to the actual value of the element yearsAtCo is removed using the constants vs0i98 and vs94io. In this example, these are constants generated for encoding the value using a linear mapping over the finite ring of integers modulo 2^32.

The element yearsAtCo may be updated to the value w at step 1202 by including the following lines of code in the source code:

Step 1204 will translate it into something quite different, such as:

Here, the protected value for the element yearsAtCo is accessed (as shown in FIG. 14). The protection that has been applied to the actual value of the element yearsAtCo is applied to the value w using the constants vs4352 and vs93427. In this example, these are constants generated for encoding the value using a linear mapping over the finite ring of integers modulo 2^32. As is known in the art, the constants vs4352 and vs93427 are related to the constants vs0i98 and vs94io: vs4352 and vs93427 are used to protect a value, and vs0i98 and vs94io are used to remove the protection from a protected value.
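The relationship between the two pairs of constants can be checked with a short Python sketch. It assumes the linear encoding has the form e(v) = a·v + b mod 2^32 with a odd; the text gives neither the exact form nor the constant values. Under that assumption the decoding constants are a⁻¹ mod 2^32 and −a⁻¹·b mod 2^32. Which of the document's constants plays which role is not stated, so make_linear_codec, apply_linear and the sample values are hypothetical. pow with exponent -1 and a modulus requires Python 3.8 or later.

```python
M = 2 ** 32  # values live in the ring of integers modulo 2^32

def make_linear_codec(a: int, b: int):
    """Return (encode constants, decode constants) for e(v) = a*v + b mod 2^32.

    a must be odd so that it has a multiplicative inverse modulo 2^32."""
    if a % 2 == 0:
        raise ValueError("a must be odd to be invertible modulo 2**32")
    a_inv = pow(a, -1, M)              # modular inverse of a
    decode = (a_inv, (-a_inv * b) % M)  # d(w) = a_inv*w - a_inv*b (mod 2^32)
    return (a % M, b % M), decode

def apply_linear(consts, v: int) -> int:
    """Apply a (multiplier, addend) pair modulo 2^32."""
    mul, add = consts
    return (mul * v + add) % M

enc, dec = make_linear_codec(0x9E3779B1, 12345)  # illustrative constants only
stored = apply_linear(enc, 13)                   # protect, as for an update to 13
```

Applying the decode pair to stored gives back 13, because a⁻¹·(a·v + b) − a⁻¹·b = v modulo 2^32. This is the sense in which one pair of constants protects a value and the other removes the protection.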

Although specific embodiments have been described, a person of ordinary skill in the art would recognize that modifications and variations of these embodiments fall within the spirit and scope of the invention.

It will be appreciated that the methods described have been shown as individual steps performed in a particular order. However, the skilled person will appreciate that the steps may be combined or performed in a different order, while still achieving the desired results.

It will be appreciated that embodiments of the invention may be implemented using a variety of different information processing systems. In particular, although the drawings and the discussion thereof provide exemplary computing systems and methods, they are presented merely to provide a useful reference in discussing various aspects of the invention. Embodiments of the invention may be performed on any suitable data processing device, such as a personal computer, laptop computer, personal digital assistant, mobile telephone, set-top box, television, server computer, or the like. Of course, the description of the systems and methods has been simplified for purposes of discussion, and it is just one of many different types of systems and methods that may be used with embodiments of the invention. It is to be appreciated that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or elements or impose an alternate decomposition of functionality upon various logic blocks or elements.

It will be appreciated that the above-mentioned functions may be implemented as one or more respective modules, as hardware and/or software. For example, the above-mentioned functions may be implemented as one or more software components for execution by a processor of the system. Alternatively, the above-mentioned functions may be implemented as hardware, such as on one or more Field Programmable Gate Arrays (FPGAs) and/or one or more Application Specific Integrated Circuits (ASICs) and/or one or more Digital Signal Processors (DSPs) and/or other hardware arrangements. The method steps embodied in the flowcharts contained herein, or as described above, may each be implemented by a respective module; alternatively, a plurality of such method steps may be implemented together in a single module.

It will be appreciated that where embodiments of the invention are implemented by a computer program, then one or more storage media and/or one or more transmission media storing or carrying the computer program constitute aspects of the invention. The computer program may have one or more program instructions or program code that, when executed by one or more processors (or one or more computers), perform embodiments of the invention. The term "program" as used herein may be a sequence of instructions designed for execution on a computer system and may include a subroutine, a function, a procedure, a module, an object method, an object implementation, an executable application, an applet, a servlet, source code, object code, bytecode, a shared library, a dynamically linked library and/or other sequence of instructions designed for execution on a computer system. The storage medium may be a magnetic disk (such as a hard drive or floppy disk), an optical disk (such as a CD-ROM, DVD-ROM, or Blu-ray disk), or memory (such as ROM, RAM, EEPROM, EPROM, flash memory, or portable/removable memory devices), among others. A transmission medium may be a communications signal, a data broadcast, a communications link between two or more computers, or the like.

Claims

1. A method of protecting an item of software, comprising:

performing, in a first type intermediate representation, optimization of the item of software represented in the first type intermediate representation;

performing, in a second type intermediate representation, protection of the item of software represented in the second type intermediate representation, wherein the second type intermediate representation is different from the first type intermediate representation; and

after performing the optimization and the protection, outputting the item of software in (i) a source code representation or (ii) a script representation suitable for use by a web browser.

2. The method of claim 1, wherein performing optimization comprises performing optimization in the first type intermediate representation both before and after performing protection in the second type intermediate representation.

3. The method of claim 2, further comprising converting the item of software from the first type intermediate representation to the second type intermediate representation after performing the optimization and before subsequently performing the protection, and converting the item of software from the second type intermediate representation to the first type intermediate representation after performing the protection and before subsequently performing the optimization.

4. The method of claim 1, wherein performing protection comprises performing protection in the second type intermediate representation before and after performing optimization in the first type intermediate representation.

5. The method of claim 1, wherein the first type intermediate representation is an LLVM intermediate representation, LLVM IR.

6. The method of any of claims 1 to 5, wherein the optimization comprises optimization for one or more of size, runtime speed and runtime memory requirements of the item of software, and use of cores and GPU processors.

7. The method of any of claims 1 to 5, wherein performing protection of the item of software in the second type intermediate representation comprises applying one or more protection techniques to the item of software.

8. The method of claim 6, wherein the one or more protection techniques include one or more of white-box protection techniques, node locking techniques, data flow obfuscation, control flow obfuscation and transformation, homomorphic data transformation, key hiding, program interlocking, and boundary blending.

9. The method of any of claims 1 to 5, wherein the protection component is an obfuscation engine.

10. The method of any one of claims 1 to 5, further comprising providing the item of software in an input representation and converting the item of software from the input representation into the first type intermediate representation before performing the optimization and protection.

11. The method of claim 10, wherein converting the item of software from the input representation to the first type intermediate representation prior to performing the optimization and protection comprises converting the item of software from the input representation to the second type intermediate representation and then converting the item of software from the second type intermediate representation to the first type intermediate representation.

12. The method of claim 10, wherein the input representation is a source code representation.

13. The method of claim 11, wherein the source code representation is one of C, C++, Objective-C, Java, JavaScript, C#, Ada, Fortran, ActionScript, GLSL, Haskell, Julia, Python, Ruby, and Rust.

14. The method of claim 10, wherein the input representation is in binary code form.

15. The method of any of claims 1 to 5, wherein the outputting comprises converting the item of software into an output representation after performing the optimization and protection.

16. The method of any of claims 1 to 5, further comprising applying binary protections to the outputted item of software after compiling and linking the outputted item of software.

17. The method of any of claims 1 to 5, wherein converting the item of software to the output representation comprises converting the item of software from a first type intermediate representation to a second type intermediate representation and then converting the item of software from the second type intermediate representation to the source code representation.

18. The method of claim 15, wherein the output representation is a JavaScript representation.

19. The method of claim 15, wherein the output representation is a subset of JavaScript.

20. A method as claimed in any one of claims 1 to 5, comprising converting the item of software from the first-class intermediate representation to a script representation.

21. A method as claimed in any one of claims 1 to 5, wherein the item of software is an application for execution on the user device.

22. A method as claimed in any one of claims 1 to 5, wherein the items of software are one or more of libraries, modules and agents.

23. A method as claimed in any one of claims 1 to 5, wherein the item of software is a security item of software.

24. A method as claimed in any one of claims 1 to 5, further comprising delivering the item of software to the user device for execution.

25. The method of any of claims 1 to 5, further comprising performing protection of the item of software also in an intermediate representation of the first kind and/or performing optimization of the item of software in an intermediate representation of the second kind and/or performing protection of the item of software in another intermediate representation different from the intermediate representation of the first and second kind and/or performing optimization of the item of software in another intermediate representation different from the intermediate representation of the first and second kind.

26. A method of protecting an item of software, comprising performing the method of any preceding claim on each of two items of software, and invoking one of the two items of software from the other.

27. A method of protecting an item of software, comprising:

performing, in a first type intermediate representation, protection of the item of software represented in the first type intermediate representation;

performing, in a second type intermediate representation, further protection of the item of software represented in the second type intermediate representation, wherein the second type intermediate representation is different from the first type intermediate representation; and

after performing the protection and the further protection, outputting the item of software in (i) a source code representation or (ii) a script representation suitable for use by a web browser.

28. The method of claim 27, further comprising performing optimization of the item of software in at least one of: a first type intermediate representation; a second intermediate representation; and another intermediate representation different from the first and second intermediate representations.

29. An apparatus arranged to carry out the method of any one of claims 1 to 28.

30. One or more computer-readable media comprising computer program code arranged to implement the method of any of claims 1 to 28 when executed on a suitable computer device.

31. One or more computer-readable media comprising an item of software protected and optimized according to the method of any one of claims 1 to 28.

32. A computer apparatus for protecting an item of software, comprising:

an optimizer component arranged to perform, in a first type intermediate representation, optimization of the item of software represented in the first type intermediate representation; and

a protector component arranged to perform, in a second type intermediate representation, protection of the item of software represented in the second type intermediate representation, wherein the second type intermediate representation is different from the first type intermediate representation;

wherein the computer apparatus is arranged to output the item of software, after performing the optimization and the protection, in (i) a source code representation or (ii) a script representation suitable for use by a web browser.

33. The apparatus of claim 32, wherein the apparatus is arranged such that the optimizer component performs optimization of the item of software in the first type intermediate representation both before and after the protector component performs protection of the item of software in the second type intermediate representation.

34. The apparatus of claim 32, wherein the apparatus is arranged such that the protector component performs protection of the item of software in the second type intermediate representation both before and after the optimizer component performs optimization of the item of software in the first type intermediate representation.

35. The apparatus according to any one of claims 32 to 34, wherein the first type intermediate representation is LLVM IR.

36. The apparatus of any of claims 32 to 34, wherein the optimization component comprises one or more LLVM optimization tools.

37. The apparatus of any of claims 32 to 34, wherein the protection component is configured to apply one or more protection techniques to the item of software, including one or more of white-box protection techniques, node locking techniques, data flow obfuscation, control flow obfuscation and transformations, homomorphic data transformations, key hiding, program interlocks, and boundary blending.

38. The apparatus of any of claims 32 to 34, further comprising an input converter arranged to convert the item of software from the input representation to LLVM IR.

39. The apparatus of claim 38, wherein the input representation is one of a binary representation and a source code representation.

40. The apparatus of any one of claims 32 to 34, further comprising a binary rewriting protection tool arranged to apply binary rewriting protection to the output item of software after the output item of software has been compiled and linked.

41. The apparatus of any one of claims 32 to 34, wherein the protector component is further arranged to perform protection of the item of software in the first-class intermediate representation.

42. The apparatus of any one of claims 32 to 34, further comprising an additional protector component arranged to perform protection of the item of software in the first-class intermediate representation.

43. A unified obfuscation toolset, comprising:

a protection component;

an optimizer component;

a converter between the intermediate representations used by the protection component and the optimizer component; and

an output stage arranged to provide, after the optimizer component and the protection component have been applied, output in (i) source code or (ii) a script suitable for use by a web browser.

44. The unified obfuscation toolset of claim 43, wherein the optimizer component comprises one or more LLVM optimizer tools, and the unified obfuscation toolset comprises one or more LLVM front-end tools for converting from an input representation to the LLVM intermediate representation.

45. The unified obfuscation toolset of claim 43, wherein the protection component implements one or more of the following techniques: white-box protection techniques, node locking techniques, data flow obfuscation, control flow obfuscation, homomorphic data transformations, control flow transformations, key hiding, program interlocks, and boundary blending.

46. The unified obfuscation toolset of any one of claims 43 to 45, further comprising an output converter for converting to an output representation that is a subset of JavaScript.

47. The unified obfuscation toolset of any one of claims 43 to 45, arranged to apply a plurality of alternating protection and optimization steps to an item of software using the protection component and the optimizer component.

48. A computer device comprising the unified obfuscation toolset of any one of claims 43 to 45.
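The claimed arrangement can be illustrated with a toy Python sketch. It shows an optimizer working in one intermediate representation and a protector working in another, joined by a converter (claim 43). The two components run in alternation (claims 33 and 47), and the result is emitted as browser script (claims 32 and 43). Everything in the sketch is hypothetical. The tuple-based "IRs" stand in for LLVM IR and for the protector's own representation. The optimizer does only constant folding and propagation. The single protection shown, rewriting each literal operand `c` as `(x + c) - x` where `x` is a runtime input, is a simple data-flow obfuscation chosen for brevity. It is not the patented set of techniques.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]
FOLD = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}


@dataclass(frozen=True)
class Instr:
    """Hypothetical first-class IR instruction: dest = op(lhs, rhs); "set" copies lhs."""
    op: str  # "set", "add", "sub" or "mul"
    dest: str
    lhs: Operand
    rhs: Operand = 0


# Hypothetical second-class IR used by the protector: (op, dest, operand tuple).
ProtInstr = Tuple[str, str, Tuple[Operand, ...]]


def optimize(prog: List[Instr]) -> List[Instr]:
    """Optimizer: constant folding and propagation in the first-class IR."""
    known: Dict[str, int] = {}
    out: List[Instr] = []
    for ins in prog:
        lhs = known.get(ins.lhs, ins.lhs) if isinstance(ins.lhs, str) else ins.lhs
        rhs = known.get(ins.rhs, ins.rhs) if isinstance(ins.rhs, str) else ins.rhs
        if ins.op == "set" and isinstance(lhs, int):
            known[ins.dest] = lhs
            out.append(Instr("set", ins.dest, lhs))
        elif ins.op in FOLD and isinstance(lhs, int) and isinstance(rhs, int):
            known[ins.dest] = FOLD[ins.op](lhs, rhs)
            out.append(Instr("set", ins.dest, known[ins.dest]))
        else:
            known.pop(ins.dest, None)  # dest is no longer a known constant
            out.append(Instr(ins.op, ins.dest, lhs, rhs))
    return out


def to_second(prog: List[Instr]) -> List[ProtInstr]:
    """Converter: first-class IR -> second-class IR."""
    return [(i.op, i.dest, (i.lhs,) if i.op == "set" else (i.lhs, i.rhs)) for i in prog]


def to_first(prog: List[ProtInstr]) -> List[Instr]:
    """Converter: second-class IR -> first-class IR."""
    return [Instr(op, dest, *args) for op, dest, args in prog]


def protect(prog: List[ProtInstr], opaque: str) -> List[ProtInstr]:
    """Protector: hide each literal operand c as (opaque + c) - opaque."""
    out: List[ProtInstr] = []
    n = 0
    for op, dest, args in prog:
        if op == "set":
            out.append((op, dest, args))
            continue
        new_args: List[Operand] = []
        for a in args:
            if isinstance(a, int):
                p, q = f"_p{n}", f"_q{n}"
                n += 1
                out.append(("add", p, (opaque, a)))
                out.append(("sub", q, (p, opaque)))
                new_args.append(q)
            else:
                new_args.append(a)
        out.append((op, dest, tuple(new_args)))
    return out


def run(prog: List[Instr], env: Dict[str, int]) -> Dict[str, int]:
    """Reference interpreter, used to check that semantics are preserved."""
    env = dict(env)
    for i in prog:
        lhs = env[i.lhs] if isinstance(i.lhs, str) else i.lhs
        rhs = env[i.rhs] if isinstance(i.rhs, str) else i.rhs
        env[i.dest] = lhs if i.op == "set" else FOLD[i.op](lhs, rhs)
    return env


def emit_js(prog: List[Instr], param: str, result: str) -> str:
    """Output stage: print the first-class IR as a JavaScript function."""
    ops = {"add": "+", "sub": "-", "mul": "*"}
    lines = [f"function f({param}) {{"]
    for i in prog:
        expr = str(i.lhs) if i.op == "set" else f"{i.lhs} {ops[i.op]} {i.rhs}"
        lines.append(f"  var {i.dest} = {expr};")
    lines += [f"  return {result};", "}"]
    return "\n".join(lines)


def pipeline(prog: List[Instr]) -> List[Instr]:
    prog = optimize(prog)                           # optimize in the first-class IR
    prog = to_first(protect(to_second(prog), "x"))  # protect in the second-class IR
    return optimize(prog)                           # optimize again afterwards


source = [Instr("add", "a", 2, 4), Instr("mul", "b", "x", 3), Instr("add", "y", "b", "a")]
print(emit_js(pipeline(source), "x", "y"))
```

The protection relies on the runtime value of `x`, which the optimizer cannot know. As a result, the second optimization pass cannot fold the hidden constants back out. This illustrates why a real toolset must ensure that later optimization steps do not undo earlier protections.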