WO2023060525A1 - Methods and systems for generating verifiable software releases - Google Patents

Methods and systems for generating verifiable software releases

Publication number
WO2023060525A1
Authority
WO
WIPO (PCT)
Prior art keywords
build
modified
instructions
software release
deterministic
Prior art date
Application number
PCT/CN2021/123961
Other languages
French (fr)
Inventor
Jiawen Xiong
Yong Shi
Boyuan Chen
Zhenming JIANG
Filipe Roseiro COGO
Original Assignee
Huawei Technologies Co.,Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co.,Ltd.
Priority to PCT/CN2021/123961
Publication of WO2023060525A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516 Runtime code conversion or optimisation
    • G06F9/4552 Involving translation to a different instruction set architecture, e.g. just-in-time translation in a JVM

Definitions

  • the present disclosure relates to software systems, in particular, verifiable Java software releases.
  • Java is a popular class-based, object-oriented programming language used across a variety of computing platforms. Java applications are typically compiled into bytecode, stored in a .class file that is then executed on a Java virtual machine (JVM). Java is a platform-independent language, in that the compiled software can be executed on JVMs installed on any platform.
  • Java software packages (e.g. open source packages) are typically distributed in pre-compiled form, and automated build tools such as Maven and Gradle facilitate the reuse of third party packages obtained from central repositories (e.g. Maven Central) by assisting developers in integrating them into new software packages.
  • this process can expose developers to risks such as supply chain attacks, where the build infrastructure is compromised and malicious code is inserted during the software build process prior to distribution to end customers, ultimately compromising the customer’s data or system.
  • a reproducible build is one where the build artifacts generated independently by two different build instances are equivalent.
  • a verifiable build maximizes the equivalence between two build artifacts generated independently by two different build instances, where any remaining non-equivalences are identified and interpreted to be legitimate. For example, security requirements may introduce random signature keys in distributed software packages.
  • the present disclosure describes methods and systems to mitigate sources of non-determinism in Java software systems.
  • software is engaged to intercept build instructions at the JVM level to modify build instructions during build runtime.
  • a bytecode editor is then used to edit remaining sources of non-determinism in the software release.
  • the software release is verified by obtaining validation information and documenting all remaining sources of non-equivalence in the final build artifact.
  • the present disclosure provides the technical effect that a verifiable Java software release is obtained, along with an output file disclosing any remaining sources of non-equivalences.
  • Sources of non-determinism introduced at the JVM level during the build process are automatically replaced by custom methods, while other sources of non-equivalence are accounted for, in order for the Java software release to be verified as a trusted software package free from malicious code.
  • the present disclosure provides the technical advantage that a verifiable Java software release can be generated easily and automatically, without requiring direct editing of the Java source code of the build tool or introducing changes in the behavior of the build tool.
  • the present disclosure also provides the technical advantage that the methods and systems are flexible and easily integrated with any automated build tool, irrespective of version or legacy code and ensuring backwards compatibility with older versions. Verifiable software releases can be obtained using build tool packages that may no longer be maintained but are still widely used in the Java software community.
  • the present disclosure describes a method for generating a verifiable software release.
  • the method comprises a number of steps.
  • a build intervention is performed during build runtime of a build tool configured to build a software release in a Java runtime environment.
  • the build intervention comprises: instrumenting the build tool to enable interception of build instructions of the build tool; intercepting one or more build instructions causing non-determinism in the software release using a class transformer; modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and storing the one or more modified build instructions, such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
  • the method further comprises editing the modified software release to remove one or more non-deterministic portions using a bytecode editor, thereby generating an edited software release.
  • the method further comprises generating documentation documenting one or more remaining non-deterministic portions in the edited software release.
  • the method further comprises generating validation information for verifying further software releases generated using the preceding methods.
  • intercepting the one or more build instructions comprises: obtaining a non-deterministic method profile of one or more non-deterministic methods contained in the build tool; loading the class transformer into a Java Virtual Machine; and locating one or more .class files of the build tool running on the Java Virtual Machine containing non-deterministic methods, using the non-deterministic method profile.
  • the class transformer uses the Java Instrumentation Application Programming Interface (API) to intercept and modify the one or more build instructions.
  • modifying the one or more build instructions includes: modifying the bytecode of one or more .class files of the build tool to replace one or more time-related methods with a customized method that returns pre-defined values for timestamps; and modifying the bytecode of one or more .class files of the build tool to disable a cache mechanism of a compiler.
  • building the modified software release includes: loading the modified build instructions into memory; and executing the modified build instructions to generate a compressed archive file based on the compiled modified build instructions.
  • the compressed archive file is packaged as one of: a .jar file; an .ear file; and a .war file.
  • editing the modified software release to remove one or more non-deterministic portions using a bytecode editor includes: unpacking the compressed archive file of the modified software release; extracting one or more modified .class files and metadata; examining the one or more modified .class files and the metadata to locate one or more non-deterministic portions; and editing the one or more modified .class files and the metadata to remove the one or more non-deterministic portions to generate a repackaged archive file containing one or more edited .class files and edited metadata.
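The repackaging step can be illustrated with a minimal sketch that uses only the standard `java.util.zip` API. It assumes the archive has already been unpacked into a map of entry names to contents, and it addresses just two common sources of non-equivalence (entry order and entry timestamps). The class name `DeterministicRepack` and the chosen fixed timestamp are illustrative assumptions, not part of the disclosure.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: repackages already-extracted entries into an archive
// whose bytes do not depend on insertion order or on the time of the build.
final class DeterministicRepack {
    // Arbitrary fixed entry time (2000-01-01T00:00:00Z); an assumption for this sketch.
    static final long FIXED_ENTRY_TIME_MILLIS = 946_684_800_000L;

    static byte[] repack(Map<String, byte[]> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            // TreeMap iterates in sorted key order, which fixes the file ordering.
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                // Replace the build-time timestamp with a constant. The stored value
                // is encoded using the JVM default time zone, so builds that are
                // compared should also pin the time zone.
                entry.setTime(FIXED_ENTRY_TIME_MILLIS);
                zip.putNextEntry(entry);
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }
}
```

Two calls to `DeterministicRepack.repack` with the same entries in a different insertion order, on the same JVM, should yield byte-identical archives.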
  • editing the one or more modified .class files includes: deduplicating a constant table to make all constant entries unique; sorting the constant table; sorting a method table; and sorting a local variable table.
  • generating validation information for verifying further software releases includes computing a checksum value for the edited software release.
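As a minimal sketch of checksum-based validation information, the following computes a hex digest over the bytes of a release. The disclosure does not name a hash algorithm; SHA-256 and the class name `ReleaseChecksum` are assumptions made for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes a checksum that can be published alongside a release.
final class ReleaseChecksum {
    /** Returns the lowercase hexadecimal SHA-256 digest of the artifact bytes. */
    static String sha256Hex(byte[] artifact) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(artifact)) {
                // Mask to an unsigned value so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm for Java platform implementations.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

A consumer who rebuilds the release independently can compute the same digest over the rebuilt archive and compare it with the published value.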
  • the method further comprises uploading to a repository a verifiable software release distribution package, the verifiable software release distribution package containing: a source code of the edited software release; the validation information; and documentation.
  • the present disclosure describes a system for generating a verifiable software release.
  • the system comprises a processor device and a memory storing machine-executable instructions which, when executed by the processor device, cause the system to perform a number of steps.
  • a build intervention is performed during build runtime of a build tool configured to build a software release in a Java runtime environment by: instrumenting the build tool to enable interception of build instructions of the build tool; intercepting one or more build instructions causing non-determinism in the software release using a class transformer; modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and storing the one or more modified build instructions, such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
  • the present disclosure describes a non-transitory processor-readable medium having machine-executable instructions stored thereon which, when executed by a processor of a device, cause the device to perform a number of steps.
  • a build intervention is performed during build runtime of a build tool configured to build a software release in a Java runtime environment.
  • the build intervention comprises: instrumenting the build tool to enable interception of build instructions of the build tool; intercepting one or more build instructions causing non-determinism in the software release using a class transformer; modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and storing the one or more modified build instructions, such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
  • the present disclosure describes a processor-readable medium having machine-executable instructions stored thereon which, when executed by a processor of a device, cause the device to perform the steps of the above-mentioned methods.
  • the present disclosure describes a computer program comprising instructions which, when the program is executed by a computer, cause the computer to perform the steps of the above-mentioned methods.
  • FIG. 1 is a block diagram of an example computing system which may be used to implement examples of the present disclosure.
  • FIG. 2 is a block diagram illustrating an example Java Virtual Machine architecture, in accordance with examples of the present disclosure.
  • FIG. 3 is a flowchart illustrating an example method for building a verifiable software release in a Java Runtime Environment (JRE), in accordance with examples of the present disclosure.
  • FIG. 4 is a flowchart illustrating an example method for intercepting and modifying one or more build instructions using a class transformer, in accordance with examples of the present disclosure.
  • FIG. 5 is a flowchart illustrating an example method for editing a modified software release to generate an edited software release, in accordance with examples of the present disclosure.
  • the present disclosure describes methods and systems that help to address the problem of Java software release verifiability, by automatically removing sources of non-equivalence during software builds. More specifically, a class transformer is used to intercept non-deterministic method invocations during Java software builds, such that sources of non-determinism are removed by modifying the bytecode of .class files of the build tool. A bytecode editor is then used to remove further sources of non-determinism in the generated build artifact. Validation information is obtained for the edited software release, and accompanying documentation verifies legitimate remaining non-equivalences in the verifiable software release.
  • the Java platform is a collection of programs to enable the development and execution of Java programs. Included in the Java platform is an execution engine (e.g. the Java Virtual Machine or JVM), a compiler and libraries.
  • the compiler converts Java source code into bytecode, an intermediate representation of the Java program that can be understood by the JVM and which is stored as a .class file.
  • the JVM then executes the bytecode.
  • the Java Runtime Environment (JRE) is another component of the Java platform that introduces a just-in-time compiler to convert bytecode during runtime. Using the JRE, .class files are dynamically loaded into memory by the Java Class Loader.
  • the Java Instrumentation API is a tool that enables the instrumentation and modification of bytecode to existing Java classes by Java Agents.
  • a Java Agent is a .class file that leverages the Java Instrumentation API and contains instructions to intercept applications running on the JVM and make changes to the bytecode of these applications.
  • a Java Agent can be initiated at application startup, or dynamically during application runtime.
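A minimal sketch of a startup-time Java Agent built on the standard `java.lang.instrument` API follows. The names `ProfileTransformer`, `DeterminismAgent` and the profiled class `org/example/build/Archiver` are hypothetical. The transformer here only records which profiled classes it sees and returns `null` (meaning "leave the class unchanged"); actual bytecode rewriting would normally be delegated to a bytecode library, which this sketch deliberately omits.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical transformer: intercepts classes named in a profile as they are loaded.
final class ProfileTransformer implements ClassFileTransformer {
    private final Set<String> profiledClasses;
    // Names of profiled classes observed so far (internal form, e.g. "a/b/C").
    final Set<String> intercepted = ConcurrentHashMap.newKeySet();

    ProfileTransformer(Set<String> profiledClasses) {
        this.profiledClasses = profiledClasses;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className == null || !profiledClasses.contains(className)) {
            return null; // null tells the JVM to keep the original bytecode
        }
        intercepted.add(className);
        // Placeholder: a real agent would return rewritten bytecode here.
        return null;
    }
}

// Hypothetical agent entry point, invoked by the JVM before the application's main method.
final class DeterminismAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ProfileTransformer(Set.of("org/example/build/Archiver")));
    }
}
```

To run at application startup, the agent would be packaged in a jar whose manifest names it in the `Premain-Class` attribute and passed to the JVM with the `-javaagent:` option.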
  • a “software build instance” is defined as a discrete initiation of a software build process resulting in the generation of a software release.
  • a “software release” is defined as a compiled output of a software build process, or a build artifact. Build artifacts are the files produced by a build process, and may also be referred to as software binaries.
  • a “verifiable software release” is defined as a software release in which sources of non-determinism resulting from the build process or build infrastructure are removed, and any remaining non-equivalencies can be explained and documented as being legitimate or necessary (e.g. for security purposes).
  • a nondeterministic build process is a build process that produces different build artifacts from separate build instances despite using the same source code, build scripts and build environment.
  • Sources of non-determinism in build artifacts may include timestamps, build path and file ordering, among others.
  • the process of building verifiable software packages includes two aspects.
  • the first aspect, known as the deterministic build process, aims to eliminate sources of non-determinism in the build process or introduced by build infrastructures that may contribute to non-equivalences in build artifacts. For example, insertion of a timestamp during the build process will cause non-equivalences, since timestamps inserted during two separate build instances will be different.
  • the second aspect, known as the explainable build process, aims to identify and interpret non-equivalences in the build artifacts that are legitimate and should not be removed.
  • the contents of a distributed software package can be compared to the contents of a trusted software package.
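A minimal sketch of such a comparison follows, assuming both packages have already been unpacked into maps from entry name to contents. The class name `ArtifactDiff` is hypothetical. The resulting list is the kind of raw input from which documentation of remaining non-equivalences could be written; deciding whether each difference is legitimate remains a separate step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: lists entries that differ between a trusted and a candidate artifact.
final class ArtifactDiff {
    /**
     * Returns, in sorted order, the names of entries that are missing from one
     * side or whose bytes differ between the two artifacts.
     */
    static List<String> differingEntries(Map<String, byte[]> trusted,
                                         Map<String, byte[]> candidate) {
        TreeSet<String> names = new TreeSet<>(trusted.keySet());
        names.addAll(candidate.keySet());
        List<String> differing = new ArrayList<>();
        for (String name : names) {
            // Arrays.equals treats two nulls as equal and null vs non-null as different,
            // so entries present on only one side are reported.
            if (!Arrays.equals(trusted.get(name), candidate.get(name))) {
                differing.add(name);
            }
        }
        return differing;
    }
}
```

An empty result indicates that the two artifacts are equivalent at the level of entry names and contents.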
  • multiple independent developers may use a trusted build infrastructure and share the results of the build process for other developers to compare against.
  • the contents of the final distribution will be different from those built with a trusted infrastructure.
  • a “verifiable software release distribution package” includes a source code of the verifiable software release, validation information that may be used to verify the contents of the verifiable software release and any associated documentation describing remaining sources of non-equivalence or that may be required to install or use the software release.
  • one existing approach to producing verifiable software builds uses plug-ins directed to automated build tools.
  • a disadvantage of this approach is that these plug-ins require that the build infrastructure (e.g., compilers, class loaders, and other related tools) be configured and upgraded to specific versions that support the plug-ins, rendering them unsuitable to legacy code.
  • Another disadvantage is that these plug-ins only address limited sources of non-determinism associated with specific automated build tools, making them too specific to be applied in a generalist and heterogeneous setting typically found in large software systems.
  • Another existing approach to producing verifiable software builds includes automated tools directed at removing sources of non-determinism during the build process; however, these tools target the C and C++ programming languages and require the interception of build instructions at the operating system level.
  • Java-based applications use automated build tools that run over the JVM and, as such, many sources of non-determinism are introduced through the interaction between those build tools and the JVM. Addressing non-determinism at the operating system level is not appropriate for Java software; instead, build instructions would need to be intercepted at the JVM level.
  • Existing software tools for producing verifiable software builds are not able to directly interact with the JVM.
  • the present disclosure describes examples that may help to address some or all of the above drawbacks of existing technologies.
  • FIG. 1 is a block diagram illustrating a simplified example implementation of a computing system 100 that is suitable for implementing embodiments described herein. Examples of the present disclosure may be implemented in other computing systems, which may include components different from those discussed below.
  • the computing system 100 may be used to execute instructions for generating a verifiable software build, using any of the examples described above.
  • the computing system 100 may also be used to execute the verifiable software release, or the verifiable software release may be executed by another computing system.
  • although FIG. 1 shows a single instance of each component, there may be multiple instances of each component in the computing system 100.
  • the computing system 100 may be a single physical machine or device (e.g., implemented as a single computing device, such as a single workstation, single consumer device, single server, etc.), or may comprise a plurality of physical machines or devices (e.g., implemented as a server cluster).
  • the computing system 100 may represent a group of servers or cloud computing platform providing a virtualized pool of computing resources (e.g., a virtual machine, a virtual server).
  • the computing system 100 includes at least one processor 102, such as a central processing unit, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a dedicated logic circuitry, a dedicated artificial intelligence processor unit, a graphics processing unit (GPU), a tensor processing unit (TPU), a neural processing unit (NPU), a hardware accelerator, or combinations thereof.
  • the computing system 100 may include an optional input/output (I/O) interface 104, which may enable interfacing with an optional input device 106 and/or optional output device 108.
  • the optional input device 106 e.g., a keyboard, a mouse, a microphone, a touchscreen, and/or a keypad
  • optional output device 108 e.g., a display, a speaker and/or a printer
  • the computing system 100 may include an optional network interface 110 for wired or wireless communication with other computing systems (e.g., other computing systems in a network).
  • the network interface 110 may include wired links (e.g., Ethernet cable) and/or wireless links (e.g., one or more antennas) for intra-network and/or inter-network communications.
  • the computing system 100 may include one or more memories 112 (collectively referred to as “memory 112”), which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM)).
  • the non-transitory memory 112 may store instructions for execution by the processor 102, such as to carry out examples described in the present disclosure.
  • the memory 112 may store instructions for implementing any of the networks and methods disclosed herein.
  • the memory 112 may include other software instructions, such as for implementing an operating system (OS) 114 and other applications/functions, such as a Virtual Machine 116, including a Java Virtual Machine.
  • the memory 112 may also store other data, information, rules, policies, and machine-executable instructions described herein.
  • the computing system 100 may also include one or more electronic storage units (not shown), such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive.
  • data and/or instructions may be provided by an external memory (e.g., an external drive in wired or wireless communication with the computing system 100) or may be provided by a transitory or non-transitory computer-readable medium. Examples of non-transitory computer readable media include a RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage.
  • the storage units and/or external memory may be used in conjunction with memory 112 to implement data storage, retrieval, and caching functions of the computing system 100.
  • the components of the computing system 100 may communicate with each other via a bus, for example.
  • FIG. 2 is a block diagram illustrating an example system with a Java Agent deployed on a Java Virtual Machine (JVM) architecture 200 that may be used to implement methods to modify Java software releases to improve verifiability, in accordance with examples of the present disclosure.
  • a Java Instrumentation Application Programming Interface (API) 202 is used to instrument a build tool 220 to enable interception of build instructions 224 of the build tool 220.
  • a build tool 220 may be a Java Application, including automated build tools such as Apache Maven or Gradle.
  • the build tool may include one or more .class files containing Java bytecode, which is parsed by a class loader 210 running in the JVM 200 to generate build instructions 224.
  • the build tool 220 is invoked to build Java Application source code 226 into a software release, using some combination of build instructions 224 such as packing instructions and compilation instructions.
  • the Java Application Source Code 226 is compiled, packed, and otherwise built into a conventional software release including one or more .class files containing Java bytecode.
  • a conventional software release built in such a conventional environment may contain indeterminacies, as described above.
  • The .class files of the build tool 220 are loaded dynamically into the class loader 210 of the JVM 200 to generate the build instructions 224.
  • the build tool 220 is instrumented to enable interception by a class transformer 204, which is loaded into the JVM 200 and intercepts one or more build instructions 224 as each .class file is dynamically loaded and parsed during runtime by the class loader 210.
  • the class transformer 204 may then modify the bytecode of one or more of the build instructions 224 to generate modified build instructions 212.
  • the class transformer contains information about which .class files of the build tool 220 to intercept as well as how to modify the bytecode of the loaded .class files.
  • a non-deterministic method profile 208 may be assembled in advance of initiating the build tool 220 to inform the class transformer 204 which .class files to intercept.
  • Some or all of the (unmodified) build instructions 224 may be written to memory 214 prior to execution.
  • Modified build instructions 212 in the form of modified bytecode may also be loaded into memory 214 to replace or overwrite the corresponding unmodified build instructions 224.
  • the build instructions 224 and modified build instructions 212 stored in the memory 214 are then executed by the execution engine 216 to generate a modified software release 230, which may contain one or more modified .class files.
  • the software build instance may output the modified software release 230 in the form of a compressed archive file.
  • the compressed archive file may be packaged as one of a .jar file, an .ear file or a .war file.
  • FIG. 3 is a flowchart illustrating an example method 300 for building a verifiable software release in a JRE, in accordance with examples of the present disclosure.
  • the method 300 may be performed by the computing system 100.
  • the processor 102 may execute computer readable instructions (which may be stored in the memory 112) to cause the computing system 100 to perform the method 300.
  • the method 300 may be performed using a single physical machine (e.g., a workstation or server), a plurality of physical machines working together (e.g., a server cluster), or cloud-based resources (e.g., using virtual resources on a cloud computing platform).
  • Method 300 begins with step 302 in which a build tool 220 configured to build a software release is instrumented to enable the interception of build instructions 224 of the build tool 220.
  • the build tool 220 is instrumented using the Java Instrumentation API 202 to engage a class transformer 204 upon initiation of a software build instance.
  • the class transformer is a special type of .class file that is able to intercept and modify applications running on a JVM.
  • at step 304, one or more non-deterministic build instructions 224 of the build tool 220 are intercepted using the class transformer 204.
  • at step 306, one or more non-deterministic build instructions 224 of the build tool 220 are modified to remove the non-determinism using the class transformer 204. Further details about steps 304 and 306 are provided in the discussion of method 400 depicted in FIG. 4.
  • FIG. 4 is a flowchart illustrating an example method 400 for intercepting and modifying one or more build instructions using a class transformer, in accordance with examples of the present disclosure.
  • the method 400 may be performed by the computing system 100.
  • the processor 102 may execute computer readable instructions (which may be stored in the memory 112) to cause the computing system 100 to perform the method 400.
  • the method 400 may be performed using a single physical machine (e.g., a workstation or server), a plurality of physical machines working together (e.g., a server cluster), or cloud-based resources (e.g., using virtual resources on a cloud computing platform).
  • Method 400 begins with step 402, in which a non-deterministic method profile 208 of one or more non-deterministic methods contained within the build tool 220 is obtained and incorporated into the class transformer 204.
  • various non-deterministic methods may be invoked that introduce non-deterministic values into the software build process, and therefore result in non-equivalences between build artifacts.
  • An example of such a non-deterministic method is the method “java/lang/System#currentTimeMillis()”, which returns the current system timestamp. The insertion of a timestamp during the build process will cause non-equivalences between build artifacts since timestamps inserted during two separate software build instances will be different.
  • the methods included in the non-deterministic method profile 208 may be obtained manually through reviewing software documentation.
  • a software build instance may be initiated on a JVM 200 using an instrumented build tool 220. Initiating a software build instance may initialize the JVM 200.
  • initializing the JVM 200 also registers the class transformer 204 by calling the premain method of the Java agent containing the class transformer 204.
  • the class transformer 204 is loaded into the JVM 200 and the JVM 200 proceeds to call the main method of the Java Application (e.g. the build tool 220) to begin executing the software build.
  • the class loader 210 proceeds to load the .class files of the build tool 220, and as described at step 408, the class transformer 204 can check the .class files of the build tool 220 running on the JVM 200 with the non-deterministic method profile 208 to locate one or more .class files of the build tool 220 containing non-deterministic methods.
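One way such a check could be approximated, as a sketch only, is to scan the constant pool of each loaded class file for the name of a profiled method. The class name `ConstantPoolScanner` is hypothetical, and the match is a heuristic: it looks only for a UTF-8 constant equal to the method name and does not resolve which class owns the method, so a fuller implementation would follow the method reference to its class and descriptor.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: heuristic search of a class file's constant pool.
final class ConstantPoolScanner {
    /** Returns true if the constant pool contains a Utf8 entry equal to {@code name}. */
    static boolean mentions(byte[] classFile, String name) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort(); // entries are numbered 1..count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: // Utf8: u2 length followed by modified UTF-8 bytes
                        if (in.readUTF().equals(name)) return true;
                        break;
                    case 5: case 6: // Long, Double: 8 bytes, occupy two pool slots
                        in.skipBytes(8);
                        i++;
                        break;
                    case 15: // MethodHandle
                        in.skipBytes(3);
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        in.skipBytes(2);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); // Integer, Float, member refs, NameAndType, (Invoke)Dynamic
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A transformer could call `ConstantPoolScanner.mentions(classfileBuffer, "currentTimeMillis")` on each class it receives and rewrite only the classes that match.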
  • the class transformer 204 may then directly modify the bytecode of one or more .class files of the build tool 220 by replacing the original bytecode with alternate bytecode that removes the sources of non-determinism.
  • the class transformer 204 may replace one or more time-related methods in one or more .class files of the build tool 220 with a customized method that returns pre-defined values for timestamps.
  • the customized methods receive the same arguments as the original methods and are pushed into the stack at every method call, returning a fixed, predefined value that can be obtained from environment variables or can be directly returned as a scalar defined within the customized method.
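A customized time method of this kind might look like the following sketch. The class name `FixedClock` is hypothetical, and reading the value from a `SOURCE_DATE_EPOCH` variable (seconds since the Unix epoch, a convention used by some reproducible-build tooling) is an assumption; the disclosure says only that the value may come from environment variables or be a scalar defined in the method.

```java
import java.util.Map;

// Hypothetical replacement target for calls to System.currentTimeMillis().
final class FixedClock {
    // Assumed variable name; holds seconds since the Unix epoch.
    static final String EPOCH_VARIABLE = "SOURCE_DATE_EPOCH";
    // Scalar fallback used when the variable is not set.
    static final long DEFAULT_MILLIS = 0L;

    /** Resolves the fixed timestamp, in milliseconds, from the given environment. */
    static long fixedTimeMillis(Map<String, String> environment) {
        String seconds = environment.get(EPOCH_VARIABLE);
        if (seconds == null || seconds.trim().isEmpty()) {
            return DEFAULT_MILLIS;
        }
        return Math.multiplyExact(Long.parseLong(seconds.trim()), 1000L);
    }

    /** Same signature as System.currentTimeMillis(), but returns a fixed value. */
    static long currentTimeMillis() {
        return fixedTimeMillis(System.getenv());
    }
}
```

Because `FixedClock.currentTimeMillis()` takes no arguments and returns a `long`, a transformer could redirect a static call to `System.currentTimeMillis()` to it without changing the operand stack layout at the call site.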
  • the class transformer 204 may modify the bytecode of one or more .class files of the build tool 220 to disable a cache mechanism associated with the build tool 220.
  • the Tomcat Jasper compiler may, by default, enable caching functions, which can introduce a source of non-equivalence in build artifacts.
  • the class transformer 204 may modify the bytecode of the Jasper tool at build runtime to disable the caching mechanism.
  • the modified bytecode from steps 410 and 412 is assembled at step 308 to generate modified build instructions 212.
  • the modified build instructions 212 may then be stored in the class transformer 204, where they are easily accessible to be appended in future build instances if new sources of non-determinism are identified in the build tool 220.
  • the Java software build instance may proceed to build a modified software release 230 without non-determinism caused by one or more build instructions 224, using the one or more modified build instructions 212.
  • Modified build instructions 212 in the form of modified bytecode of one or more .class files of the build tool 220, as well as unmodified build instructions 224, may be loaded into memory 214 by the class loader 210 and executed by the execution engine 216.
  • the software build instance may output a modified software release 230 in the form of a compressed archive file.
  • the compressed archive file may be packaged as one of a .jar file, an .ear file or a .war file.
  • At step 312, a bytecode editor may be used to further edit the modified software release 230 by removing one or more non-deterministic portions, to generate an edited software release. Further details about step 312 are provided in the discussion of method 500 depicted in FIG. 5.
  • FIG. 5 is a flowchart illustrating an example method 500 for editing a modified software release to generate an edited software release, in accordance with examples of the present disclosure.
  • the method 500 may be performed by the computing system 100.
  • the processor 102 may execute computer readable instructions (which may be stored in the memory 112) to cause the computing system 100 to perform the method 500.
  • the method 500 may be performed using a single physical machine (e.g., a workstation or server), a plurality of physical machines working together (e.g., a server cluster), or cloud-based resources (e.g., using virtual resources on a cloud computing platform).
  • Method 500 begins with step 502, in which the modified software release 230 is inspected for non-equivalences. For example, a checksum of the modified software release may be computed and compared to the checksum of a trusted software release. During the execution of the build instructions at step 312, additional sources of non-determinism may be introduced through build path and file ordering, among others.
  • a bytecode editor is initiated at step 504.
  • the bytecode editor proceeds to unpack the compressed archive files of the modified software release 230 at step 506.
  • Compressed archive files may be packaged as one of a .jar file, an .ear file or a .war file and unpacking these compressed archive files may generate an output directory containing folders and uncompressed data.
  • one or more .class files and metadata including MANIFEST.MF and other text files of the modified software release may be extracted.
  • These .class files and metadata may then be examined at step 510 to locate non-deterministic portions.
  • non-deterministic portions may include duplicate values in constant tables, differences in the ordering of values in constant, method and variable tables introduced by the software build instance, and time attributes, among others.
  • the bytecode editor proceeds to edit one or more extracted .class files of the modified software release 230 to remove non-deterministic portions at step 512. Editing the one or more .class files may include deduplicating a constant table to make all constant entries unique, sorting a constant table, sorting a method table and sorting a local variable table.
  • At step 514, additional non-deterministic portions are then removed from the extracted metadata, including MANIFEST.MF and other text files. For example, time attributes may be removed from the metadata.
  • Compressed archive files may be packaged as one of a .jar file, an .ear file or a .war file.
  • At step 314, any remaining non-deterministic portions in the edited software release that were not removed in the preceding steps are identified and documented. For example, legitimate sources of non-determinism may need to remain in the edited software release for security requirements, such as random signature keys or necessary encryption that may be present in distributed software packages. The remaining non-deterministic portions are explained in accompanying documentation before being removed for the purpose of obtaining validation information for the edited software release.
  • validation information including a checksum value may be computed for the edited software release for the purposes of verifying the contents of the edited software release.
  • the checksum value may be compared against checksum values that are generated and published by a trusted source, or against checksum values for further software releases to build confidence that the contents of the software release are trusted and free from malicious code.
  • a verifiable software release distribution package may then be uploaded to a software repository, where the software release distribution package contains source code of the edited software release, the validation information and the documentation documenting legitimate non-deterministic portions in the software release.
  • Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product.
  • a suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example.
  • the software product includes instructions tangibly stored thereon that enable a computing system to execute examples of the methods disclosed herein.
  • the machine-executable instructions may be in the form of code sequences, configuration information, or other data, which, when executed, cause a machine (e.g., a processor or other processing unit) to perform steps in a method according to examples of the present disclosure.

Abstract

Methods and systems are described for generating a verifiable software release. A build intervention is performed during build runtime of a build tool configured to build a software release in a Java runtime environment using a class transformer, in order to intercept and modify one or more build instructions to remove the non-determinism introduced by the build tool. A bytecode editor is then engaged to further remove remaining sources of non-determinism in the modified software release to generate an edited software release. Validation information is obtained for the edited software release and accompanying documentation is assembled to verify remaining legitimate sources of non-equivalence in the verifiable software release.

Description

METHODS AND SYSTEMS FOR GENERATING VERIFIABLE SOFTWARE RELEASES
TECHNICAL FIELD
The present disclosure relates to software systems, and in particular to verifiable Java software releases.
BACKGROUND
Java is a popular class-based, object-oriented programming language used across a variety of computing platforms. Java applications are typically compiled into bytecode, stored in a .class file that is then executed on a Java virtual machine (JVM). Java is a platform independent language, in that the compiled software can be executed on JVMs installed on any platform.
Due to its popularity, developers have access to many third-party Java software packages (e.g. open source packages) that are distributed via central repositories (e.g. Maven Central). Java software packages are typically distributed in pre-compiled form, and automated build tools such as Maven and Gradle facilitate the reuse of third party packages obtained from central repositories by assisting developers in integrating them into new software packages. Unfortunately, this process can expose developers to risks such as supply chain attacks, where the build infrastructure is compromised and malicious code is inserted during the software build process prior to distribution to end customers, ultimately compromising the customer’s data or system.
To mitigate the risk of supply chain attacks, developers have adopted practices such as producing reproducible or verifiable builds. A reproducible build is one where the build artifacts generated independently by two different build instances are equivalent. A verifiable build maximizes the equivalence between two build artifacts generated independently by two different build instances, where any remaining non-equivalences are identified and interpreted to be legitimate. For example, security requirements may introduce random signature keys in distributed software packages. By employing a reproducible or verifiable build, developers can be confident that the built software that is distributed to end users is free of malicious code.
Accordingly, it would be useful to provide a solution that can address non-equivalences in Java build artifacts.
SUMMARY
In various examples, the present disclosure describes methods and systems to mitigate sources of non-determinism in Java software systems. Specifically, software is engaged to intercept build instructions at the JVM level to modify build instructions during build runtime. A bytecode editor is then used to edit remaining sources of non-determinism in the software release. The software release is verified by obtaining validation information and documenting all remaining sources of non-equivalence in the final build artifact.
In various examples, the present disclosure provides the technical effect that a verifiable Java software release is obtained, along with an output file disclosing any remaining sources of non-equivalences. Sources of non-determinism introduced at the JVM level during the build process are automatically replaced by custom methods, while other sources of non-equivalence are accounted for, in order for the Java software release to be verified as a trusted software package free from malicious code.
In some examples, the present disclosure provides the technical advantage that a verifiable Java software release can be generated easily and automatically, without requiring direct editing of the Java source code of the build tool or introducing changes in the behavior of the build tool.
In some examples, the present disclosure also provides the technical advantage that the methods and systems are flexible and easily integrated with any automated build tool, irrespective of version or legacy code, ensuring backwards compatibility with older versions. Verifiable software releases can be obtained using build tool packages that may no longer be maintained but are still widely used in the Java software community.
In some aspects, the present disclosure describes a method for generating a verifiable software release. The method comprises a number of steps. A build intervention is performed during build runtime of a build tool configured to build a software release in a Java runtime environment. The build intervention comprises: instrumenting the build tool to enable interception of build instructions of the build tool; intercepting one or more build instructions causing non-determinism in the software release using a class transformer; modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and storing the one or more modified build instructions, such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
In some examples, the method further comprises editing the modified software release to remove one or more non-deterministic portions using a bytecode editor, thereby generating an edited software release.
In some examples, the method further comprises generating documentation documenting one or more remaining non-deterministic portions in the edited software release.
In some examples, the method further comprises generating validation information for verifying further software releases generated using the preceding methods.
In some examples, intercepting the one or more build instructions comprises: obtaining a non-deterministic method profile of one or more non-deterministic methods contained in the build tool; loading the class transformer into a Java Virtual Machine; and locating one or more .class files of the build tool running on the Java Virtual Machine containing non-deterministic methods, using the non-deterministic method profile.
In some examples, the class transformer uses the Java Instrumentation Application Programming Interface (API) to intercept and modify the one or more build instructions.
In some examples, modifying the one or more build instructions includes: modifying the bytecode of one or more .class files of the build tool to replace one or more time-related methods with a customized method that returns pre-defined values for timestamps; and modifying the bytecode of one or more .class files of the build tool to disable a cache mechanism of a compiler.
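As an illustration of the customized time method described above, the following is a minimal sketch of a replacement for `System.currentTimeMillis()` that returns a pre-defined value. The class name `FixedClock`, the helper `resolve` and the choice of the `SOURCE_DATE_EPOCH` environment variable (a convention from the reproducible-builds community, expressed in seconds) are assumptions made for this example and are not taken from the disclosure. The bytecode rewriting that redirects call sites to this method is not shown.

```java
// Hypothetical replacement target for time-related calls; not part of the disclosure.
final class FixedClock {
    // Scalar fallback used when no environment value is available (assumed to be 0).
    static final long DEFAULT_MILLIS = 0L;

    // Converts an epoch value in seconds (as text) to milliseconds, or falls back.
    static long resolve(String epochSeconds) {
        if (epochSeconds == null || epochSeconds.trim().isEmpty()) {
            return DEFAULT_MILLIS;
        }
        try {
            return Math.multiplyExact(Long.parseLong(epochSeconds.trim()), 1000L);
        } catch (NumberFormatException | ArithmeticException e) {
            return DEFAULT_MILLIS; // malformed or overflowing input: use the fixed scalar
        }
    }

    // Same signature as System.currentTimeMillis(), so a call site can be redirected to it.
    static long currentTimeMillis() {
        return resolve(System.getenv("SOURCE_DATE_EPOCH"));
    }
}
```

Because the returned value depends only on the environment variable or the fixed scalar, two build instances started with the same environment would observe the same timestamp.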
In some examples, building the modified software release includes: loading the modified build instructions into memory; and executing the modified build instructions to generate a compressed archive file based on the compiled modified build instructions.
In some examples, the compressed archive file is packaged as one of: a .jar file; an .ear file; and a .war file.
In some examples, editing the modified software release to remove one or more non-deterministic portions using a bytecode editor includes: unpacking the compressed archive file of the modified software release; extracting one or more modified .class files and a metadata; examining the one or more modified .class files and the metadata to locate one or more non-deterministic portions; and editing the one or more modified .class files and the metadata to remove the one or more non-deterministic portions to generate a repackaged archive file containing one or more edited .class files and an edited metadata.
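The unpack and repackage steps can be sketched with the standard `java.util.zip` classes. This is only an assumed illustration: the class name `ArchiveNormalizer`, the fixed entry time and the decision to drop directory entries are choices made for the example, and the editing of the extracted .class files and metadata is left out. The sketch writes entries in sorted name order with a constant timestamp, which addresses file ordering and time attributes as sources of non-equivalence.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; names and the fixed time are assumptions for illustration.
final class ArchiveNormalizer {
    static final long FIXED_TIME_MILLIS = 946_684_800_000L; // 2000-01-01T00:00:00Z

    // Reads every file entry of a .jar/.war/.ear (all ZIP-based) into a name-sorted map.
    static Map<String, byte[]> unpack(byte[] archive) {
        Map<String, byte[]> entries = new TreeMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    entries.put(entry.getName(), in.readAllBytes());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    // Writes entries in sorted order with a constant modification time.
    static byte[] repack(Map<String, byte[]> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                entry.setTime(FIXED_TIME_MILLIS);
                out.putNextEntry(entry);
                out.write(e.getValue());
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In this sketch, calling `repack(unpack(bytes))` on archives that differ only in entry order or entry timestamps is intended to yield identical bytes when run in the same environment.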
In some examples, editing the one or more modified .class files includes: deduplicating a constant table to make all constant entries unique; sorting the constant table; sorting a method table; and sorting a local variable table.
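The deduplicate-and-sort idea can be shown on a deliberately simplified model in which a constant table is just a list of strings. This is an assumption for illustration only: a real .class constant pool holds typed entries, is indexed from 1, and every reference into it must be rewritten after reordering, which a bytecode editor would have to handle. The names `TableCanonicalizer`, `canonicalize` and `remap` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Simplified model of a constant table; not a class file parser.
final class TableCanonicalizer {
    // Removes duplicate entries and returns the remaining entries in sorted order.
    static List<String> canonicalize(List<String> constants) {
        return new ArrayList<>(new TreeSet<>(constants));
    }

    // For each original index, gives the index of the same value in the canonical table,
    // so that references into the old table can be redirected.
    static int[] remap(List<String> original, List<String> canonical) {
        int[] mapping = new int[original.size()];
        for (int i = 0; i < original.size(); i++) {
            mapping[i] = Collections.binarySearch(canonical, original.get(i));
        }
        return mapping;
    }
}
```

Two tables that contain the same values in a different order, or with repeated entries, canonicalize to the same list, which is the property the sorting and deduplication steps rely on.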
In some examples, generating validation information for verifying further software releases includes computing a checksum value for the edited software release.
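A checksum of this kind could be computed as in the sketch below. The disclosure does not name a digest algorithm, so the use of SHA-256 here is an assumption, as are the names `ReleaseChecksum`, `sha256Hex` and `matches`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for computing and comparing release checksums.
final class ReleaseChecksum {
    // Returns the SHA-256 digest of the release bytes as lowercase hexadecimal.
    static String sha256Hex(byte[] release) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(release);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Compares a release against a checksum published by a trusted source.
    static boolean matches(byte[] release, String trustedHex) {
        return sha256Hex(release).equalsIgnoreCase(trustedHex.trim());
    }
}
```

A mismatch only indicates that the two artifacts differ; deciding whether the difference is legitimate is the role of the accompanying documentation.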
In some examples the method further comprises uploading to a repository a verifiable software release distribution package, the verifiable software release distribution package containing: a source code of the edited software release; the validation information; and documentation.
In some aspects, the present disclosure describes a system for generating a verifiable software release. The system comprises a processor device and a memory storing machine-executable instructions which, when executed by the processor device, cause the system to perform a number of steps. A build intervention is performed during build runtime of a build tool configured to build a software release in a Java runtime environment by: instrumenting the build tool to enable interception of build instructions of the build tool; intercepting one or more build instructions causing non-determinism in the software release using a class transformer; modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and storing the one or more modified build instructions, such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
In some aspects, the present disclosure describes a non-transitory processor-readable medium having machine-executable instructions stored thereon which, when executed by a processor of a device, cause the device to perform a number of steps. A build intervention is performed during build runtime of a build tool configured to build a software release in a Java runtime environment. The build intervention comprises: instrumenting the build tool to enable interception of build instructions of the build tool; intercepting one or more build instructions causing non-determinism in the software release using a class transformer; modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and storing the one or more modified build instructions, such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
In some aspects, the present disclosure describes a processor-readable medium having machine-executable instructions stored thereon which, when executed by a processor of a device, cause the device to perform the steps of the above-mentioned methods.
In some aspects, the present disclosure describes a computer program comprising instructions which, when the program is executed by a computer, cause the computer to perform the steps of the above-mentioned methods.
BRIEF DESCRIPTION OF THE DRAWINGS
Reference will now be made, by way of example, to the accompanying drawings which show example embodiments of the present application, and in which:
FIG. 1 is a block diagram of an example computing system which may be used to implement examples of the present disclosure;
FIG. 2 is a block diagram illustrating an example Java Virtual Machine architecture, in accordance with examples of the present disclosure;
FIG. 3 is a flowchart illustrating an example method for building a verifiable software release in a Java Runtime Environment (JRE) , in accordance with examples of the present disclosure;
FIG. 4 is a flowchart illustrating an example method for intercepting and modifying one or more build instructions using a class transformer, in accordance with examples of the present disclosure; and
FIG. 5 is a flowchart illustrating an example method for editing a modified software release to generate an edited software release, in accordance with examples of the present disclosure.
Similar reference numerals may have been used in different figures to denote similar components.
DETAILED DESCRIPTION
In various examples, the present disclosure describes methods and systems that help to address the problem of Java software release verifiability, by automatically removing sources of non-equivalence during software builds. More specifically, a class transformer is used to intercept non-deterministic method invocations during Java software builds such that sources of non-determinism are removed by modifying the bytecode of .class files of the build tool. A bytecode editor is then used to remove further sources of non-determinism in the generated build artifact. Validation information is obtained for the edited software release and accompanying documentation verifies legitimate remaining non-equivalences in the verifiable software release.
To assist in understanding the present disclosure, some terminology is first introduced. The Java platform is a collection of programs to enable the development and execution of Java programs. Included in the Java platform are an execution engine (e.g. the Java Virtual Machine or JVM), a compiler and libraries. The compiler converts Java source code into bytecode, an intermediate representation of the Java program that can be understood by the JVM and which is stored as a .class file. The JVM then executes the bytecode. The Java Runtime Environment (JRE) is another component of the Java platform that introduces a just-in-time compiler to convert bytecode during runtime. Using the JRE, .class files are dynamically loaded into memory by the Java Class Loader.
The Java Instrumentation API is a tool that enables Java Agents to instrument and modify the bytecode of existing Java classes. A Java Agent is a .class file that leverages the Java Instrumentation API and contains instructions to intercept applications running on the JVM and make changes to the bytecode of these applications. A Java Agent can be initiated at application startup, or dynamically during application runtime.
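The two ways of initiating a Java Agent correspond to the `premain` and `agentmain` entry points of the `java.lang.instrument` package. The sketch below is illustrative only: the class name `BuildAgent`, the helper `parseClassNames` and the idea of passing watched class names through the agent argument string are assumptions, and the registered transformer merely logs and leaves the bytecode unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.HashSet;
import java.util.Set;

// Hypothetical agent class; a packaged agent would normally declare it public.
final class BuildAgent {
    // Invoked by the JVM before the application's main method (startup attachment).
    public static void premain(String agentArgs, Instrumentation inst) {
        install(agentArgs, inst);
    }

    // Invoked by the JVM when the agent is attached to a running application.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        install(agentArgs, inst);
    }

    // Parses "java.lang.System, org.example.Build" into internal names such as java/lang/System.
    static Set<String> parseClassNames(String agentArgs) {
        Set<String> names = new HashSet<>();
        if (agentArgs == null) {
            return names;
        }
        for (String part : agentArgs.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed.replace('.', '/'));
            }
        }
        return names;
    }

    private static void install(String agentArgs, Instrumentation inst) {
        Set<String> watched = parseClassNames(agentArgs);
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain, byte[] classfileBuffer) {
                if (className != null && watched.contains(className)) {
                    System.err.println("[agent] intercepted " + className);
                }
                return null; // null tells the JVM to keep the original bytecode
            }
        });
    }
}
```

An agent of this shape is typically packaged in a JAR whose manifest names the class in the `Premain-Class` attribute (and `Agent-Class` for dynamic attachment) and is enabled with the `-javaagent` JVM option.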
In the present disclosure, a “software build instance” is defined as a discrete initiation of a software build process resulting in the generation of a software release. In the present disclosure, a “software release” is defined as a compiled output of a software build process, or a build artifact. Build artifacts are the files produced by a build process, and may also be referred to as software binaries.
In the present disclosure, a “verifiable software release” is defined as a software release in which sources of non-determinism resulting from the build process or build infrastructure are removed, and any remaining non-equivalences can be explained and documented as being legitimate or necessary (e.g. for security purposes). In computer programming, a nondeterministic build process is a build process that produces different build artifacts from separate build instances despite using the same source code, build scripts and build environment. One reason that nondeterministic algorithms exhibit different behaviors is their probabilistic nature, in that they employ elements of randomness in their logic. Sources of non-determinism in build artifacts may include timestamps, build path and file ordering, among others.
Software build verifiability maximizes the equivalence between two build artifacts generated independently by two different build instances, where any remaining non-equivalences are identified and interpreted to be legitimate and documented accordingly. The process of building verifiable software packages includes two aspects. The first process, known as the deterministic build process, aims to eliminate sources of non-determinism in the build process or introduced by build infrastructures that may contribute to non-equivalences in build artifacts. For example, insertion of a timestamp during the build process will cause non-equivalences, since timestamps inserted during two separate build instances will be different. The second process, known as the explainable build process, aims to identify and interpret non-equivalences in the build artifacts that are legitimate and should not be removed.
To verify builds, the contents of a distributed software package can be compared to the contents of a trusted software package. Similarly, multiple independent developers may use a trusted build infrastructure and share the results of the build process for other developers to compare against. In the situation where malicious code has been silently inserted into the build process, the contents of the final distribution will be different from those built with a trusted infrastructure.
In the present disclosure, a “verifiable software release distribution package” includes a source code of the verifiable software release, validation information that may be used to verify the contents of the verifiable software release and any associated documentation describing remaining sources of non-equivalence or that may be required to install or use the software release.
To assist in understanding the present disclosure, some existing technologies are first discussed.
For example, for the Java programming language, current tools available to account for non-equivalences in build artifacts include plug-ins directed to automated build tools. A disadvantage of this approach is that these plug-ins require that the build infrastructure (e.g., compilers, class loaders, and other related tools) be configured and upgraded to specific versions that support the plug-ins, rendering them unsuitable to legacy code. Another disadvantage is that these plug-ins only address limited sources of non-determinism associated with specific automated build tools, making them too specific to be applied in a generalist and heterogeneous setting typically found in large software systems.
Another existing approach to producing verifiable software builds includes automated tools directed at removing sources of non-determinism during the build process; however, they are directed at C and C++ programming languages and require the interception of build instructions at the operating system level. Java-based applications use automated build tools that run over the JVM and as such, many sources of non-determinism are introduced through the interaction between those build tools and the JVM. Addressing non-determinism at the operating system level is not appropriate for Java software; instead, build instructions would need to be intercepted at the JVM level. Existing software tools for producing verifiable software builds are not able to directly interact with the JVM.
The present disclosure describes examples that may help to address some or all of the above drawbacks of existing technologies.
FIG. 1 is a block diagram illustrating a simplified example implementation of a computing system 100 that is suitable for implementing embodiments described herein. Examples of the present disclosure may be implemented in other computing systems, which may include components different from those discussed below. The computing system 100 may be used to execute instructions for generating a verifiable software build, using any of the examples described above. The computing system 100 may also be used to execute the verifiable software release, or the verifiable software release may be executed by another computing system.
Although FIG. 1 shows a single instance of each component, there may be multiple instances of each component in the computing system 100. Further, although the computing system 100 is illustrated as a single block, the computing system 100 may be a single physical machine or device (e.g., implemented as a single computing device, such as a single workstation, single consumer device, single server, etc. ) , or may comprise a plurality of physical machines or devices (e.g., implemented as a server cluster) . For example, the computing system 100 may represent a group of servers or cloud computing platform providing a virtualized pool of computing resources (e.g., a virtual machine, a virtual server) .
The computing system 100 includes at least one processor 102, such as a central processing unit, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC) , a field-programmable gate array (FPGA) , a dedicated logic circuitry, a dedicated artificial intelligence processor unit, a graphics processing unit (GPU) , a tensor processing unit (TPU) , a neural processing unit (NPU) , a hardware accelerator, or combinations thereof.
The computing system 100 may include an optional input/output (I/O) interface 104, which may enable interfacing with an optional input device 106 and/or optional output device 108. In the example shown, the optional input device 106 (e.g., a keyboard, a mouse, a microphone, a touchscreen, and/or a keypad) and optional output device 108 (e.g., a display, a speaker and/or a printer) are shown as optional and external to the computing system 100. In other example embodiments, there may not be any input device 106 and output device 108, in which case the I/O interface 104 may not be needed.
The computing system 100 may include an optional network interface 110 for wired or wireless communication with other computing systems (e.g., other computing systems in a network) . The network interface 110 may include wired links (e.g., Ethernet cable) and/or wireless links (e.g., one or more antennas) for intra-network and/or inter-network communications.
The computing system 100 may include one or more memories 112 (collectively referred to as “memory 112” ) , which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM) , and/or a read-only memory (ROM) ) . The non-transitory memory 112 may store instructions for execution by the processor 102, such as to carry out examples described in the present disclosure. For example, the memory 112 may store instructions for implementing any of the networks and methods disclosed herein. The memory 112 may include other software instructions, such as for implementing an operating system (OS) 114 and other applications/functions, such as a Virtual Machine 116, including a Java Virtual Machine.
The memory 112 may also store other data, information, rules, policies, and machine-executable instructions described herein.
In some examples, the computing system 100 may also include one or more electronic storage units (not shown), such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. In some examples, data and/or instructions may be provided by an external memory (e.g., an external drive in wired or wireless communication with the computing system 100) or may be provided by a transitory or non-transitory computer-readable medium. Examples of non-transitory computer readable media include a RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage. The storage units and/or external memory may be used in conjunction with memory 112 to implement data storage, retrieval, and caching functions of the computing system 100. The components of the computing system 100 may communicate with each other via a bus, for example.
FIG. 2 is a block diagram illustrating an example system with a Java Agent deployed on a Java Virtual Machine (JVM) architecture 200 that may be used to implement methods to modify Java software releases to improve verifiability, in accordance with examples of the present disclosure.
In some examples, a Java Instrumentation Application Programming Interface (API) 202 is used to instrument a build tool 220 to enable interception of build instructions 224 of the build tool 220. A build tool 220 may be a Java Application, including automated build tools such as Apache Maven or Gradle. The build tool may include one or more .class files containing Java bytecode, which is parsed by a class loader 210 running in the JVM 200 to generate build instructions 224. The build tool 220 is invoked to build Java Application source code 226 into a software release, using some combination of build instructions 224 such as packing instructions and compilation instructions. In a conventional build environment, the Java Application Source Code 226 is compiled, packed, and otherwise built into a conventional software release including one or more .class files containing Java bytecode. However, a conventional software release built in such a conventional environment may contain indeterminacies, as described above.
In some examples, when a software build instance using a build tool 220 is initiated in a Java runtime environment (JRE) , an instance of JVM 200 is initiated. The. class files of the build tool 220 are loaded dynamically into the class loader 210 of the JVM 200 to generate the build instructions 224. The build tool 220 is instrumented to enable interception by a class transformer 204, which is loaded into the JVM 200 and intercepts one or more build instructions 224 as each . class file is dynamically loaded and parsed during runtime by the class loader 210. The class transformer 204 may then modify the bytecode of one or more of the build instructions 224 to generate modified build instructions 212. The class transformer  contains information about which . class files of the build tool 220 to intercept, as well as how to modify the bytecode of the loaded . class files. In the present disclosure, a non-deterministic method profile 208 may be assembled in advance of initiating the build tool 220 to inform the class transformer 204 which . class files to intercept.
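As an illustration of the interception described above, the following is a minimal sketch of a class transformer built on the standard `java.lang.instrument.ClassFileTransformer` interface. The class name `SelectiveTransformer`, the set of target class names and the pluggable `rewriter` function are assumptions introduced for illustration; the actual bytecode rewrite is left as a placeholder and is not taken from the disclosure.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Sketch of a transformer that only touches classes named in a target set. */
final class SelectiveTransformer implements ClassFileTransformer {
    // Internal (slash-separated) class names to intercept, e.g. "org/example/Packer" (hypothetical).
    private final Set<String> targetClasses;
    // Placeholder for the real bytecode rewrite; a bytecode library would normally be used here.
    private final UnaryOperator<byte[]> rewriter;

    SelectiveTransformer(Set<String> targetClasses, UnaryOperator<byte[]> rewriter) {
        this.targetClasses = targetClasses;
        this.rewriter = rewriter;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className may be null for some dynamically defined classes.
        if (className == null || !targetClasses.contains(className)) {
            return null; // null tells the JVM to keep the original bytes
        }
        // Hand the rewriter a copy so the JVM's input buffer is never modified in place.
        return rewriter.apply(classfileBuffer.clone());
    }
}
```

Returning `null` for classes outside the target set leaves every other .class file of the build tool untouched, so only the profiled classes pay the cost of rewriting.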
Some or all of the (unmodified) build instructions 224 may be written to memory 214 prior to execution. Modified build instructions 212 in the form of modified bytecode may also be loaded into memory 214 to replace or overwrite the corresponding unmodified build instructions 224. The build instructions 224 and modified build instructions 212 stored in the memory 214 are then executed by the execution engine 216 to generate a modified software release 230, which may contain one or more modified . class files. The software build instance may output the modified software release 230 in the form of a compressed archive file. The compressed archive file may be packaged as one of a . jar file, an . ear file or a . war file.
FIG. 3 is a flowchart illustrating an example method 300 for building a verifiable software release in a JRE, in accordance with examples of the present disclosure. The method 300 may be performed by the computing system 100. For example, the processor 102 may execute computer readable instructions (which may be stored in the memory 112) to cause the computing system 100 to perform the method 300. The method 300 may be performed using a single physical machine (e.g., a workstation or server), a plurality of physical machines working together (e.g., a server cluster), or cloud-based resources (e.g., using virtual resources on a cloud computing platform).
Method 300 begins with step 302 in which a build tool 220 configured to build a software release is instrumented to enable the interception of build instructions 224 of the build tool 220. The build tool 220 is instrumented using the Java Instrumentation API 202 to engage a class transformer 204 upon initiation of a software build instance. The class transformer is a special type of .class file that is able to intercept and modify applications running on a JVM.
At step 304, one or more non-deterministic build instructions of the build tool 220 are intercepted using the class transformer 204. At step 306, one or more non-deterministic build instructions 224 of the build tool 220 are modified to remove the non-determinism using the class transformer 204. Further details about steps 304 and 306 are provided in the discussion of method 400 depicted in FIG. 4.
FIG. 4 is a flowchart illustrating an example method 400 for intercepting and modifying one or more build instructions using a class transformer, in accordance with examples of the present disclosure. The method 400 may be performed by the computing system 100. For example, the processor 102 may execute computer readable instructions (which may be stored in the memory 112) to cause the computing system 100 to perform the method 400. The method 400 may be performed using a single physical machine (e.g., a workstation or server), a plurality of physical machines working together (e.g., a server cluster), or cloud-based resources (e.g., using virtual resources on a cloud computing platform).
Method 400 begins with step 402, in which a non-deterministic method profile 208 of one or more non-deterministic methods contained within the build tool 220 is obtained and incorporated into the class transformer 204. During software build runtime, various non-deterministic methods may be invoked that introduce non-deterministic values into the software build process, and therefore result in non-equivalences between build artifacts. An example of such a non-deterministic method is the method “java/lang/System#currentTimeMillis()”, which returns the current system timestamp. The insertion of a timestamp during the build process will cause non-equivalences between build artifacts since timestamps inserted during two separate software build instances will be different. The methods included in the non-deterministic method profile 208 may be obtained manually through reviewing software documentation.
Once a non-deterministic method profile 208 is obtained and incorporated into the class transformer 204, at step 404 a software build instance may be initiated on a JVM 200 using an instrumented build tool 220. Initiating a software build instance may initialize the JVM 200. At step 406, initializing the JVM 200 also registers the class transformer 204 by calling the premain method of the Java agent containing the class transformer 204. The class transformer 204 is loaded into the JVM 200 and the JVM 200 proceeds to call the main method of the Java Application (e.g., the build tool 220) to begin executing the software build.
The class loader 210 proceeds to load the .class files of the build tool 220, and as described at step 408, the class transformer 204 can check the .class files of the build tool 220 running on the JVM 200 against the non-deterministic method profile 208 to locate one or more .class files of the build tool 220 containing non-deterministic methods. The class transformer 204 may then directly modify the bytecode of one or more .class files of the build tool 220 by replacing the original bytecode with alternate bytecode that removes the sources of non-determinism. For example, at step 410 the class transformer 204 may replace one or more time-related methods in one or more .class files of the build tool 220 with a customized method that returns pre-defined values for timestamps. The customized methods receive the same arguments as the original methods and are pushed onto the stack at every method call, returning a fixed, predefined value that can be obtained from environment variables or can be directly returned as a scalar defined within the customized method.
In another example, at step 412 the class transformer 204 may modify the bytecode of one or more .class files of the build tool 220 to disable a cache mechanism associated with the build tool 220. For example, the Tomcat Jasper compiler may, by default, enable caching functions, which can introduce a source of non-equivalence in build artifacts. The class transformer 204 may modify the bytecode of the Jasper tool at build runtime to disable the caching mechanism.
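One way such a profile might be consulted is sketched below: the profile is modelled as a set of "owner#name" strings in the notation used above, and a loaded class file is scanned for constant-pool method references that match an entry. The class name `ProfileScanner` and the set-of-strings representation are assumptions made for illustration; the sketch reads only the constant pool and decodes names as plain UTF-8, which is sufficient for ASCII identifiers but is not a full class-file parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Set;

/** Sketch: does a class file reference any method listed in a non-deterministic method profile? */
final class ProfileScanner {
    /** Returns true if the constant pool holds a method reference matching an "owner#name" entry. */
    static boolean referencesProfiledMethod(byte[] classFile, Set<String> profile) {
        ByteBuffer in = ByteBuffer.wrap(classFile);          // big-endian by default, like class files
        if (in.getInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
        in.getInt();                                         // skip minor and major version
        int count = in.getShort() & 0xFFFF;                  // constant_pool_count
        byte[] tag = new byte[count];
        String[] utf8 = new String[count];
        int[] a = new int[count];                            // first u2 of an entry
        int[] b = new int[count];                            // second u2 of an entry
        for (int i = 1; i < count; i++) {
            tag[i] = in.get();
            switch (tag[i]) {
                case 1: {                                    // Utf8: u2 length, then bytes
                    byte[] raw = new byte[in.getShort() & 0xFFFF];
                    in.get(raw);
                    utf8[i] = new String(raw, StandardCharsets.UTF_8);
                    break;
                }
                case 7: case 8: case 16: case 19: case 20:   // entries holding one u2 index
                    a[i] = in.getShort() & 0xFFFF;
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // entries holding two u2 values
                    a[i] = in.getShort() & 0xFFFF;
                    b[i] = in.getShort() & 0xFFFF;
                    break;
                case 3: case 4: in.getInt(); break;          // Integer, Float
                case 5: case 6: in.getLong(); i++; break;    // Long, Double occupy two slots
                case 15: in.get(); in.getShort(); break;     // MethodHandle: u1 kind, u2 index
                default: throw new IllegalArgumentException("unknown constant tag " + tag[i]);
            }
        }
        for (int i = 1; i < count; i++) {
            if (tag[i] == 10 || tag[i] == 11) {              // Methodref, InterfaceMethodref
                String owner = utf8[a[a[i]]];                // ref -> Class -> Utf8 class name
                String name = utf8[a[b[i]]];                 // ref -> NameAndType -> Utf8 method name
                if (profile.contains(owner + "#" + name)) return true;
            }
        }
        return false;
    }
}
```

For example, a profile containing `java/lang/System#currentTimeMillis` would flag any class whose constant pool references that method, without inspecting the method bodies themselves.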
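The registration at step 406 can be sketched with the standard agent entry point. The `premain(String, Instrumentation)` signature and `Instrumentation.addTransformer` are part of the Java Instrumentation API; the class names `BuildAgent` and `PassThroughTransformer` are hypothetical, and the transformer shown changes nothing so that the sketch stays focused on registration.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

/** Sketch of a Java agent entry point; names other than premain are illustrative. */
final class BuildAgent {
    /** Called by the JVM before the application's main method when an agent is attached at startup. */
    public static void premain(String agentArgs, Instrumentation inst) {
        // From here on, the transformer sees each class as the class loader defines it.
        inst.addTransformer(new PassThroughTransformer());
    }

    /** Placeholder transformer: observes every loaded class and changes none of them. */
    static final class PassThroughTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            return null; // null means "no change"
        }
    }
}
```

In a packaged agent, the class would typically be declared public, named in the `Premain-Class` attribute of the agent JAR manifest, and attached with the JVM's `-javaagent` option.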
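The customized time method of step 410 might look like the following sketch, which takes no arguments (like `System.currentTimeMillis()`) and returns either a value read from an environment variable or a fixed scalar. The class name `FixedClock`, the use of the `SOURCE_DATE_EPOCH` variable (seconds since the epoch) and the fallback value of 0 are assumptions chosen for illustration; the disclosure does not name a specific variable or value.

```java
import java.util.Map;

/** Sketch of a replacement for time-related calls that always returns a predefined value. */
final class FixedClock {
    /** Environment variable consulted for the timestamp, in seconds since the epoch (an assumption). */
    static final String ENV_NAME = "SOURCE_DATE_EPOCH";
    /** Fallback scalar used when the variable is unset or malformed (an assumption). */
    static final long DEFAULT_MILLIS = 0L;

    /** Same signature as System.currentTimeMillis(), so a call site can be redirected to it. */
    static long currentTimeMillis() {
        return resolveMillis(System.getenv());
    }

    /** Separated from the environment lookup so the logic can be exercised directly. */
    static long resolveMillis(Map<String, String> env) {
        String raw = env.get(ENV_NAME);
        if (raw == null) {
            return DEFAULT_MILLIS;
        }
        try {
            // Convert seconds to milliseconds, guarding against overflow.
            return Math.multiplyExact(Long.parseLong(raw.trim()), 1000L);
        } catch (NumberFormatException | ArithmeticException e) {
            return DEFAULT_MILLIS;
        }
    }
}
```

Because every build instance resolves the same value, any timestamp the build tool writes through this method is identical across builds.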
Returning to FIG. 3, the modified bytecode from steps 410 and 412 is assembled at step 308 to generate the modified build instructions 212. The modified build instructions 212 may then be stored in the class transformer 204, where they are easily accessible and may be appended to in future build instances if new sources of non-determinism are identified in the build tool 220.
At step 310 the Java software build instance may proceed to build a modified software release 230 without non-determinism caused by one or more build instructions 224, using the one or more modified build instructions 212. Modified build instructions 212 in the form of modified bytecode of one or more .class files of the build tool 220, as well as unmodified build instructions 224, may be loaded into memory 214 by the class loader 210 and executed by the execution engine 216. The software build instance may output a modified software release 230 in the form of a compressed archive file. The compressed archive file may be packaged as one of a .jar file, an .ear file or a .war file.
Additional sources of non-determinism may be introduced into the software release when the bytecode is compiled. Therefore, in step 312 a bytecode editor may be used to further edit the modified software release 230 by removing one or more non-deterministic portions, to generate an edited software release. Further details about step 312 are provided in the discussion of method 500 depicted in FIG. 5.
FIG. 5 is a flowchart illustrating an example method 500 for editing a modified software release to generate an edited software release, in accordance with examples of the present disclosure. The method 500 may be performed by the computing system 100. For example, the processor 102 may execute computer readable instructions (which may be stored in the memory 112) to cause the computing system 100 to perform the method 500. The method 500 may be performed using a single physical machine (e.g., a workstation or server), a plurality of physical machines working together (e.g., a server cluster), or cloud-based resources (e.g., using virtual resources on a cloud computing platform).
Method 500 begins with step 502, in which the modified software release 230 is inspected for non-equivalences. For example, a checksum of the modified software release may be computed and compared to the checksum of a trusted software release. During the execution of the build instructions at step 310, additional sources of non-determinism may be introduced through build path and file ordering, among others.
To address additional sources of non-determinism in the modified software release 230, a bytecode editor is initiated at step 504. The bytecode editor proceeds to unpack the compressed archive files of the modified software release 230 at step 506. Compressed archive files may be packaged as one of a .jar file, an .ear file or a .war file and unpacking these compressed archive files may generate an output directory containing folders and uncompressed data.
At step 508, one or more .class files and metadata, including MANIFEST.MF and other text files of the modified software release, may be extracted. These .class files and metadata may then be examined at step 510 to locate non-deterministic portions. For example, non-deterministic portions may include duplicate values in constant tables, differences in the ordering of values in constant, method and variable tables introduced by the software build instance, and time attributes, among others.
The bytecode editor proceeds to edit one or more extracted .class files of the modified software release 230 to remove non-deterministic portions at step 512. Editing the one or more .class files may include deduplicating a constant table to make all constant entries unique, sorting a constant table, sorting a method table and sorting a local variable table.
At step 514, additional non-deterministic portions are then removed from the extracted metadata, including MANIFEST.MF and other text files. For example, time attributes may be removed from the metadata.
The edited .class files and metadata are then repacked into a compressed archive file in step 516, to generate an edited software release. Compressed archive files may be packaged as one of a .jar file, an .ear file or a .war file.
Returning to FIG. 3, step 314 identifies any remaining non-deterministic portions in the edited software release that were not removed in any of the preceding steps and generates documentation describing them. For example, legitimate sources of non-determinism may need to remain in the edited software release for security requirements, such as random signature keys or necessary encryption that may be present in distributed software packages. The remaining non-deterministic portions are explained in accompanying documentation before being removed for the purpose of obtaining validation information for the edited software release.
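The deduplicating and sorting of step 512 can be illustrated on a deliberately simplified model, in which a constant table is a list of strings and the rest of the class refers to entries by index. The class name `ConstantTableCanonicalizer` and this model are assumptions made for illustration; a real class-file constant pool has typed entries, a reserved index 0 and two-slot long and double entries, all of which a real bytecode editor would have to respect.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Simplified model of a constant table: entries are strings, references are indices into it. */
final class ConstantTableCanonicalizer {
    final List<String> table;   // deduplicated entries in sorted order
    final int[] references;     // the original references, rewritten to the new indices

    ConstantTableCanonicalizer(List<String> originalTable, int[] originalReferences) {
        // A TreeSet removes duplicates and yields entries in natural (lexicographic) order.
        table = new ArrayList<>(new TreeSet<>(originalTable));
        references = new int[originalReferences.length];
        for (int i = 0; i < originalReferences.length; i++) {
            String value = originalTable.get(originalReferences[i]);
            // The new table is sorted and contains every original value, so the search always succeeds.
            references[i] = Collections.binarySearch(table, value);
        }
    }
}
```

Two tables that hold the same constants in a different order, or with different duplicates, canonicalize to the same table and the same rewritten references, which is the property that makes the edited .class files comparable.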
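A minimal sketch of removing time attributes from manifest text follows. The class name `ManifestScrubber` is hypothetical, and which attribute names count as time attributes (for example `Build-Time` in the usage below) is an assumption that depends on the build tool. The sketch treats lines beginning with a space as continuations of the preceding attribute, as in the JAR manifest format, and normalizes line endings to CRLF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch: drop selected attributes from manifest text, including their continuation lines. */
final class ManifestScrubber {
    static String scrub(String manifest, Set<String> attributesToDrop) {
        List<String> kept = new ArrayList<>();
        boolean dropping = false;
        // The -1 limit keeps trailing empty strings, so a final line break survives the round trip.
        for (String line : manifest.split("\r?\n", -1)) {
            if (line.startsWith(" ")) {
                // Continuation line: shares the fate of the attribute it continues.
                if (!dropping) kept.add(line);
                continue;
            }
            int colon = line.indexOf(':');
            String name = colon < 0 ? "" : line.substring(0, colon);
            // Manifest attribute names are case-insensitive.
            dropping = attributesToDrop.stream().anyMatch(name::equalsIgnoreCase);
            if (!dropping) kept.add(line);
        }
        return String.join("\r\n", kept);
    }
}
```

For instance, `ManifestScrubber.scrub(text, Set.of("Build-Time"))` returns the manifest text without the `Build-Time` attribute and leaves all other attributes in their original order.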
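The unpacking of step 506 and the repacking of step 516 can be sketched with the standard `java.util.zip` classes, since .jar, .ear and .war files are ZIP-based. The class name `ArchiveRepacker`, the in-memory map representation and the fixed entry time are assumptions for illustration. Writing entries in sorted name order with a constant timestamp removes file ordering and archive timestamps as sources of difference; `ZipEntry.setTimeLocal` and `InputStream.readAllBytes` require Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch: unpack a ZIP-based archive and repack it in a reproducible form. */
final class ArchiveRepacker {
    /** Arbitrary fixed entry time (an assumption); chosen well inside the ZIP date range. */
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(2000, 1, 1, 0, 0, 0);

    /** Reads every file entry of the archive into a name-sorted map. */
    static TreeMap<String, byte[]> unpack(byte[] archive) {
        TreeMap<String, byte[]> entries = new TreeMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
                // readAllBytes stops at the end of the current entry, not the end of the archive.
                if (!e.isDirectory()) entries.put(e.getName(), zin.readAllBytes());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    /** Writes the entries in sorted name order, each with the same fixed timestamp. */
    static byte[] repack(Map<String, byte[]> entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                entry.setTimeLocal(FIXED_TIME);
                zout.putNextEntry(entry);
                zout.write(e.getValue());
                zout.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }
}
```

A production repacker may need further care, for example placing `META-INF/MANIFEST.MF` first where tools expect it, which plain name sorting does not guarantee.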
At step 316, validation information including a checksum value may be computed for the edited software release for the purposes of verifying the contents of the edited software release. The checksum value may be compared against checksum values that are generated and published by a trusted source, or against checksum values for further software releases to build confidence that the contents of the software release are trusted and free from malicious code. A verifiable software release distribution package may then be uploaded to a software repository, where the software release distribution package contains source code of the edited software release, the validation information and the documentation documenting legitimate non-deterministic portions in the software release.
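A checksum comparison of the kind described at step 316 can be sketched with the standard `java.security.MessageDigest` class. The disclosure does not name a hash algorithm; SHA-256 is an assumption here, as are the class and method names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: compute and compare a checksum for a release archive held in memory. */
final class ReleaseChecksum {
    /** Returns the lowercase hexadecimal SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** True when the release bytes hash to the published checksum. */
    static boolean matches(byte[] release, String publishedHex) {
        return sha256Hex(release).equalsIgnoreCase(publishedHex.trim());
    }
}
```

Two edited software releases built from the same source would be expected to yield the same digest once the non-deterministic portions have been removed, so a mismatch against a published value signals that the contents differ.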
Although the present disclosure describes methods and processes with steps in a certain order, one or more steps of the methods and processes may be omitted or altered as appropriate. One or more steps may take place in an order other than that in which they are described, as appropriate.
Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product. A suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example. The software product includes instructions tangibly stored thereon that enable a computing system to execute examples of the methods disclosed herein. The machine-executable instructions may be in the form of code sequences, configuration information, or other data, which, when executed, cause a machine (e.g., a processor or other processing unit) to perform steps in a method according to examples of the present disclosure.
The present disclosure may be embodied in other specific forms without departing from the subject matter of the claims. The described example embodiments are to be considered in all respects as being only illustrative and not restrictive. Selected features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, features suitable for such combinations being understood within the scope of this disclosure.
All values and sub-ranges within disclosed ranges are also disclosed. Also, although the systems, devices and processes disclosed and shown herein may comprise a specific number of elements/components, the systems, devices and assemblies could be modified to include additional or fewer of such elements/components. For example, although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components. The subject matter described herein intends to cover and embrace all suitable changes in technology.

Claims (28)

  1. A method for generating a verifiable software release comprising:
    performing a build intervention during build runtime of a build tool configured to build a software release in a Java runtime environment, the build intervention comprising:
    instrumenting the build tool to enable interception of build instructions of the build tool;
    intercepting one or more build instructions causing non-determinism in the software release using a class transformer;
    modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and
    storing the one or more modified build instructions,
    such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
  2. The method of claim 1, further comprising:
    editing the modified software release to remove one or more non-deterministic portions using a bytecode editor, thereby generating an edited software release.
  3. The method of claim 2, further comprising:
    generating documentation documenting one or more remaining non-deterministic portions in the edited software release.
  4. The method of claim 3, further comprising:
    generating validation information for verifying further software releases generated.
  5. The method of any one of claims 1 to 4 wherein intercepting the one or more build instructions includes:
    obtaining a non-deterministic method profile of one or more non-deterministic methods contained in the build tool;
    loading the class transformer into a Java Virtual Machine; and
    locating one or more .class files of the build tool running on the Java Virtual Machine containing non-deterministic methods, using the non-deterministic method profile.
  6. The method of any one of claims 1 to 5 wherein the class transformer uses the Java Instrumentation Application Programming Interface (API) to intercept and modify the one or more build instructions.
  7. The method of any one of claims 1 to 6 wherein modifying the one or more build instructions includes:
    modifying the bytecode of one or more .class files of the build tool to replace one or more time-related methods with a customized method that returns pre-defined values for timestamps; and
    modifying the bytecode of one or more .class files of the build tool to disable a cache mechanism of a compiler.
  8. The method of claim 7 wherein building the modified software release includes:
    loading the modified build instructions into memory; and
    executing the modified build instructions to generate a compressed archive file based on the compiled modified build instructions.
  9. The method of claim 8 wherein the compressed archive file is packaged as one of:
    a .jar file;
    an .ear file; and
    a .war file.
  10. The method of claim 8 or 9 wherein editing the modified software release to remove one or more non-deterministic portions using a bytecode editor includes:
    unpacking the compressed archive file of the modified software release;
    extracting one or more modified .class files and a metadata;
    examining the one or more modified .class files and the metadata to locate one or more non-deterministic portions; and
    editing the one or more modified .class files and the metadata to remove the one or more non-deterministic portions to generate a repackaged archive file containing one or more edited .class files and an edited metadata.
  11. The method of claim 10, wherein editing the one or more modified .class files includes:
    deduplicating a constant table to make all constant entries unique;
    sorting the constant table;
    sorting a method table; and
    sorting a local variable table.
  12. The method of claim 4 wherein generating validation information for verifying further software releases includes:
    computing a checksum value for the edited software release.
  13. The method of claim 4 or claim 12 further comprising uploading to a repository a verifiable software release distribution package, the verifiable software release distribution package containing:
    a source code of the edited software release;
    the validation information; and
    the documentation.
  14. A system for generating a verifiable software release, the system comprising:
    a processor device; and
    a memory storing machine-executable instructions which, when executed by the processor device, cause the system to:
    perform a build intervention during build runtime of a build tool configured to build a software release in a Java runtime environment by:
    instrumenting the build tool to enable interception of build instructions of the build tool;
    intercepting one or more build instructions causing non-determinism in the software release using a class transformer;
    modifying the one or more build instructions to remove the non-determinism using the class transformer, thereby generating one or more modified build instructions; and
    storing the one or more modified build instructions,
    such that the build tool uses the one or more modified build instructions to build a modified software release without the non-determinism caused by the one or more build instructions.
  15. The system of claim 14, wherein the machine-executable instructions, when executed by the processor device, further cause the system to:
    edit the modified software release to remove one or more non-deterministic portions using a bytecode editor, thereby generating an edited software release.
  16. The system of claim 15, wherein the machine-executable instructions, when executed by the processor device, further cause the system to:
    generate documentation documenting one or more remaining non-deterministic portions in the edited software release.
  17. The system of claim 16, wherein the machine-executable instructions, when executed by the processor device, further cause the system to:
    generate validation information for verifying further software releases generated.
  18. The system of any one of claims 14 to 17 wherein the machine-executable instructions, when executed by the processor device to intercept the one or more build instructions, further cause the system to:
    obtain a non-deterministic method profile of one or more non-deterministic methods contained in the build tool;
    load the class transformer into a Java Virtual Machine; and
    locate one or more .class files of the build tool running on the Java Virtual Machine containing non-deterministic methods, using the non-deterministic method profile.
  19. The system of any one of claims 14 to 18, wherein:
    the class transformer uses the Java Instrumentation Application Programming Interface (API) to intercept and modify the one or more build instructions.
  20. The system of any one of claims 14 to 19 wherein the machine-executable instructions, when executed by the processor device to modify the one or more build instructions, further cause the system to:
    modify the bytecode of one or more .class files of the build tool to replace one or more time-related methods with a customized method that returns pre-defined values for timestamps; and
    modify the bytecode of one or more .class files of the build tool to disable a cache mechanism of a compiler.
  21. The system of claim 20 wherein the machine-executable instructions, when executed by the processor device to build the modified software release, further cause the system to:
    load the modified build instructions into memory; and
    execute the modified build instructions to generate a compressed archive file based on the compiled modified build instructions.
  22. The system of claim 21, wherein the machine-executable instructions, when executed by the processor device, further cause the system to:
    package the compressed archive file as one of:
    a .jar file;
    an .ear file; and
    a .war file.
  23. The system of claim 21 or 22, wherein the machine-executable instructions, when executed by the processor device to edit the modified software release to remove one or more non-deterministic portions using a bytecode editor, further cause the system to:
    unpack the compressed archive file of the modified software release;
    extract one or more modified .class files and a metadata;
    examine the one or more modified .class files and the metadata to locate one or more non-deterministic portions; and
    edit the one or more modified .class files and the metadata to remove the one or more non-deterministic portions to generate a repackaged archive file containing one or more edited .class files and an edited metadata.
  24. The system of claim 23, wherein the machine-executable instructions, when executed by the processor device to edit the one or more modified .class files, further cause the system to:
    deduplicate a constant table to make all constant entries unique;
    sort the constant table;
    sort a method table; and
    sort a local variable table.
  25. The system of claim 17, wherein the machine-executable instructions, when executed by the processor device to generate validation information for verifying further software releases, further cause the system to:
    compute a checksum value for the edited software release.
  26. The system of claim 17 or 25, wherein the machine-executable instructions, when executed by the processor device, further cause the system to:
    upload to a repository a verifiable software release distribution package, the verifiable software release distribution package containing:
    a source code of the edited software release;
    the validation information; and
    the documentation.
  27. A processor-readable medium having machine-executable instructions stored thereon which, when executed by a processor of a device, cause the device to perform the steps of the method of any one of claims 1 to 13.
  28. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to perform the steps of the method of any one of claims 1 to 13.
PCT/CN2021/123961 2021-10-15 2021-10-15 Methods and systems for generating verifiable software releases WO2023060525A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/123961 WO2023060525A1 (en) 2021-10-15 2021-10-15 Methods and systems for generating verifiable software releases

Publications (1)

Publication Number Publication Date
WO2023060525A1 (en) 2023-04-20

Family

ID=85987197

Country Status (1)

Country Link
WO (1) WO2023060525A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130111449A1 (en) * 2011-10-26 2013-05-02 International Business Machines Corporation Static analysis with input reduction
CN105094939A (en) * 2015-07-16 2015-11-25 南京富士通南大软件技术有限公司 Method for realizing static analysis of software source files based on Makefile automatic compilation technology
US20160170726A1 (en) * 2014-12-11 2016-06-16 Samsung Electronics Co., Ltd. Compiler
CN112422581A (en) * 2020-11-30 2021-02-26 杭州安恒信息技术股份有限公司 Webshell webpage detection method, device and equipment in JVM (Java virtual machine)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21960261

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE