WO2023154092A1 - Dynamically overriding a function based on a capability set during load time - Google Patents
- Publication number
- WO2023154092A1 (PCT/US2022/050013)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- function
- implementation
- memory address
- capability
- executable image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44536—Selecting among different versions
- G06F9/44542—Retargetable
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/76—Adapting program code to run in a different environment; Porting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/36—Software reuse
Definitions
- even for a given processor instruction set architecture, such as the X64 ISA (sometimes referred to as “AMD64”, “x86-64”, “AMD x86-64”, or “Intel64”) or the ARM64 ISA (sometimes referred to as “AArch64”), two different processors may support different architectural extensions (e.g., additional instructions).
- the X64 ISA includes extensions such as Enhanced REP MOVSB and STOSB operation (ERMSB); Advanced Vector Extensions (AVX) such as AVX2, AVX-512, AVX-VNNI, etc.; Streaming SIMD Extensions (SSE) such as SSE2, SSE3, SSE4, etc.; Supplemental Streaming SIMD Extensions (SSSE) such as SSSE3; etc.
- the ARM64 ISA includes extensions such as interlocked intrinsics.
- different computer systems may support different OS and security features depending on OS version, software license, available hardware capability, etc. Additionally, different computer systems may include additional hardware functionality, such as hardware accelerators (e.g., GPUs, custom application-specific integrated circuits).
- it is beneficial for a compiler toolchain to produce executable code that has been optimized for a given set of one or more capabilities; for example, a commonly used function like memcpy() can be implemented more efficiently based on whether or not optional extensions, such as AVX-512, ERMSB, etc., are available.
- executable code that relies on these optional extensions would be nonfunctional on computer systems that lack those extensions.
- One mechanism is for a function, itself, to perform checks at runtime for available capabilities of the computer system on which the code is executing (e.g., based on one or more global variables), and to directly branch to an appropriate implementation of that function based on those checks.
- these runtime checks incur processing overheads, and use of conditional branches in this manner can lead to branch mispredictions and further processing overheads.
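The runtime-check mechanism described above can be sketched as follows. This is our illustrative simulation, not code from the application; all names (the `HAS_ERMSB` flag and the `memcpy_*` functions) are hypothetical stand-ins.

```python
# Hypothetical sketch of the runtime-check mechanism: the function
# consults a global capability flag on every call and branches to an
# implementation. All names here are illustrative.

HAS_ERMSB = False  # would be populated once at startup (e.g., via CPUID)

def memcpy_default(dst, src):
    dst[:len(src)] = src  # capability-agnostic copy

def memcpy_ermsb(dst, src):
    dst[:len(src)] = src  # stands in for an ERMSB-accelerated copy

def memcpy_dispatch(dst, src):
    # This check (and its conditional branch) executes on every call --
    # exactly the overhead that load-time overriding is designed to avoid.
    if HAS_ERMSB:
        memcpy_ermsb(dst, src)
    else:
        memcpy_default(dst, src)

buf = bytearray(4)
memcpy_dispatch(buf, b"abcd")
```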
- Another mechanism is to call the function via an indirect branch instruction, where an argument of the branch instruction is a location (e.g., a register) that specifies where the destination address of the next instruction is located. Then, the destination address is determined at runtime (e.g., via a virtual lookup) based on computer system capability.
- indirect branches incur significant overheads in the lookup, and in additional security checks.
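The indirect-branch mechanism can likewise be sketched as a call through a variable whose target is resolved at runtime. Again, this is our illustration; the function names and capability strings are made up.

```python
# Hypothetical sketch of the indirect-branch mechanism: call sites
# invoke the function through a variable whose target is resolved at
# runtime from the system's capability set.

def f_default(x):
    return ("default", x)

def f_avx2(x):
    return ("avx2", x)

_f_impl = f_default  # the indirectly-branched-to target

def resolve_f(capabilities):
    # Runtime lookup selecting an implementation; every call through
    # _f_impl then pays the indirection (and any security-check) cost.
    global _f_impl
    _f_impl = f_avx2 if "AVX2" in capabilities else f_default

def f(x):
    return _f_impl(x)  # indirect call: target is read from a variable

resolve_f({"SSE2", "AVX2"})
```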
- the embodiments described herein are directed to dynamically overriding a function based on a capability set during loading of an executable image into memory, based on replacing a first destination address of a default function implementation with a second destination address of a function implementation that is determined based on a capability set of the computer system on which the executable image is being loaded.
- a destination address e.g., of a direct call instruction, or of a jump instruction
- a destination address is changed at load time, and prior to code execution, so that a function does not need to perform runtime checks, and so that a lookup for an indirect branch is not needed.
- when compiling the executable image, a compiler toolchain includes a plurality of implementations of a function, including a default implementation that executes on all target computer systems regardless of capability, and one or more optimized implementations that each relies on one or more specific capabilities.
- the compiler toolchain then inserts an address of the default implementation of the function into a binary, and includes function override metadata that enables an operating system (OS) loader to “patch-up” (or fix-up) the binary to include an address of an optimized implementation of the function while loading the executable image into memory.
- OS operating system
- the binary executes the optimized implementation of the function, rather than the default implementation of the function.
- the embodiments herein can dramatically speed up program execution, particularly if the called function is relatively small and called frequently. For example, if a tight loop calls a function that is implemented using the embodiments herein, calls to an optimized implementation of the function can be made with a simple direct call. Without the embodiments herein, each iteration of the loop would incur either the overhead of the function performing a runtime check for computer system capability and a potential branch misprediction, or the overhead of a destination memory address lookup and additional security checks.
- the techniques described herein relate to a method, implemented at a computer system that includes a processor, for dynamically overriding a function based on a capability set, the method including: reading a portion of an executable image file, the portion including a first memory address corresponding to a first callee function implementation, the first memory address having been inserted into the portion by a compiler toolchain; based on extensible metadata included in the executable image file, and based on a capability set that is specific to the computer system, determining a second memory address corresponding to a second callee function implementation; and before execution of the portion, modifying the portion to replace the first memory address with the second memory address.
- the techniques described herein relate to a computer system for dynamically overriding a function based on a capability set, including: a processor; and a computer storage media that stores computer-executable instructions that are executable by the processor to cause the computer system to at least: read a portion of an executable image file, the portion including a first memory address corresponding to a first callee function implementation, the first memory address having been inserted into the portion by a compiler toolchain; based on extensible metadata included in the executable image file, and based on a capability set that is specific to the computer system, determine a second memory address corresponding to a second callee function implementation; and before execution of the portion, modify the portion to replace the first memory address with the second memory address.
- the techniques described herein relate to a computer program product including a computer storage media that stores computer-executable instructions that are executable by a processor to cause a computer system to dynamically override a function based on a capability set, the computer-executable instructions including instructions that are executable by the processor to cause the computer system to at least: read a portion of an executable image file, the portion including a first memory address corresponding to a first callee function implementation, the first memory address having been inserted into the portion by a compiler toolchain; based on extensible metadata included in the executable image file, and based on a capability set that is specific to the computer system, determine a second memory address corresponding to a second callee function implementation; and before execution of the portion, modify the portion to replace the first memory address with the second memory address.
- Figure 1 illustrates an example computer architecture that facilitates dynamically overriding a function based on a capability set
- Figure 2 illustrates an example of an executable image that includes a default implementation, and potentially one or more optimized implementations of at least one function
- Figure 3A illustrates an example of calling a capability-optimized function directly
- Figure 3B illustrates an example of using a thunk function to call a capability-optimized function
- Figure 3C illustrates an example of using multiple thunk functions with multiple sets of call sites
- Figure 4 illustrates a flow chart of an example method for dynamically overriding a function based on a capability set
- Figure 5 illustrates an example of a binary decision diagram
- Figure 1 illustrates an example computer architecture 100 that facilitates dynamically overriding a function based on a capability set.
- computer architecture 100 includes a computer system 101 comprising a processor 102 (or a plurality of processors), a memory 103, and one or more computer storage media (storage media 104), all interconnected by a bus 106.
- Computer system 101 may also include a network interface 105 for interconnecting (via a network 107) to one or more remote computer systems (e.g., computer system 108).
- the storage media 104 is illustrated as storing computer-executable instructions implementing at least an OS loader 111 that is configured to load executable images, such as executable image 117, into memory 103 for execution at the processor 102.
- computer architecture 100 illustrates process memory 109 within memory 103 and, in the description herein, this process memory 109 corresponds to at least executable memory pages (e.g., code pages) loaded from executable image 117.
- the executable image 117 is stored on the storage media 104, but it may originate from computer system 108.
- the executable image 117 is, for example, a Portable Executable (PE) binary used by the WINDOWS OS, a Mach-O binary used by MACOS, an Executable and Linkable Format (ELF) binary used by UNIX and LINUX-based OSs, and the like.
- PE Portable Executable
- ELF Executable and Linkable Format
- the executable image 117 comprises control flow targeting a memory address of a default implementation of a function (e.g., as an argument of a direct call instruction or a jump instruction).
- the executable image 117 also includes one or more optimized implementations of the function.
- the executable image 117 lacks any optimized implementation of the function; in these embodiments, optimized implementation(s) of the function are found in another executable image.
- the executable image 117 also comprises function override metadata that is used by the OS loader 111 to “patch-up” the executable image 117 when the OS loader 111 identifies the presence of one or more particular capabilities of computer system 101.
- OS loader 111 patches-up the executable image 117 to modify the targeted memory address (e.g., by modifying an argument of a direct call instruction or a jump instruction), such that after being patched-up the control flow targets a memory address of an optimized implementation of the function, rather than the default implementation of the function.
- Figure 2 illustrates an example 200 of an executable image 201 that includes a default implementation, and possibly one or more optimized implementations, of at least one function.
- executable image 201 is an example of executable image 117.
- Figure 2 shows that the executable image 201 includes an executable code 202 section which, in turn, includes a caller function 204 and a callee function 206.
- An ellipsis 209 indicates that the executable code 202 can include other executable code (e.g., additional functions) as well.
- the caller function 204 includes a call site 205, which in some embodiments is a direct call to a default implementation 207 of the callee function 206.
- the call site 205 defaults to calling the default implementation 207, absent additional action by the OS loader 111.
- the default implementation 207 is a “generic” capability-agnostic implementation of the callee function 206.
- a capability-agnostic implementation of a function relies only on core features of a target platform that are generally common across instances of that target platform.
- the default implementation 207 is configured to be executable on all target platforms, regardless of their specific optional capabilities. For example, if the executable image 201 targets the X64 ISA and the WINDOWS OS, then the default implementation 207 of callee function 206 is executable on all X64 computer systems running a compatible WINDOWS OS, regardless of particular additional capabilities of those computer systems (e.g., ISA extensions, OS capabilities, additional accelerator hardware, etc.).
- the executable code 202 may also include one or more capability-specific implementations (e.g., capability-specific implementation 208) of the callee function 206.
- each capability-specific implementation utilizes one or more specific optional capabilities (e.g., ISA extension, OS capability, additional accelerator hardware, etc.) that may be available at a given target computer, such as computer system 101.
- the OS loader 111 determines that one of the capability-specific implementations (e.g., capability-specific implementation 208) is compatible with a set of capabilities available at the computer system 101.
- the OS loader 111 modifies the call site 205 to replace the direct call to the default implementation 207 with a direct call to capability-specific implementation 208.
- the executable image 201 includes a function override metadata 203 section.
- the function override metadata 203 comprises one or more of a capability lookup structure 210 (or a plurality of capability lookup structures), an address table 211 (or a plurality of address tables), or a patch-up table 212 (or a plurality of patch-up tables).
- An ellipsis 213 indicates that the executable image 201 can include other sections as well, such as additional sections appropriate to the PE format, the Mach-O format, or the ELF format.
- the capability lookup structure 210 comprises information that is evaluated by the OS loader 111 to identify which function implementation to choose for a given function, given a set of capabilities available at computer system 101. While the particular structure and format of the capability lookup structure 210 can vary, in embodiments the capability lookup structure 210 comprises a binary decision tree, which may take the form of a reduced binary decision diagram (BDD).
- BDD is a data structure that is used to represent a Boolean function as a rooted, directed, acyclic graph.
- a BDD data structure comprises a plurality of decision nodes and two or more terminal (i.e., leaf) nodes.
- each decision node represents a Boolean variable (e.g., the presence or absence of a particular capability), and has two child nodes — one corresponding to an assignment of the value “true” to that variable, and the other corresponding to an assignment of the value “false” to that variable.
- the terminal nodes correspond to true and false evaluations of the represented Boolean function as a whole.
- each BDD represents a Boolean expression specifying a unique set of function override rules based on computer system capabilities, such as unique combinations of processor ISA extensions, OS capabilities, additional hardware capabilities, and the like.
- a BDD's terminal node provides an indication of a function to use based on an evaluation of the function override rules embodied by the BDD.
- terminal nodes may be associated with an offset (e.g., index) into one or more address tables (e.g., address table 211), or be associated with a default function implementation.
- FIG. 5 illustrates an example of a BDD 500 comprising five decision nodes (i.e., nodes 501- 505) and four terminal nodes (i.e., nodes 506-509).
- This BDD 500 embodies the following override rules:
- Evaluating BDD 500 starting at node 501: if capability A (node 501) and capability B (node 502) are present at computer system 101, evaluation of BDD 500 terminates at node 506, which indicates an address table index of one. If capabilities A and B are not both present at computer system 101, evaluation of BDD 500 continues at node 503. Here, if capability C (node 503) and capability D (node 504) are present at computer system 101, evaluation of BDD 500 terminates at node 507, which indicates an address table index of two. If capabilities C and D are also not both present at computer system 101, evaluation of BDD 500 continues at node 505.
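The BDD walk above can be sketched as a small evaluator. The nodes for capabilities A through D follow the description; the assumption that node 505 tests capability E (with terminal nodes 508/509 selecting address-table index three or the default) is ours, inferred from the three-entry address-table example, and is not stated in the text.

```python
# Minimal evaluator for a BDD shaped like BDD 500. Decision nodes are
# (capability, child_if_true, child_if_false); terminals are an
# address-table index, or None for the default implementation.

def eval_bdd(node, caps):
    # Walk decision nodes until a terminal node is reached.
    while isinstance(node, tuple):
        cap, if_true, if_false = node
        node = if_true if cap in caps else if_false
    return node  # address-table index, or None for the default

T506, T507, T508, T509 = 1, 2, 3, None  # terminal nodes
N505 = ("E", T508, T509)   # assumed: node 505 tests capability E
N504 = ("D", T507, N505)
N503 = ("C", N504, N505)   # C-false also falls through to node 505
N502 = ("B", T506, N503)
N501 = ("A", N502, N503)   # root
```

Note the node sharing: both the C-false and D-false edges reuse node 505, which is what makes this a reduced diagram rather than a full tree.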
- each address table 211 corresponds to a different capability-optimized function (i.e., a function such as callee function 206, for which there exists a default implementation and one or more optimized implementations).
- each record in the address table 211 comprises a relative virtual address (RVA) specifying a location within the executable image 201 of a different implementation of a capability-optimized function to which the address table 211 corresponds.
- RVA relative virtual address
- an address table 211 can be used by the OS loader 111 to look up the RVA of an optimized implementation of a given function.
- address table 211 can include a first entry with an RVA of a function implementation optimized for capabilities A and B, a second entry with an RVA of a function implementation optimized for capabilities C and D, and a third entry with an RVA of a function implementation optimized for capability E.
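The address-table lookup that follows a BDD evaluation can be sketched as below; the RVA values are made up purely for illustration.

```python
# Hypothetical address table for one capability-optimized function,
# mapping the index produced by a BDD evaluation to the RVA of an
# implementation within the image. The RVA values are illustrative.

ADDR_TABLE = {
    1: 0x1000,  # implementation optimized for capabilities A and B
    2: 0x2000,  # implementation optimized for capabilities C and D
    3: 0x3000,  # implementation optimized for capability E
}

def choose_rva(index, default_rva):
    # A terminal of None (or an unknown index) keeps the default.
    return default_rva if index is None else ADDR_TABLE.get(index, default_rva)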
- BDDs and address tables can be structured in such a way that a given BDD can be utilized to provide indices into a plurality of different address tables.
- the patch-up table 212 specifies one or more locations (relocation sites) within the executable code 202 at which a memory address for a generic implementation of a function can be substituted with a memory address for an optimized implementation of the function.
- these location(s) are a call site, such as call site 205 but, in general, these location(s) can be any location in which a control flow instruction targets a memory address of a generic implementation of a function that can be substituted with a memory address for an optimized implementation of the function.
- the patch-up table 212 specifies these locations as offsets within the executable image 201, such as by RVA. While a single patch-up table could correspond to a plurality of functions, in embodiments there is a different patch-up table for each capability-optimized function.
- the patch-up table 212 specifies each identified location with a capability-optimized function as a destination. If this capability-optimized function is called frequently, the patch-up table 212 could include a large number of entries.
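The patch-up step itself can be simulated in a few lines. This is a simplified sketch of our own, not the actual PE relocation machinery: each table entry here is taken to be the offset of a direct CALL's rel32 operand within the mapped code, and the loader rewrites that operand so the call lands on the chosen implementation.

```python
import struct

def patch_call_sites(code, patchup_table, new_target_rva):
    # code: bytearray of the mapped code section.
    # patchup_table: offsets of the rel32 field of each direct CALL.
    for field_off in patchup_table:
        # An x86 rel32 displacement is relative to the next instruction,
        # which begins 4 bytes after the displacement field.
        rel32 = new_target_rva - (field_off + 4)
        struct.pack_into("<i", code, field_off, rel32)

code = bytearray(16)
code[0] = 0xE8  # direct CALL opcode; its rel32 field is at offset 1
patch_call_sites(code, [1], new_target_rva=0x80)
```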
- Figure 3A illustrates an example 300a of calling a capability-optimized function directly.
- Example 300a shows that, in source 301a, there is a caller function 302 that includes a call site 303 that directly calls a callee function 304 (e.g., a capability-optimized function).
- caller function 204 corresponds to caller function 302
- call site 205 corresponds to call site 303
- callee function 206 corresponds to callee function 304.
- entries in the patch-up table 212 are reduced by instead calling a “thunk” function which, in turn, calls the capability-optimized function.
- Figure 3B illustrates an example 300b of using a thunk function to call a capability-optimized function.
- Example 300b shows that, in source 301b, the call site 303 in the caller function 302 now directly calls thunk 305 (rather than the callee function 304, as in example 300a).
- the thunk 305 includes a call site 306 that directly calls the callee function 304.
- example 300b shows the thunk 305 as being part of source 301b
- a thunk is dynamically generated by a compiler toolchain.
- caller function 204 now corresponds to thunk 305
- call site 205 now corresponds to call site 306
- callee function 206 still corresponds to callee function 304.
- the patch-up table 212 only needs to include an entry corresponding to call site 306.
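The payoff of the thunk arrangement can be sketched as follows. This is our illustration: the module-level variable stands in for the thunk's single patchable call operand (call site 306), and the function names are hypothetical.

```python
# Sketch of why a thunk shrinks the patch-up table: every caller calls
# the thunk, so the thunk's single internal call site is the only
# relocation the loader ever needs to rewrite.

def callee_default():
    return "default"

def callee_optimized():
    return "optimized"

_thunk_target = callee_default  # stands in for call site 306's operand

def thunk():
    return _thunk_target()  # the one patchable call site

def patch_thunk(impl):
    # One patch retargets all callers at once.
    global _thunk_target
    _thunk_target = impl

callers = [thunk] * 100  # 100 call sites, zero extra patch-up entries
```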
- FIG. 3C illustrates an example 300c of using multiple thunk functions with multiple sets of call sites.
- source 301c includes a first set of caller functions, exemplified by caller function 302a, that have a call site 303a calling a thunk 305a that includes a call site 306a for calling a callee function 304a.
- thunk 305a is a “memcpy” thunk that directly calls a default implementation of memcpy (e.g., memcpy()), or that can be patched-up to call optimized implementations of memcpy (e.g., memcpy_avx(), memcpy_ermsb(), etc.).
- thunk 305b is a “memmove” thunk that directly calls a default implementation of memmove, or that can be patched-up to call optimized implementations of memmove.
- use of thunks, such as thunk 305 in example 300b, also facilitates code sharing between processes, particularly in the context of containerized environments.
- memory image pages are only shared between a container host and a container if those memory image pages have no relocation sites (e.g., call site 205) in them.
- by using thunks, embodiments can decrease the number of memory image pages that have relocation sites (e.g., by ensuring that only thunk memory pages have relocation sites), and therefore increase the number of memory pages that can be shared between host and container.
- in embodiments, thunks are placed together (e.g., on the same memory page) to reduce the number of memory pages that contain relocation sites.
- the OS loader 111 is shown as including an executable image access component 112, a capability determination component 113, a metadata evaluation component 114, an address substitution component 115, and a memory page loading component 116.
- the depicted components of OS loader 111 represent various functions that the OS loader 111 might implement or utilize in accordance with various embodiments described herein. It will be appreciated, however, that the depicted components — including their identity, sub-components, and arrangement — are presented merely as an aid in describing various embodiments of the OS loader 111 described herein, and that these components are non-limiting to how software and/or hardware might implement various embodiments of the OS loader 111 described herein, or of the particular functionality thereof.
- the executable image access component 112 accesses executable image 117 (e.g., executable image 201), such as based on a request to load and initiate execution of executable image 117.
- the capability determination component 113 determines a set of one or more capabilities of computer system 101, such as ISA extensions, OS capabilities (including security capabilities), accelerator hardware, and the like.
- the capability determination component 113 operates based on the executable image access component 112 having accessed executable image 117 (e.g., based on the executable image 117 comprising function override metadata 203). In other embodiments, the capability determination component 113 operates at some other time independent of access of the executable image 117, such as during an OS boot process.
- the metadata evaluation component 114 evaluates the function override metadata 203 contained therein. For example, the metadata evaluation component 114 evaluates the capability lookup structure 210 (or structures) to determine if the set of one or more capabilities of computer system 101 (as determined by the capability determination component 113) can be used to identify an optimized function implementation. In particular, the metadata evaluation component 114 may evaluate one or more BDDs, in light of Boolean values derived from the set of one or more capabilities of computer system 101.
- the metadata evaluation component 114 evaluates the address table 211 (or tables), in light of the evaluation of the capability lookup structure 210, to determine address(es) for any applicable optimized function implementations specific to the set of one or more capabilities of computer system 101.
- the metadata evaluation component 114 evaluates the patch-up table 212 (or tables) to determine the location(s) at which address substitutions should be performed to call these optimized function implementation(s), rather than default function implementation(s).
- the metadata evaluation component 114 stores its results as loader metadata 110. In one example, in a WINDOWS environment, at least a portion of the loader metadata 110 could be stored as part of a dynamic value relocation table.
- based on the loader metadata 110, and for each of one or more call sites, the address substitution component 115 “patches up” the binary by substituting a target memory address with a replacement target memory address for an optimized function implementation. In embodiments, the address substitution component 115 substitutes a direct call to a default implementation of a function with a direct call to an optimized implementation of the function, which is optimized based on the set of one or more capabilities of computer system 101.
- the address substitution component 115 patches up all locations identified in the loader metadata 110, loads the entirety of the executable image 117 into the process memory 109 using memory page loading component 116, and then initiates execution of the executable image 117.
- the address substitution component 115 operates on-demand (e.g., lazily) as each memory page comprising one (or more) of these locations is being loaded by the memory page loading component 116.
- the OS loader 111 initiates execution of the executable image 117 prior to fully loading all memory pages of the executable image 117 into process memory 109, but patches up memory pages as they are loaded into the process memory 109, as needed, prior to executing code from those memory pages. Either way, a program that has been patched-up to use an optimized implementation of a function is never exposed to a memory page that does not have a needed patch-up applied.
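The lazy, per-page variant can be sketched as selecting which patch-up entries apply to a just-loaded page. This is our simplification; offsets and the page size are illustrative.

```python
PAGE_SIZE = 0x1000

def fixups_for_page(patchup_offsets, page_index):
    # For on-demand patch-up: select the relocation sites that fall
    # within a just-loaded page, so they can be applied before any code
    # on that page executes. A real loader would also need to handle a
    # rel32 field straddling a page boundary; this sketch does not.
    lo = page_index * PAGE_SIZE
    hi = lo + PAGE_SIZE
    return [off for off in patchup_offsets if lo <= off < hi]
```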
- instructions for implementing method 400 are encoded as computer-executable instructions (e.g., OS loader 111) stored on a computer program product (e.g., storage media 104) that are executable by a processor (e.g., processor 102) to cause a computer system (e.g., computer system 101) to perform method 400.
- method 400 comprises an act 401 of, within an executable image, identifying a control flow that targets a callee function.
- act 401 comprises reading a portion of an executable image file, the portion including a first memory address corresponding to a first callee function implementation, the first memory address having been inserted into the portion by a compiler toolchain.
- the portion corresponds to a call site having, as a call target, the first memory address.
- the executable image access component 112 accesses a portion of executable image 201, such as a portion corresponding to caller function 204.
- the caller function 204 includes a call site having, as a call target, a first memory address corresponding to the default implementation 207 of callee function 206.
- act 401 is performed as part of loading of a memory page into process memory 109 — either prior to execution of the executable image 201 or dynamically (e.g., on-demand) after executable image 201 has started — but prior to execution of code from the memory page being loaded.
- the caller function 204 could be a thunk, such as thunk 305, used to call callee function 304.
- the portion corresponds to a thunk used by a caller function to call a callee function.
- Method 400 also comprises an act 402 of, based on extensible metadata, and based on a system-specific capability set, determining a replacement callee function.
- act 402 comprises, based on extensible metadata included in the executable image file, and based on a capability set that is specific to the computer system, determining a second memory address corresponding to a second callee function implementation.
- the address substitution component 115 determines that the destination address of call site 205 is to be patched up with a destination memory address of the capability-specific implementation 208. Effects of act 402 include dynamically identifying an optimized version of a function to execute at a computer system, based on a set of capabilities specific to that computer system.
- a set of one or more capabilities of computer system 101 can include one or more of ISA extensions, OS capabilities (including security capabilities), accelerator hardware, and the like.
- the capability set includes one or more of a processor architectural feature, an OS feature, a security feature, or a hardware functionality.
- the function override metadata 203 comprises a capability lookup structure 210, an address table 211, and a patch-up table 212.
- the capability lookup structure 210 comprises information that is evaluated by the OS loader 111 to decide which function implementation to choose for a given function, given a set of capabilities available at computer system 101.
- the extensible metadata includes a data structure that maps the capability set that is specific to the computer system to an identification of the second memory address.
- the capability lookup structure 210 is a binary decision tree, which may take the form of a reduced binary decision diagram (BDD).
- the data structure comprises a binary decision tree or a BDD.
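As a sketch, the lookup structure could be modeled as an array of nodes, each testing one capability bit, with leaves holding an index into the function's address table. The layout and names below are invented for illustration and are not the patent's actual format:

```c
#include <stdint.h>

/* Hypothetical node of the capability lookup structure: internal nodes
 * test one capability bit and branch; leaves hold an index into the
 * function's address table. */
typedef struct {
    int32_t cap_bit;   /* capability bit to test; -1 marks a leaf  */
    int32_t on_false;  /* child node index when the bit is absent  */
    int32_t on_true;   /* child node index when the bit is present */
    int32_t leaf;      /* address-table index (valid at leaves)    */
} lookup_node;

/* Walks the tree from node 0, following one branch per tested
 * capability bit, and returns the chosen implementation's offset
 * into the address table. */
int32_t choose_implementation(const lookup_node *tree, uint32_t caps) {
    int32_t i = 0;
    while (tree[i].cap_bit >= 0)
        i = ((caps >> tree[i].cap_bit) & 1u) ? tree[i].on_true
                                             : tree[i].on_false;
    return tree[i].leaf;
}
```

A reduced BDD would share identical subtrees, but the evaluation loop is the same: one bit test per level until a leaf is reached.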
- the address table 211 specifies, for a given capability-optimized function, a relative virtual address (RVA) specifying a location within the executable image 201 of a different implementation of the capability-optimized function.
- an address table 211 can be used by the OS loader 111 to look up the RVA of an optimized implementation of a given function.
- the identification of the second memory address is an offset within a memory address table for a callee function, each entry in the memory address table comprising a corresponding memory address for a different implementation of the callee function.
- each memory address in the memory address table is an RVA.
- the patch-up table 212 specifies one or more offsets within the executable code 202 at which a memory address for a generic implementation of a function can be substituted with a memory address for an optimized implementation of the function.
- the extensible metadata includes an offset that identifies the portion.
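An in-memory view of the address table and patch-up table for one callee function might be modeled as follows. The struct layout and names are invented for illustration and do not reflect an actual on-disk metadata format:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical in-memory view of the function override metadata for
 * one capability-optimized callee function. */
typedef struct {
    const uint32_t *impl_rvas;    /* address table: one RVA per impl   */
    size_t          impl_count;
    const uint32_t *site_offsets; /* patch-up table: call-site offsets */
    size_t          site_count;
} override_metadata;

/* Resolve the RVA selected by the capability lookup; index 0 is assumed
 * here to be the default (capability-agnostic) implementation, used as
 * a fallback for out-of-range indices. */
uint32_t select_impl_rva(const override_metadata *md, size_t index) {
    return index < md->impl_count ? md->impl_rvas[index]
                                  : md->impl_rvas[0];
}
```

The capability lookup yields the index, the address table turns it into an RVA, and the patch-up table tells the loader which call sites to rewrite with that RVA.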
- Method 400 also comprises an act 403 of, prior to executing the call, modifying the control flow to target the replacement callee function.
- act 403 comprises, before execution of the portion, modifying the portion to replace the first memory address with the second memory address.
- the address substitution component 115 patches up the call site 205 with the destination memory address of the capability-specific implementation 208.
- the call site 205 is patched up to have the destination memory address of the capability-specific implementation 208 as its target. Effects of act 403 include using a set of computer-system-specific capabilities to modify an executable to execute an optimized version of a function, rather than executing a default version of the function.
- the address substitution component 115 operates on-demand as a memory page comprising one (or more) of these call sites is being loaded by the memory page loading component 116.
- modifying the portion is performed in connection with loading a memory page comprising the portion into memory.
- memory pages that are not patched up remain potentially available for loading as shared memory pages.
- the memory page is loaded into shared memory that is utilized by at least one different executable image file.
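Act 403's patch-up step can be sketched as follows, again assuming x86-64 direct near calls (0xE8 + rel32) and invented names. A real loader would perform this on the page being loaded, before code from that page executes:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Illustrative sketch of act 403: rewrite one call site's rel32
 * displacement so it targets the capability-specific implementation
 * instead of the default one. The displacement is relative to the end
 * of the 5-byte call instruction. */
static void patch_rel32(uint8_t *code, uint32_t site_rva,
                        uint32_t target_rva) {
    int32_t rel32 = (int32_t)(target_rva - (site_rva + 5));
    memcpy(code + site_rva + 1, &rel32, sizeof rel32);
}

/* Apply every patch-up recorded for the page being loaded. */
void apply_patchups(uint8_t *code,
                    const uint32_t *site_rvas, size_t n,
                    uint32_t new_target_rva) {
    for (size_t i = 0; i < n; i++)
        patch_rel32(code, site_rvas[i], new_target_rva);
}
```

Because the rewrite happens per page at load time, pages containing no recorded call sites are untouched and can still be shared across processes.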
- the first callee function implementation is a capability-agnostic implementation (e.g., default implementation 207) of a callee function (e.g., callee function 206), and the second callee function implementation is a capability-specific implementation (e.g., capability-specific implementation 208) of the callee function that utilizes at least one capability in the capability set.
- the first callee function implementation and the second callee function implementation both exist within the executable image 117.
- the first callee function implementation is included in the executable image file
- the second callee function implementation is also included in the executable image file.
- the principles herein can be used to provide a default implementation of a function within the executable image 117, but to call to another implementation of the function in some other executable image (e.g., a dynamically linked library, the OS itself, etc.) if that executable image is available at computer system 101 and if the computer system 101 meets a given set of capabilities.
- the first callee function implementation is included in the executable image file
- the second callee function implementation is included in a different executable image file.
- the executable image 117 includes only the default implementation, and lacks any optimized implementation of the function, instead relying on this other executable image to provide the optimized implementation. In one example, this is useful for utilizing an OS special function, in which the OS dynamically inserts an optimized function into the process memory 109.
- the second callee function implementation is an OS special function.
- the embodiments described herein dynamically override a function based on a capability set during loading of an executable image into memory, based on replacing a first destination address of a default function implementation with a second destination address of a function implementation that is determined based on a capability set of the computer system on which the executable image is being loaded.
- the destination address may be, for example, that of a direct call instruction or of a jump instruction.
- the embodiments herein can dramatically speed up program execution, particularly if the called function is relatively small and called frequently.
- Embodiments of the present invention may comprise or utilize a special-purpose or general- purpose computer system (e.g., computer system 101) that includes computer hardware, such as, for example, one or more processors (e.g., processor 102) and system memory (e.g., memory 103), as discussed in greater detail below.
- Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
- Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system.
- Computer-readable media that store computer-executable instructions and/or data structures are computer storage media (e.g., storage media 104).
- Computer-readable media that carry computer-executable instructions and/or data structures are transmission media.
- embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.
- Computer storage media are physical storage media that store computer-executable instructions and/or data structures.
- Physical storage media include computer hardware, such as RAM, ROM, EEPROM, solid state drives (“SSDs”), flash memory, phase-change memory (“PCM”), optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage device(s) which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention.
- Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures, and which can be accessed by a general-purpose or special-purpose computer system.
- a “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
- program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa).
- program code in the form of computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., network interface 105), and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system.
- computer storage media can be included in computer system components that also (or even primarily) utilize transmission media.
- Computer-executable instructions comprise, for example, instructions and data which, when executed at one or more processors, cause a general-purpose computer system, special-purpose computer system, or special-purpose processing device to perform a certain function or group of functions.
- Computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
- the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like.
- the invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
- a computer system may include a plurality of constituent computer systems.
- program modules may be located in both local and remote memory storage devices.
- Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations.
- cloud computing is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
- a cloud computing model can be composed of various characteristics, such as on-demand self- service, broad network access, resource pooling, rapid elasticity, measured service, and so forth.
- a cloud computing model may also come in the form of various service models such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”).
- the cloud computing model may also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth.
- Some embodiments may comprise a system that includes one or more hosts that are each capable of running one or more virtual machines.
- virtual machines emulate an operational computing system, supporting an OS and perhaps one or more other applications as well.
- each host includes a hypervisor that emulates virtual resources for the virtual machines using physical resources that are abstracted from view of the virtual machines.
- the hypervisor also provides proper isolation between the virtual machines.
- the hypervisor provides the illusion that the virtual machine is interfacing with a physical resource, even though the virtual machine only interfaces with the appearance (e.g., a virtual resource) of a physical resource. Examples of physical resources include processing capacity, memory, disk space, network bandwidth, media drives, and so forth.
- the terms “set,” “superset,” and “subset” are intended to exclude an empty set, and thus “set” is defined as a non-empty set, “superset” is defined as a non-empty superset, and “subset” is defined as a non-empty subset.
- the term “subset” excludes the entirety of its superset (i.e., the superset contains at least one item not included in the subset).
- a “superset” can include at least one additional element, and a “subset” can exclude at least one element.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202280088587.9A CN118556228A (en) | 2022-02-11 | 2022-11-15 | Dynamically overwriting functions during loading based on capability sets |

Applications Claiming Priority (4)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263309112P | 2022-02-11 | 2022-02-11 | |
| US63/309,112 | 2022-02-11 | | |
| US17/724,329 US11720374B1 (en) | 2022-02-11 | 2022-04-19 | Dynamically overriding a function based on a capability set |
| US17/724,329 | 2022-04-19 | | |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2023154092A1 | 2023-08-17 |

Family ID=84767012

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2022/050013 WO2023154092A1 (en) | Dynamically overriding a function based on a capability set during load time | 2022-02-11 | 2022-11-15 |

Country Status (1)

| Country | Link |
|---|---|
| WO | WO2023154092A1 (en) |
Citations (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6718546B1 | 1999-04-23 | 2004-04-06 | International Business Machines Corporation | Application management |
Legal Events

| Code | Title | Description |
|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22835170; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 202280088587.9; Country of ref document: CN |
| WWE | Wipo information: entry into national phase | Ref document number: 2022835170; Country of ref document: EP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| ENP | Entry into the national phase | Ref document number: 2022835170; Country of ref document: EP; Effective date: 20240911 |