WO2014172062A2 - Secure computing - Google Patents

Secure computing

Info

Publication number
WO2014172062A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
instructions
lfsr
instruction
cache
Application number
PCT/US2014/031396
Other languages
French (fr)
Other versions
WO2014172062A3 (en)
Inventor
Laurence H. Cooke
Original Assignee
Cooke Laurence H
Application filed by Cooke Laurence H
Priority to EP14784683.6A (EP2987086B1)
Publication of WO2014172062A2
Publication of WO2014172062A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/14 Protection against unauthorised use of memory or access to memory
    • G06F12/1408 Protection against unauthorised use of memory or access to memory by using cryptography
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
    • G06F21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/82 Protecting input, output or interconnection devices
    • G06F21/85 Protecting input, output or interconnection devices interconnection devices, e.g. bus-connected or in-line devices
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017 Runtime instruction translation, e.g. macros
    • G06F9/30178 Runtime instruction translation, e.g. macros of compressed or encrypted instructions
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40 Specific encoding of data in memory or cache
    • G06F2212/402 Encrypted data
    • G06F2212/45 Caching of specific data in cache memory
    • G06F2212/452 Instruction code
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2125 Just-in-time application of countermeasures, e.g., on-the-fly decryption, just-in-time obfuscation or de-obfuscation

Definitions

  • Embodiments of the present invention may relate to processing encrypted code and data in a secure manner in a processor.
  • that is not the state of all computing, e.g., cloud computing, today.
  • Most of today's servers contain multiple ICs, each with multiple processors and multiple levels of shared cache, processing potentially different applications on virtual machines in the same chip. In this environment, one application may snoop another application within the same chip, and may do so well enough to hack it.
  • Buer's encryption approach may require too much computational overhead for encoding and decoding a processor's instructions and data, as may other software-based encryption techniques, such as that described by Horovitz et al. in US Patent Application Publication No. 2013/0067245, published March 14, 2013; and while Henry et al., in US Patent Application Publication No.
  • a multi-processor system, e.g., on an IC, may have two or more processors, where each processor may include an instruction unit for processing instructions, an execution unit for operating on data, at least one cache memory, at least one interface to a system bus, logic for translating instructions accessed by the instruction unit from a cache memory and logic between the execution unit and.
  • the logic for translating instructions may include logic for decrypting encoded instructions, and the logic for translating data may include logic for decrypting data being accessed by the execution unit and logic for encrypting data being written to the cache.
  • the logic for translating instructions may include an LFSR, and the logic for translating data may include code transformation logic.
  • the logic for translating data may also include logic for selectively encrypting data written to a system bus.
  • the cache memories may include an instruction cache and a data cache. The logic for translating instructions may access the instruction cache, and the logic for translating data may access the data cache.
  • a multi-processor system, e.g., on an IC, may have two or more processors, where each processor may include an instruction unit for processing instructions, an execution unit for operating on data, at least one cache memory, at least one interface to a system bus, and logic for translating data and instructions transferred between the system and a cache memory.
  • the logic for translating instructions may include logic for decrypting encoded instructions.
  • the logic for translating data may include logic for decrypting data being accessed by the execution unit and logic for encrypting data being written to the cache.
  • a method for encrypting a program's instructions and data may include the steps of:
  • the selective instructions may include instructions for loading and storing registers containing addresses, branch instructions, and instructions for calling and returning from subroutines.
  • the LFSR may be a programmable LFSR, and the step of creating initial codes may include programming the LFSR with one of the initial codes.
  • debugging unencrypted applications may be performed without recompiling the application or altering the encrypted application's cycle-by-cycle operation, e.g., by using zero translation codes.
  • instructions to generate the data for transform may be encrypted, appended in front of the encoded application, and may be executed following the loading of the LFSR mask register and initial translation code.
  • the transform mask registers' data may be generated by:
  • Figures 1a and 1b are conceptual diagrams of examples of multi-processor systems with encryption and decryption translators
  • Figures 2a and 2b are diagrams of examples of instruction decryption using an LFSR in conjunction with instructions
  • Figures 3a, 3b, and 3c are diagrams of examples of data encryption and decryption using offset translated codes associated with base addresses of the data.
  • Figure 4 is a simplified diagram of an example of an L1 data cache
  • Figure 5 is a diagram of an example four-bit Galois LFSR
  • Figure 6a is a high level diagram of code translation based on the example LFSR
  • Figure 7 is another high level diagram of a code transformation example with extended offset addressing
  • Figure 8 is a diagram of an example of a programmable Galois LFSR
  • Figure 9 is a diagram of an example of programmable code transformation logic
  • Figure 10 is another diagram of an example of a processor with checksum logic
  • Figure 11 is a diagram of an example of checksum logic.
  • FIG. 1a is a conceptual diagram of an example of a multiprocessor system with encryption and decryption.
  • the system may include two processors 10 connected via a common system bus 17 to a shared L2 cache 18 and/or other devices, such as a memory controller 19, one or more co-processor(s) 20, and/or a bridge to more peripheral devices 21.
  • Each processor 10 may include an instruction unit 11 and its corresponding L1 I-cache 13, with a translator 15 coupled between them for decrypting the instructions being read into the I-cache. In a similar manner, each processor 10 may contain an execution unit 12 and its corresponding L1 D-cache 14, with another translator 16 coupled between them for encrypting and decrypting data being read into and written out of the D-cache.
  • the instructions and data may be decrypted as they are read out of their respective L1 caches, and selected decrypted data may be sent past the L1 D-cache 14 to the bridge 21 and co-processors 20, while maintaining the encrypted data in the cache.
  • the architecture shown in Figure 1a may have processors with reduced instruction set computing (RISC) and instructions that use base register plus index addressing, but it is further contemplated that LFSR encryption and decryption of individual processor instructions and data may be employed in other processor architectures, such as complex instruction set computing (CISC) with direct addressing modes, and in other system architectures, including but not limited to the one shown in Figure 1b.
  • RISC reduced instruction set computing
  • CISC complex instruction set computing
  • Figure 2a is a diagram of an example of instruction decryption using an LFSR in conjunction with an instruction address register.
  • an instruction LFSR 23 containing a code that may be used to decrypt the encoded instructions 24 using a set of exclusive-OR gates 25.
  • the instruction LFSR 23 may be clocked each time the instruction address register 22 is incremented, thereby providing each instruction with a unique code from which the instruction may be decoded. In this manner, each occurrence of an instruction may be encoded and decoded differently, thereby obscuring the actual instruction's identity.
  • FIG. 2b is a diagram of an example of instruction memory including branch instructions.
  • the normal instructions may consist of opcode 26 and address 27 fields.
  • the address fields may contain offsets from base register addresses for accessing data.
  • selected instructions may also include translation codes 28. The fields and translation codes may initially be encrypted and may be decrypted before being used.
  • One such selected instruction, the branch instruction, may load both the instruction address register and the instruction LFSR while executing the branch, such that the instruction LFSR may contain the proper code for the next instruction 29 after taking the branch.
  • the instruction address register may be incremented, and the LFSR may be clocked to provide the proper code for decrypting the next instruction. Subsequent incrementing of the instruction address register and clocking of the LFSR may occur until the instruction 29 is reached, at which point the contents of the LFSR may match the decrypted translation code 28 of the corresponding branch instruction.
  • An initial translation code for the first instruction 30, and other data, may be obtained by the operating system through some separate secure encryption protocol, such as RSA, which may be loaded into the instruction LFSR using a branch to the first instruction.
  • some encrypted instructions may include translation codes, which may also be encrypted, thereby securing all but the initial translation code for the first instruction.
  • the proper translation code for each instruction may be easily obtained in one clock cycle by either loading or clocking the LFSR.
  • Each processor may contain a set of N registers 30, where the first K registers may be used to hold data, and the last N-K base registers may contain addresses pointing to one or more sections of data and/or instruction code.
  • the processor may further contain N-K translation code registers 31, which may correspond to the N-K base registers.
  • When an executed instruction loads or stores data from or to a memory location, it may calculate the address of the memory location by adding an offset to the address within a base register.
  • the translation code corresponding to that address may be calculated by transforming the code from the translation code associated with the base register by the amount of the offset, thereby creating a code for decrypting or encrypting the corresponding data.
  • the translation code for a location 37 with an offset 35 from the address in base register K may be calculated by loading the offset 35 and the translation code from code register K 33 into the code transformation logic 34, producing a translation code in register 36 for the location 37 which, when applied to the exclusive-ORs 38 in Figure 3c, may decrypt or encrypt the data 39 being loaded from or stored into location 37.
  • the same translation code may be used to encrypt and decrypt data being put into or taken from the target address.
  • the same translation code result as obtained through the code transformation logic 34 may be obtained by loading a properly wired LFSR with the translation code from code register K and clocking the LFSR by a count equal to the offset 35.
  • multiplexors 40 may be used to select the unencrypted data 3 when sending the data to either the co-processors 20 in Figure 1a or other peripheral devices through the peripheral bridge 21 in Figure 1a, or to other locations that may require decrypted data.
  • the multiplexors 40 in Figure 3c may continue to select the encrypted data when sending the data either back to the caches 14 and 18 in Figure 1a or to main memory through the memory controller 19 in Figure 1a. Such selection may be done by a select line 41 in Figure 3c, which may be driven either from a control register (not shown) or by control bits within the instructions themselves.
  • instructions for loading a base register's address may also load the base register's associated code register. In a similar manner, a subroutine call may store the translation code associated with the instruction in the instruction LFSR after saving the prior contents of the LFSR in the code register associated with the base register where it stores its return address. Similarly, a return instruction may branch to the address in the base register while loading the contents of the corresponding code register into the instruction LFSR.
  • Initial encryption of the instructions in a program and data space may be performed after compilation and before the final load module creation, e.g., by: creating an initial translation code; incrementing the LFSR function to obtain the translation code for each instruction;
  • the instructions requiring appended translation codes may include instructions involving addresses in base registers, branches and/or subroutine calls.
  • FIG. 4 is a simplified diagram of an example of an L1 data cache.
  • the L1 cache 45 may be coupled to both a write line buffer 43 and a read line buffer 44.
  • the contents of the data cache may be encrypted in a manner similar to the system architecture shown in Figure 1a, and the read line buffer 44 may be coupled to the data inputs 39 in Figure 3c for decryption of the encrypted data.
  • the write line buffer 43 may also be connected to the outputs 42 in Figure 3c for re-encryption of updated data and may be configured to read the register 36 to load the associated translation code into a portion of the data cache 46 along with the encrypted updated data.
  • Part of the write line buffer 43 may also be connected to read the output register 36 following execution of the code transformation logic 34. In this manner, on a cache line miss, the translation code associated with the new cache line data may be generated while the data portion of the write line buffer 43 is filled from external cache or memory. Similarly, the translation code may be read from a section of the cache 46 into the register 36 to subsequently translate the read line buffer's 44 data for the execution unit. Finally, when other devices snoop the L1 cache, they may only extract encrypted data.
  • the contents of the L1 cache may be decrypted in the system architecture depicted in Figure 1b.
  • the processor may not decrypt the data or instructions from the cache.
  • the translation code 36 for the address of the data or instructions being written into the write line buffer may be used to decrypt the data or instructions and may be subsequently stored in a code portion 46 of the cache along with the decrypted data.
  • the translation code associated with the data may be used to encrypt the snooped data
  • an LFSR starting with a translation code for the first instruction or word of data in a cache line buffer may be clocked and applied to each subsequent instruction or word of data being read from or written into the cache line buffer. If the read or write is out of order, the translation code may be adjusted by a single transformation function that may "subtract" the buffer size from the LFSR when the data or instructions wrap around the line buffer. Given an LFSR function with M unique values before repeating, "subtraction" of N is equivalent to a transformation function of M-N, where M>N.
  • FIG. 5 is a diagram of an example four-bit Galois LFSR composed of four flip-flops 51-54 serially coupled in a ring, with one exclusive-OR gate 55 connected between the first flip-flop 51 and the second flip-flop 52, and a feedback loop 56 connecting the fourth flip-flop 54 to the exclusive-OR 55 and the first flip-flop 51, where the outputs of all the flip-flops are coupled to a code register 57.
  • This particular LFSR may be used to generate all fifteen distinct non-zero translation codes (i.e., all fifteen distinct non-zero combinations of four bits) before repeating.
  • M clocks may be used to generate the translation code for an address of the Mth word after the specific address.
  • the initial translation code C0 60 may be applied with an offset value 61 to code transformation logic 63 to produce a translation code CM in an output code register 62, which corresponds to the translation code of the Mth word after the specific address.
  • the code transformation logic may produce translation code CM by successively, for each set (i.e., "1" or "high"), or alternatively clear (i.e., "0" or "low"), bit in an Nth position of the binary offset value, transforming the initial translation code by a function that is equivalent to clocking the LFSR 2^N times.
  • the code transformation logic may transform the translation code by some combination of the transformation functions: J1 (one clock) 64, J2 (two clocks) 65, J4 (four clocks) 66 and/or J8 (8 clocks) 67.
  • Each of the functions J1 through J8 may be comprised of exclusive-OR gates 69 and multiplexors 68.
  • the function J1 64 may select the four bits from C0 60 if the lowest order bit from the offset 61 is low, and may select the same values the LFSR would generate in one clock cycle if the lowest order bit from the offset 61 is high.
  • the second bit is the exclusive-OR 69 of the first and fourth bits, as would be captured in the second flip-flop 52 of the LFSR in Figure 5 after one clock.
  • the function J2 65 may select between its input values if bit 1 of the offset is low and the same values the LFSR would generate after two clock cycles, which is equivalent to two cycles of J1, if bit 1 is high.
  • each respective bit of the offset selects a corresponding function equivalent to clocking the LFSR by the number of clock cycles corresponding to the position of the offset bit, depending on whether the respective offset bit is high or low.
  • J1 is a1 ← d0, b1 ← (a0 + d0), c1 ← b0, d1 ← c0, where + denotes exclusive-OR;
  • the LFSR 80 may contain any number of flip-flops 81 serially coupled in a ring, where each flip-flop, except the last flip-flop 82, may drive an exclusive-OR gate 83, and all may be driven by a multiplexor 84, which may select between loading a code 85 into the LFSR or sequencing the LFSR.
  • Each exclusive-OR gate 83 may also be driven by an AND gate 86, which may be used to enable the signal on the feedback line 87 with a bit from an LFSR mask register 88.
  • the LFSR mask register 88 may contain N-1 bits for an N-flip-flop LFSR. The LFSR mask register bits may be loaded with any Galois LFSR configuration.
  • the LFSR may be clocked on all increments of the instruction address register, thereby stepping through the LFSR states. In any one of the ideal configurations, the LFSR may repeat every possible state except zero. Loading a zero code into the LFSR may be equivalent to no decryption, given that no amount of clocking may change the state of the LFSR and that no bits are changed when exclusive-ORed with zero.
  • the LFSRs and all decryption may be disabled by loading translation codes of zero. This may be performed, e.g., when exiting an encrypted application.
  • Each respective transformation function 90 may include N transform mask registers 91 of N bits each, where N is the number of flip-flops in the corresponding LFSR. There may be one transform mask register for each output of the transformation function. Each of the bits in the transform mask registers may be used to select, via an AND gate 92, a corresponding input to the transformation function. The selected inputs may be exclusive-ORed, through a tree of exclusive-OR gates 93, to form the output 94 selected by a bit 95 from the offset 96.
  • programmable code transformation logic for an N-bit offset may require up to N^3 programming bits. It should be noted that the transformation functions selected by all offset bits above the Nth may be copies of the first N transformation functions, thereby requiring no additional programming bits. The actual number of unique bits required to program the code transformation logic may be much less than N^3.
  • the transform mask register bits for the first transformation function, when viewed as an NxN matrix, may be generated by rotating an identity matrix down one row after ORing the N-1 LFSR mask register bits into the first bits of the last column, in a manner that properly simulates one clock shift of the associated LFSR.
  • the second transformation function's matrix may be generated by multiplying modulo 2 the first transformation function's matrix by itself; the third transformation function's matrix may be generated by multiplying modulo 2 the second matrix by itself; and each successive transformation function's matrix may be generated from the matrix of the previous one in the same way.
  • the programmable code transformation function may be generated with as few as N-1 bits.
  • the single shift matrix [J1] may be:
  • the matrix for two shifts [J2] may be:
  • the LFSR mask register bits needed for programming the LFSR may not be the bits used to program the transformation functions, thereby providing different encryption algorithms for the instruction and data. Such additional mask register bits may also be included with the initial translation code.
  • the mask register bits may be encrypted with the initial translation code, and prior to executing the encrypted program, the mask register data may be decrypted by loading the initial translation code into the LFSR, using the initial translation code to decode the mask register data without clocking the LFSR, and then loading the LFSR's decrypted mask register data.
  • instructions to generate the data for the transform mask registers from the LFSR's mask register bits may be encrypted, appended in front of the encoded application, and may be executed following the loading of the LFSR mask register and initial translation code.
  • this code may not address data memory, which may require the use of the yet-to-be-programmed code transformation logic.
  • all transform mask registers may be directly addressable by instructions, and all generation of the transform mask register data may be done in situ, thereby avoiding use of addressed data memory.
  • the processor's legal instruction codes may be a small fraction of the possible values in the opcode field of an instruction.
  • the execution of an illegal instruction may cause an operating system interrupt, thereby allowing the operating system to detect instruction tampering.
  • illegal addresses may also cause operating system interrupts, thereby allowing the operating system to detect data tampering.
  • Small examples such as those above may be useful for illustrating the detailed logic, but in current, more realistic multi-processor environments, a practical example may be a 32-bit RISC processor with 20-bit offset address fields in the instructions and multiple levels of cache.
  • the instructions and data may remain encrypted within their respective caches, the LFSR may be 32 bits long, and the LFSR mask register may be 31 bits long, both manageable sizes of separately encrypted initial codes.
  • the longest path between flip-flops on the programmable LFSR may be an AND gate followed by an XOR gate, and loading the LFSR may also take only one clock cycle; hence, the decryption of the instructions may easily occur during the instruction unit's fetch cycle.
  • the proper decrypted translation codes for each stream may be stored with the branch predictions or loop instructions.
  • the offset address field may contain a 20-bit offset, which may result in 20 transformation functions, each of which may have, for each of its 32 output bits, 32 AND gates masking the input signals to a 6-level tree of 31 XOR gates. Each of the 20 transformation functions may then contain eight levels of logic (1 AND, 6 XORs and 1 multiplexor), for a total of 1,024 AND gates, 992 XOR gates, 32 multiplexors, and 32 32-bit transform mask registers. The worst-case path in such a structure may be up to 160 gate levels long. This may be reduced where the terms are not needed, but the result may still require many clock cycles.
  • the time needed to calculate the proper cache line translation code may overlap with the time required to process a cache line miss request to either an L2 cache or main memory, which also may take many clock cycles.
  • the translation code may be stored in the L1 data cache with the encrypted cache line.
  • the translation code may be retrieved to decrypt the data retrieved from the cache or to encrypt the data written to the cache, as shown in Figure 4.
  • the translation code stored in the cache may be only applicable to the first word in a 2^M-word cache line.
  • a K-bit code transformation logic block may then be used to create the translation code for the proper word out of the cache line, or a combination of a (K-M)-bit code transformation logic block and 2^M cycles of an appropriately loaded LFSR may be used. It should be noted that, because of the short path within the LFSR, the LFSR may also be clocked at a multiple of the processor clock.
  • the mask register and code transformation logic may be reduced by limiting the programming to a subset of the bits.
  • debugging of applications may be performed without recompiling the application or altering its cycle-by-cycle operation.
  • Unencrypted applications may also be modified before the final load module creation, e.g., by creating a zero initial translation code and appending to the selected instructions a zero translation code.
  • Execution of the unencrypted application may then be performed with all the available transparent debug facilities as may exist in the processor, and with the translation logic enabled.
  • the unencrypted code may then perform in the same cycle-by-cycle manner as the encrypted code.
  • when subsequently encrypting or re-encrypting the application, its size and cycle-by-cycle operation may not change.
  • the LFSR, code transformation logic, and checksum logic may be used to generate random instructions and data to test the processor prior to normal operation.
  • the L1 caches 103 and 104 may be initialized to zero.
  • Figure 11 is a diagram of an example of checksum logic.
  • the checksum register 111 may be cleared or loaded with an initial code.
  • the input data 112 may be combined with the current contents of the checksum register 111 through exclusive-OR (XOR) gates 113 to update the checksum register 111.
  • the input data 112 may be instructions or control signals from the instruction unit or may be data and control signals from the execution unit. Testing may proceed by: a) loading an LFSR translation code and an initial instruction address, b) disabling cache misses by loading just the translation codes and
  • the control signals may include interrupt signals, instruction addresses, and/or other signals generated by the execution of the test and captured by the checksum prior to being disabled.
  • some amount of encoded instructions may be loaded into the I-cache, and encoded data into the D-cache, to perform partial or full diagnostic tests.
  • the LFSR, transformation logic and checksums may be used to perform processor BIST or to aid in processor diagnostic tests.
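
The four-bit Galois LFSR of Figure 5 can be modelled in a few lines of software. This is a minimal sketch, assuming the bit order (a, b, c, d) for flip-flops 51-54 used in the J1 equations in the Definitions (a1 ← d0, b1 ← a0 + d0, c1 ← b0, d1 ← c0); the function names `clock` and `sequence` are illustrative and not taken from the source.

```python
def clock(state):
    """One clock of the Figure 5 LFSR; state is a tuple of bits (a, b, c, d)."""
    a, b, c, d = state
    # Feedback from the fourth flip-flop (d) loads the first flip-flop and
    # feeds the single exclusive-OR gate in front of the second flip-flop.
    return (d, a ^ d, b, c)


def sequence(seed):
    """Return every state visited from `seed` before the LFSR repeats."""
    states = [seed]
    state = clock(seed)
    while state != seed:
        states.append(state)
        state = clock(state)
    return states


codes = sequence((1, 0, 0, 0))
print(len(codes))           # 15 distinct non-zero codes
print(clock((0, 0, 0, 0)))  # (0, 0, 0, 0): a zero code never changes
```

Any non-zero seed walks the same 15-state cycle, only from a different starting point, which is why a zero code can serve as the "no decryption" setting.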
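
The per-instruction decryption of Figures 2a and 2b amounts to XORing each instruction word with the LFSR code for its address and clocking the LFSR once per address increment; a branch reloads both the address and the matching code. The sketch below is illustrative only: it assumes the Figure 5 LFSR with codes packed into 4-bit integers (bit 0 = flip-flop a), uses 4-bit "instruction words" so codes and words line up, and `translate`, `code_at`, the sample program and the initial code are all hypothetical.

```python
def clock(code):
    """Figure 5 LFSR on a 4-bit integer (bit 0 = flip-flop a, bit 3 = d)."""
    a, b, c, d = [(code >> i) & 1 for i in range(4)]
    return d | ((a ^ d) << 1) | (b << 2) | (c << 3)


def translate(words, code):
    """XOR each word with the code for its address, clocking per address.

    XOR is its own inverse, so the same routine encrypts and decrypts.
    """
    out = []
    for word in words:
        out.append(word ^ code)
        code = clock(code)
    return out


def code_at(initial_code, address):
    """Translation code a branch would load along with the target address."""
    code = initial_code
    for _ in range(address):
        code = clock(code)
    return code


program = [0x1, 0x1, 0x1, 0x7, 0x3]  # hypothetical 4-bit instruction words
initial = 0b1001                      # hypothetical initial translation code
cipher = translate(program, initial)
print(cipher)  # [8, 0, 3, 3, 11]: the three identical 0x1 words differ
assert translate(cipher, initial) == program
# A branch to address 3 loads the address and its code, so decryption can
# resume mid-stream without replaying addresses 0-2.
assert translate(cipher[3:], code_at(initial, 3)) == program[3:]
```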
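
The offset-based data translation of Figures 3 and 6a can be sketched the same way: the code for a base-plus-offset address is derived from the base register's code by applying, for each set bit k of the offset, a function J(2^k) equivalent to 2^k LFSR clocks. In this sketch J1, J2, J4 and J8 are precomputed as lookup tables purely for illustration (standing in for the exclusive-OR and multiplexor networks), and the base code, offset and data values are hypothetical.

```python
def clock(code):
    """Figure 5 LFSR on a 4-bit integer (bit 0 = flip-flop a, bit 3 = d)."""
    a, b, c, d = [(code >> i) & 1 for i in range(4)]
    return d | ((a ^ d) << 1) | (b << 2) | (c << 3)


def clock_n(code, n):
    """Reference model: clock the LFSR n times."""
    for _ in range(n):
        code = clock(code)
    return code


# J[k] is equivalent to 2**k clocks: J1, J2, J4 and J8 for a 4-bit offset.
J = [{c: clock_n(c, 1 << k) for c in range(16)} for k in range(4)]


def transform(code, offset):
    """Offset bit k selects J(2**k); a clear bit passes the code through."""
    for k in range(4):
        if (offset >> k) & 1:
            code = J[k][code]
    return code


base_code = 0b0110  # hypothetical contents of code register K
offset = 11         # hypothetical offset field of a load/store instruction
code = transform(base_code, offset)
assert code == clock_n(base_code, offset)  # same as clocking 11 times

stored = 0b1010 ^ code          # store: encrypt with the derived code
assert stored ^ code == 0b1010  # load: the same code decrypts it
```

The per-bit structure trades sequential clocking for a fixed number of selectable stages: four stages here cover every offset from 0 to 15.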
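
For an out-of-order cache line burst, the "subtraction" of the buffer size described above can be sketched by clocking the Figure 5 LFSR M - N more times, where M = 15 is its period. The helper names, the 4-word line size and the wrap-detection logic are assumptions made for this illustration.

```python
PERIOD = 15  # M: number of distinct non-zero codes of the Figure 5 LFSR


def clock(code):
    """Figure 5 LFSR on a 4-bit integer (bit 0 = flip-flop a, bit 3 = d)."""
    a, b, c, d = [(code >> i) & 1 for i in range(4)]
    return d | ((a ^ d) << 1) | (b << 2) | (c << 3)


def clock_n(code, n):
    for _ in range(n):
        code = clock(code)
    return code


def subtract(code, n):
    """'Subtract' n (< PERIOD) from the LFSR by clocking it PERIOD - n times."""
    assert 0 < n < PERIOD
    return clock_n(code, PERIOD - n)


def burst_codes(first_code, start, size):
    """(word, code) pairs for a burst beginning at word `start` of the line."""
    pairs, code = [], clock_n(first_code, start)
    for i in range(size):
        word = (start + i) % size
        if i > 0 and word == 0:  # wrapped around the line buffer
            code = subtract(code, size)
        pairs.append((word, code))
        code = clock(code)
    return pairs


first = 0b0011  # hypothetical code of word 0 in the cache line
for word, code in burst_codes(first, start=2, size=4):
    assert code == clock_n(first, word)  # each word gets its in-order code
```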
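
The programmable LFSR of Figure 8 and the matrix construction for the programmable code transformation logic (rotate an identity matrix, OR in the mask bits, then square modulo 2 for each further function) can be sketched as follows. Assumptions for this sketch: the state is a list of N bits with index 0 as the first flip-flop; mask bit i enables feedback into the exclusive-OR gate in front of flip-flop i + 1; each matrix row lists the inputs that are exclusive-ORed into one output bit, in the spirit of the AND/XOR trees of Figure 9. The mask [1, 0, 0] reproduces the Figure 5 LFSR, and all function names are illustrative.

```python
def lfsr_step(state, mask):
    """One clock of a programmable Galois LFSR (N flip-flops, N-1 mask bits)."""
    feedback = state[-1]
    nxt = [feedback]
    for i in range(1, len(state)):
        # AND gate: the mask bit enables feedback into this exclusive-OR gate.
        nxt.append(state[i - 1] ^ (mask[i - 1] & feedback))
    return nxt


def shift_matrix(mask):
    """J1: identity rotated down one row, mask bits ORed into the last column."""
    n = len(mask) + 1
    m = [[0] * n for _ in range(n)]
    m[0][n - 1] = 1
    for i in range(1, n):
        m[i][i - 1] = 1
        m[i][n - 1] |= mask[i - 1]
    return m


def matmul2(x, y):
    """Matrix product modulo 2."""
    n = len(x)
    return [[sum(x[i][k] & y[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]


def apply_matrix(matrix, state):
    """Each output bit is the XOR of the inputs selected by one matrix row."""
    return [sum(m & s for m, s in zip(row, state)) % 2 for row in matrix]


def transform_matrices(mask, count):
    """J1, J2, J4, ...: each matrix is the previous one squared modulo 2."""
    mats = [shift_matrix(mask)]
    for _ in range(count - 1):
        mats.append(matmul2(mats[-1], mats[-1]))
    return mats


mask = [1, 0, 0]                    # Figure 5 configuration
mats = transform_matrices(mask, 5)  # J1, J2, J4, J8, J16
state = [1, 0, 1, 1]
for k in range(4):
    expected = state
    for _ in range(2 ** k):
        expected = lfsr_step(expected, mask)
    assert apply_matrix(mats[k], state) == expected
assert mats[4] == mats[0]  # with period 15, J16 equals J1 again
```

The last assertion illustrates why functions selected by offset bits above the Nth can be copies of the first N: for a maximal-length N-bit LFSR, 2^N clocks equal one clock.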
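
For the four-bit example, the single-shift matrix [J1] and the two-shift matrix [J2] follow from the J1 equations (a1 ← d0, b1 ← a0 + d0, c1 ← b0, d1 ← c0). Written under an assumed convention in which rows are the next-state bits a, b, c, d, columns are the current-state bits in the same order, and arithmetic is modulo 2, they would be:

```latex
J_1 = \begin{bmatrix} 0&0&0&1\\ 1&0&0&1\\ 0&1&0&0\\ 0&0&1&0 \end{bmatrix},
\qquad
J_2 = J_1^2 \bmod 2 = \begin{bmatrix} 0&0&1&0\\ 0&0&1&1\\ 1&0&0&1\\ 0&1&0&0 \end{bmatrix}
```

Reading the second row of J2, for example, gives b2 = c0 + d0, which matches clocking the LFSR twice by hand.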

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Storage Device Security (AREA)
  • Executing Machine-Instructions (AREA)
  • Mathematical Physics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

Techniques and logic are presented for encrypting and decrypting programs and related data within a multi-processor system to prevent tampering. The decryption and encryption may be performed either between a system bus and a processor's individual L1 cache memory or between a processor's instruction and execution units and their respective L1 caches. The logic may include one or more linear feedback shift registers (LFSRs) that may be used for generation of unique sequential address-related codes to perform the decryption of instructions, and transformation logic that may be used for generation of equivalent offset address-related codes to perform decryption and encryption of data. The logic may also be programmable and may be used for test purposes.

Description

Secure Computing
Field of the Invention
[0001] Embodiments of the present invention may relate to processing encrypted code and data in a secure manner in a processor.
Background of the Invention
[0002] With the frequent stories of hacking personal information of millions of customers and clients of corporations and government, data and computer security have become significant issues in computing. Organizations, such as the Trusted Computing Group, have created a number of standards for secure authentication to decrypt privileged information. IPMI and other standard communication protocols have methods for encrypting and decrypting privileged data as well, but all these solutions deal with encrypting transmitted or stored data, not the actual code and data used in the actual computer, which has left a gap where hackers may be able to get access to decrypted information within the computing systems themselves. Goto et al., in US Patent No. 7,865,733 granted January 4, 2011, suggests securely decrypting data received from any external memory into the processor chip, and encrypting any data being sent off of the processor chip to external memory, and Buer, in US Patent No. 7,734,932 granted June 8, 2010, suggests a solution by leaving the data and instructions encrypted in main memory and decrypting them when fetched into cache. Furthermore, while Hall, in US Patent No. 7,657,756 granted February 2, 2010, suggests storing the metadata for decryption in cache, it is stored along with the decrypted data. These may address the problem of single-threaded, single processors residing with their own cache on secure integrated circuits (ICs), but that is not the state of all computing, e.g., cloud computing, today. Most of today's servers contain multiple ICs, each with multiple processors and multiple levels of shared cache, processing potentially different applications on virtual machines in the same chip. In this environment, one application may snoop another application within the same chip, and may do so well enough to hack it.
[0003] Convolution encrypting the source code, while helpful, may still be decrypted by detecting the frequency of the instruction codes. Other techniques, such as decrypting the instruction by applying the XOR of the encrypted instruction and one or more fixed keys, as described by Henry et al. in US Patent Application Publication No. 2011/0296204, published December 1, 2011, are only as good as the keys. A sufficiently robust encryption technique may be needed to be adequately tamper proof. Such a technique should be sufficiently random to be difficult to break. Butler, in US Patent No. 7,412,468 granted August 12, 2008, suggested using a Multiple Input Shift Register (MISR), also known as a linear feedback shift register (LFSR), for both built-in self test (BIST) and the generation of random keys for encryption, which he suggested may be applied to external messages using a form of Rivest-Shamir-Adleman (RSA) encryption. Unfortunately, Butler's encryption approach may require too much computational overhead for encoding and decoding a processor's instructions and data, as may other software-based encryption techniques, such as that described by Horovitz et al. in US Patent Application Publication No. 2013/0067245, published March 14, 2013; and while Henry et al., in US Patent Application Publication No. 2012/0096282, published April 19, 2012, suggest using the XOR operation to decrypt in the "same time" as not decrypting, they still require additional instructions to switch their encryption keys. Therefore, in order to provide an adequate tamper-proofing mechanism for cloud computing in multi-processor systems with shared cache memory systems, it may be desirable to employ a pseudo-random key based technique for transparently encoding and decoding instructions and data, with minimal overhead, within individual processors, such that protected applications and data may remain encrypted in shared memory spaces.
Summary of Embodiments of the Invention
[0004] Various embodiments of the invention may relate to hardware and software encryption and decryption techniques using pseudo-random numbers generated by LFSRs for use in both testing a processor's hardware and protecting a processor's data and instructions in shared memory by encrypting and decrypting the data and instructions between the processor and all shared memory.
[0005] In one embodiment, a multi-processor system, e.g., on an IC, may have two or more processors, where each processor may include an instruction unit for processing instructions, an execution unit for operating on data, at least one cache memory, at least one interface to a system bus, logic for translating instructions accessed by the instruction unit from a cache memory, and logic between the execution unit and a cache for translating data, where the translating may use pseudo-random numbers. The logic for translating instructions may include logic for decrypting encoded instructions, and the logic for translating data may include logic for decrypting data being accessed by the execution unit and logic for encrypting data being written to the cache. The logic for translating instructions may include an LFSR, and the logic for translating data may include code transformation logic. The logic for translating data may also include logic for selectively encrypting data written to a system bus. The cache memories may include an instruction cache and a data cache. The logic for translating instructions may access the instruction cache, and the logic for translating data may access the data cache.
[0006] In another embodiment, a multi-processor system, e.g., on an IC, may have two or more processors, where each processor may include an instruction unit for processing instructions, an execution unit for operating on data, at least one cache memory, at least one interface to a system bus, and logic for translating data and instructions transferred between the system bus and a cache memory. The logic for translating instructions may include logic for decrypting encoded instructions, and the logic for translating data may include logic for decrypting data being accessed by the execution unit and logic for encrypting data being written to the cache.
[0007] In another embodiment, a method for encrypting a program's instructions and data may include the steps of:
a) creating initial codes and loading an LFSR with one of the initial codes;
b) for each instruction, incrementing an LFSR function to obtain its translation code;
c) for each data space, defining a translation code, loading an LFSR with the translation code, and incrementing the LFSR to obtain the translation code for each predefined data element;
d) for each selected instruction, appending to the selected instruction a translation code corresponding to the value in the selected instruction's address field;
e) encoding each instruction, data and appended translation code with the translation code associated with its address; and
f) separately encrypting the initial codes.
The selected instructions may include instructions for loading and storing registers containing addresses, branch instructions, and instructions for calling and returning from subroutines. The LFSR may be a programmable LFSR, and the step of creating initial codes may include programming the LFSR with one of the initial codes.
[0008] In another embodiment, debugging unencrypted applications may be performed without recompiling the application or altering the encrypted application's cycle-by-cycle operation, e.g., by using zero translation codes.
[0009] In yet another embodiment, instructions to generate the data for transform mask registers, which define the programming of code transformation logic from an LFSR's mask register bits, may be encrypted, appended in front of the encoded application, and may be executed following the loading of the LFSR mask register and initial translation code. The transform mask registers' data may be generated by:
A) rotating an identity matrix down one row after ORing the LFSR mask register bits into the first bits of the last column to obtain the transform mask register data for the first transformation function;
B) modulo-2 multiplying, in matrix form, the first transformation function's transform mask register data with itself to obtain the second transformation function's transform mask register data; and
C) obtaining each successive transformation function's mask register data by matrix multiplying, modulo 2, the previous transformation function's transform mask register data with itself.
[00010] Finally, in another embodiment, an LFSR, code transformation logic, and checksum logic may be used to generate random instructions and data to test the processor prior to normal operation.
Brief Description of the Drawings
[00011] Various embodiments of the invention will now be described in connection with the attached drawings, in which:
[00012] Figures 1a and 1b are conceptual diagrams of examples of multi-processor systems with encryption and decryption translators,
[00013] Figures 2a and 2b are diagrams of examples of instruction decryption using an LFSR in conjunction with instructions,
[00014] Figures 3a, 3b, and 3c are diagrams of examples of data encryption and decryption using offset translated codes associated with base addresses of the data,
[00015] Figure 4 is a simplified diagram of an example of an L1 data cache,
[00016] Figure 5 is a diagram of an example four-bit Galois LFSR,
[00017] Figure 6a is a high level diagram of code translation based on the example LFSR,
[00018] Figure 6b is a detailed diagram of the code transformation logic example,
[00019] Figure 7 is another high level diagram of a code transformation example with extended offset addressing,
[00020] Figure 8 is a diagram of an example of a programmable Galois LFSR,
[00021] Figure 9 is a diagram of an example of programmable code transformation logic,
[00022] Figure 10 is another diagram of an example of a processor with checksum logic, and
[00023] Figure 11 is a diagram of an example of checksum logic.
Description of Various Embodiments
[00024] Embodiments of the present invention are now described with reference to Figures 1a-9, it being appreciated that the figures may illustrate the subject matter of various embodiments and may not be to scale or to measure.
[00025] Reference is made to Figure 1a, a conceptual diagram of an example of a multi-processor system with encryption and decryption. The system may include two processors 10 connected via a common system bus 17 to a shared L2 cache 18 and/or other devices, such as a memory controller 19, one or more co-processor(s) 20, and/or a bridge to more peripheral devices 21. Each processor 10 may include an instruction unit 11 and its corresponding L1 I-cache 13, with a translator 15 for decrypting the instructions being read into the I-cache coupled between them. In a similar manner, each processor 10 may contain an execution unit 12 and its corresponding L1 D-cache 14, with another translator 16 for encrypting and decrypting data being read into and written out of the D-cache coupled between them. The instructions and data may be decrypted as they are read out of their respective L1 caches, and selected decrypted data may be sent past the L1 D-cache 14 to the bridge 21 and co-processors 20, while maintaining the encrypted data in the cache.
[00026] The architecture shown in Figure 1a may have processors with reduced instruction set computing (RISC) architectures and instructions that use base register plus index addressing, but it is further contemplated that LFSR encryption and decryption of individual processor instructions and data may be employed in other processor architectures, such as complex instruction set computing (CISC) with direct addressing modes, and in other system architectures, including but not limited to the one shown in Figure 1b.
[00027] Returning to the system architecture shown in Figure 1a, reference is now made to Figure 2a, a diagram of an example of instruction decryption using an LFSR in conjunction with an instruction address register. For every instruction address register 22, there may be an instruction LFSR 23 containing a code that may be used to decrypt the encoded instructions 24 using a set of exclusive-OR gates 25. The instruction LFSR 23 may be clocked each time the instruction address register 22 is incremented, thereby providing each instruction with a unique code from which the instruction may be decoded. In this manner, each occurrence of an instruction may be encoded and decoded differently, thereby obscuring the actual instruction's identity.
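To make the per-address coding concrete, here is a small Python sketch. It assumes 4-bit instruction words so that the four-bit Galois LFSR of Figure 5 (described below) can serve as the instruction LFSR 23; real instruction words and LFSRs would be wider, such as the 32 bits of the practical example. `lfsr_step` and `translate_stream` are illustrative names, not part of any described interface. Note that three identical instructions encrypt to three different words.

```python
def lfsr_step(state: int) -> int:
    """One clock of the four-bit Galois LFSR of Figure 5 (bit 0 = flip-flop 51)."""
    feedback = (state >> 3) & 1       # output of the last flip-flop
    state = (state << 1) & 0xF        # shift each flip-flop into the next
    if feedback:
        state ^= 0b0011               # into the first flip-flop and the XOR before the second
    return state


def translate_stream(words, initial_code):
    """XOR each word with the code for its address, clocking the LFSR once per
    instruction-address increment. The same call encrypts and decrypts."""
    code = initial_code
    out = []
    for word in words:
        out.append(word ^ code)       # the exclusive-OR gates
        code = lfsr_step(code)        # clocked with the instruction address register
    return out


plain = [0x3, 0x3, 0x3, 0x7]          # hypothetical 4-bit instruction words
cipher = translate_stream(plain, initial_code=0b0001)
print(cipher)                         # [2, 1, 7, 15]
print(translate_stream(cipher, initial_code=0b0001) == plain)  # True
```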
[00028] Reference is now made to Figure 2b, a diagram of an example of instruction memory including branch instructions. The normal instructions may consist of opcode 26 and address 27 fields. The address fields may contain offsets from base register addresses for accessing data. In addition, selected instructions may also include translation codes 28. The fields and translation codes may initially be encrypted and may be decrypted before being used. One such selected instruction, the branch instruction, may load both the instruction address register and the instruction LFSR while executing the branch, such that the instruction LFSR may contain the proper code for the next instruction 29 after taking the branch. In the case in which the branch is not taken, the instruction address register may be incremented, and the LFSR may be clocked to provide the proper code for decrypting the next instruction. Subsequent incrementing of the instruction address register and clocking of the LFSR may occur until the instruction 29 is reached, at which point the contents of the LFSR may match the decrypted translation code 28 of the corresponding branch instruction. An initial translation code for the first instruction 30, and other data, may be obtained by the operating system through some separate secure encryption protocol such as RSA, and may be loaded into the instruction LFSR using a branch to the first instruction.
[00029] As mentioned above, some encrypted instructions may include translation codes, which may also be encrypted, thereby securing all but the initial translation code for the first instruction. In this manner, the proper translation code for each instruction may be easily obtained in one clock cycle by either loading or clocking the LFSR.
[00030] Unfortunately, in a random access memory, data may not be accessed sequentially, thereby requiring a way to directly calculate the proper translation code from the address of the data. So, with regard to the system architecture shown in Figure 1a, reference is now made to Figures 3a, 3b and 3c, diagrams of examples of data encryption and decryption using offset translated codes associated with base addresses of the data. Each processor may contain a set of N registers 30, where the first K registers may be used to hold data, and the last N-K base registers may contain addresses pointing to one or more sections of data and/or instruction code. The processor may further contain N-K translation code registers 31, which may correspond to the N-K base registers. When an executed instruction loads or stores data from or to a memory location, it may calculate the address of the memory location by adding an offset to the address within a base register. The translation code corresponding to that address may be calculated by transforming the translation code associated with the base register by the amount of the offset, thereby creating a code for decrypting or encrypting the corresponding data. For example, the translation code for a location 37 with an offset 35 from the address in base register K may be calculated by loading the offset 35 and the translation code from code register K 33 into the code transformation logic 34, producing a translation code in register 36 for the location 37, which, when applied to the exclusive-ORs 38 in Figure 3c, may decrypt or encrypt the data 39 being loaded from or stored into location 37. It may be noted that, for any given target address, the same translation code may be used to encrypt and decrypt data being put into or taken from the target address. Furthermore, the same translation code result as obtained through the code transformation logic 34 may be obtained by loading a properly wired LFSR with the translation code from code register K and clocking the LFSR by a count equal to the offset 35. In addition, multiplexors 40 may be used to select the unencrypted data 3 when sending the data to either the co-processors 20 in Figure 1a or other peripheral devices through the peripheral bridge 21 in Figure 1a, or to other locations that may require decrypted data. Furthermore, the multiplexors 40 in Figure 3c may continue to select the encrypted data when sending the data either back to the caches 14 and 18 in Figure 1a or to main memory through the memory controller 19 in Figure 1a. Such selection may be done by a select line 41 in Figure 3c, which may be driven either from a control register (not shown) or by control bits within the instructions themselves.
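A minimal Python sketch of the clocked-LFSR equivalent mentioned above, again assuming the four-bit LFSR of Figure 5 and 4-bit data words. It shows that the code depends only on the base register's code and the offset, and that XORing with the same code both encrypts and decrypts. The base code and data value are hypothetical. The loop costs `offset` clocks; the code transformation logic 34 exists to produce the same result without that delay.

```python
def lfsr_step(state: int) -> int:
    """One clock of the four-bit Galois LFSR of Figure 5 (bit 0 = flip-flop 51)."""
    feedback = (state >> 3) & 1
    state = (state << 1) & 0xF
    if feedback:
        state ^= 0b0011
    return state


def code_for_offset(base_code: int, offset: int) -> int:
    """Translation code for (base address + offset): clock an LFSR loaded with
    the base register's code 'offset' times."""
    code = base_code
    for _ in range(offset):
        code = lfsr_step(code)
    return code


base_code = 0b1001                     # hypothetical contents of code register K
code = code_for_offset(base_code, 6)   # location 6 words past the base address
stored = 0b1010 ^ code                 # encrypt on a store
loaded = stored ^ code                 # decrypt on a later load with the same code
print(code, stored, loaded)            # 6 12 10
```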
[00031] In addition to branches, instructions for loading a base register's address may also load the base register's associated code register. In a similar manner, a subroutine call may store the translation code associated with the instruction in the instruction LFSR after saving the prior contents of the LFSR in the code register associated with the base register where it stores its return address. Similarly, a return instruction may branch to the address in the base register while loading the contents of the corresponding code register into the instruction LFSR.
[00032] Initial encryption of the instructions in a program and data space may be performed after compilation and before the final load module creation, e.g., by: creating an initial translation code; incrementing the LFSR function to obtain the translation code for each instruction; defining a translation code for each data space; incrementing the LFSR function to obtain the translation code for each predefined data element; appending to selected instructions the translation code corresponding to the value in the address field of those instructions; and encoding each instruction, data and appended translation code with the translation code associated with its address. The instructions requiring appended translation codes may include instructions involving addresses in base registers, branches and/or subroutine calls.
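The instruction part of this encryption flow can be sketched in Python for a toy 4-bit machine that reuses the Figure 5 LFSR. Assumptions, none of which come from the text: each instruction is one 4-bit word plus an optional separate branch-target field, only branches count as "selected" instructions, and the data-space steps and the separate encryption of the initial code are omitted. Comments refer to the lettered steps a) to f) of the method summarized earlier; all names and the sample program are hypothetical.

```python
def lfsr_step(state: int) -> int:
    """One clock of the four-bit Galois LFSR of Figure 5 (bit 0 = flip-flop 51)."""
    feedback = (state >> 3) & 1
    state = (state << 1) & 0xF
    if feedback:
        state ^= 0b0011
    return state


def encrypt_program(program, initial_code):
    """program: list of (word, target) pairs; target is a branch-target address
    or None. Returns encrypted (word, target field, appended code) triples plus
    the per-address codes (returned here only so the result can be checked)."""
    codes = [initial_code]                   # step b: one code per instruction address
    for _ in range(len(program) - 1):
        codes.append(lfsr_step(codes[-1]))
    encrypted = []
    for addr, (word, target) in enumerate(program):
        key = codes[addr]
        if target is None:
            encrypted.append((word ^ key, None, None))
        else:
            # step d: append the target's code; step e: encode every field with this address's code
            encrypted.append((word ^ key, target ^ key, codes[target] ^ key))
    return encrypted, codes


# Hypothetical 4-word program; the instruction at address 1 branches to address 3.
program = [(0x5, None), (0xB, 3), (0x7, None), (0x1, None)]
encrypted, codes = encrypt_program(program, initial_code=0b0001)
word, target, appended = (field ^ codes[1] for field in encrypted[1])
print(word, target, appended == codes[3])    # 11 3 True
```

On a taken branch at address 1, decrypting the appended field yields exactly the LFSR value belonging to address 3, which is what lets the branch load the instruction LFSR in one step.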
[00033] Reference is now made to Figure 4, a simplified diagram of an example of an L1 data cache. The L1 cache 45 may be coupled to both a write line buffer 43 and a read line buffer 44. In one embodiment, the contents of the data cache may be encrypted in a manner similar to the system architecture shown in Figure 1a, and the read line buffer 44 may be coupled to the data inputs 39 in Figure 3c for decryption of the encrypted data. The write line buffer 43 may also be connected to the outputs 42 in Figure 3c for re-encryption of updated data, and may be configured to read the register 36 to load the associated translation code into a portion of the data cache 46 along with the encrypted updated data. Part of the write line buffer 43 may also be connected to read the output register 36 following execution of the code transformation logic 34. In this manner, on a cache line miss, the translation code associated with the new cache line data may be generated while the data portion of the write line buffer 43 is filled from external cache or memory. Similarly, the translation code may be read from a section of the cache 46 into the register 36 to subsequently translate the read line buffer's 44 data for the execution unit. Finally, when other devices snoop the L1 cache, they may only extract encrypted data.
[00034] In another embodiment, the contents of the L1 cache may be decrypted in the system architecture depicted in Figure 1b. In this case, the processor may not decrypt the data or instructions from the cache. Rather, the translation code 36 for the address of the data or instructions being written into the write line buffer may be used to decrypt the data or instructions, and may subsequently be stored in a code portion 46 of the cache along with the decrypted data. When the cache is snooped by another processor, the translation code associated with the data may be used to encrypt the snooped data.
[00035] Furthermore, it is contemplated that an LFSR, starting with a translation code for the first instruction or word of data in a cache line buffer, may be clocked and applied to each subsequent instruction or word of data being read from or written into the cache line buffer. If the read or write is out of order, the translation code may be adjusted by a single transformation function that may "subtract" the buffer size from the LFSR when the data or instructions wrap around the line buffer. Given an LFSR function with M unique values before repeating, "subtraction" of N is equivalent to a transformation function of M-N, where M>N.
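A Python sketch of the wrap-around adjustment, assuming the four-bit LFSR of Figure 5 (M = 15) and a hypothetical four-word line read starting at word 2. Here `clock(code, period - line_words)` stands in for the single transformation function the text describes; in hardware it would be one combinational step rather than a loop. The helper names are hypothetical.

```python
def lfsr_step(state: int) -> int:
    """One clock of the four-bit Galois LFSR of Figure 5 (bit 0 = flip-flop 51)."""
    feedback = (state >> 3) & 1
    state = (state << 1) & 0xF
    if feedback:
        state ^= 0b0011
    return state


def clock(code: int, count: int) -> int:
    for _ in range(count):
        code = lfsr_step(code)
    return code


PERIOD = 15  # M: unique non-zero codes of the Figure 5 LFSR


def wrapped_read_codes(line_code, start_word, line_words, period=PERIOD):
    """Codes for an out-of-order (wrapping) read of a cache line whose first
    word has translation code line_code."""
    code = clock(line_code, start_word)          # code for the first word read
    codes = []
    for i in range(line_words):
        codes.append(code)
        code = lfsr_step(code)                   # normal one-clock advance
        if start_word + i + 1 == line_words:     # just passed the last word of the line
            code = clock(code, period - line_words)  # "subtract" N via M-N clocks
    return codes


print(wrapped_read_codes(0b0001, start_word=2, line_words=4))  # [4, 8, 1, 2]
```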
[00036] A simple four-bit example may be used to clarify the structure and functional operation of both an LFSR and its associated code transformation logic. Reference is now made to Figure 5, a diagram of an example four-bit Galois LFSR composed of four flip-flops 51-54 serially coupled in a ring, with one exclusive-OR gate 55 connected between the first flip-flop 51 and the second flip-flop 52, and a feedback loop 56 connecting the fourth flip-flop 54 to the exclusive-OR 55 and the first flip-flop 51, where the outputs of all the flip-flops are coupled to a code register 57. This particular LFSR may be used to generate all fifteen distinct non-zero translation codes (i.e., all fifteen distinct non-zero combinations of four bits) before repeating.
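The Figure 5 LFSR can be modeled in a few lines of Python, as a sketch. The bit ordering (bit 0 is flip-flop 51, bit 3 is flip-flop 54) is our own choice; the update rule agrees with the one-clock function J1 in the derivation that follows. Enumerating from any non-zero code confirms the fifteen-state cycle.

```python
def lfsr_step(state: int) -> int:
    """One clock of the Figure 5 LFSR. Bit 0 = flip-flop 51, ..., bit 3 = flip-flop 54."""
    feedback = (state >> 3) & 1        # feedback loop 56 from flip-flop 54
    state = (state << 1) & 0xF         # 51->52, 52->53, 53->54
    if feedback:
        state ^= 0b0011                # into flip-flop 51 and through XOR gate 55 into 52
    return state


codes, state = [], 0b0001
while True:
    codes.append(state)
    state = lfsr_step(state)
    if state == 0b0001:
        break
print(len(codes), sorted(codes) == list(range(1, 16)))  # 15 True
```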
[00037] Reference is now made to Figure 6a, a high level diagram of an example of offset translation based on the example LFSR of Figure 5. Given an initial translation code corresponding to a specific address, M clocks may be used to generate the translation code for an address of the Mth word after the specific address. Alternatively, the initial translation code C0 60 may be applied with an offset value 61 to code transformation logic 63 to produce a translation code CM in an output code register 62, which corresponds to the translation code of the Mth word after the specific address. The code transformation logic may produce translation code CM by successively, for each set (i.e., "1" or "high"), or alternatively each clear (i.e., "0" or "low"), bit in an Nth position of the binary offset value, transforming the initial translation code by a function that is equivalent to clocking the LFSR 2^N times. For any four-bit offset, the code transformation logic may transform the translation code by some combination of the transformation functions: J1 (one clock) 64, J2 (two clocks) 65, J4 (four clocks) 66 and/or J8 (eight clocks) 67.
[00038] Reference is now made to Figure 6b, a detailed diagram of the code transformation logic example, showing details of an example of a hardware implementation of the code transformation logic 63. Each of the functions J1 through J8 may be comprised of exclusive-OR gates 69 and multiplexors 68. The function J1 64 may select the four bits from C0 60 if the lowest order bit from the offset 61 is low, and may select the same values the LFSR would generate in one clock cycle if the lowest order bit from the offset 61 is high. Note that the second bit is the exclusive-OR 69 of the first and fourth bits, as would be captured in the second flip-flop 52 of the LFSR in Figure 5 after one clock. Similarly, the function J2 65 may select its input values if bit 1 of the offset is low, and the same values the LFSR would generate after two clock cycles, which is equivalent to two cycles of J1, if bit 1 of the offset is high. In this manner, each respective bit of the offset selects a corresponding function equivalent to clocking the LFSR by the number of clock cycles corresponding to the position of the offset bit, depending on whether the respective offset bit is high or low. Clearly, by the logic in Figure 6b, an offset of zero would make CM = C0, but an offset of 15 would also be equivalent to no change, because the LFSR cycles through only 15 numbers, so C15 = C0. This can be shown by the following derivation:
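[00039] Given that the inputs to Jx are a0, b0, c0 and d0 and the outputs are ax, bx, cx and dx, letting the symbol "←" represent assigning the expression of inputs on the right to the output on the left, letting "+" represent an exclusive-OR operation, and given:

$$
\begin{aligned}
J1&:\; a_1 \leftarrow d_0,\quad b_1 \leftarrow a_0 + d_0,\quad c_1 \leftarrow b_0,\quad d_1 \leftarrow c_0;\\
J2&:\; a_2 \leftarrow c_0,\quad b_2 \leftarrow c_0 + d_0,\quad c_2 \leftarrow a_0 + d_0,\quad d_2 \leftarrow b_0;\\
J4&:\; a_4 \leftarrow a_0 + d_0,\quad b_4 \leftarrow (a_0 + d_0) + b_0,\quad c_4 \leftarrow b_0 + c_0,\quad d_4 \leftarrow c_0 + d_0;\ \text{and}\\
J8&:\; a_8 \leftarrow a_0 + c_0,\quad b_8 \leftarrow (b_0 + d_0) + c_0,\quad c_8 \leftarrow (a_0 + c_0) + d_0,\quad d_8 \leftarrow b_0 + d_0;
\end{aligned}
$$

then

$$
J15:\; a_{15} \leftarrow a_0,\quad b_{15} \leftarrow b_0,\quad c_{15} \leftarrow c_0,\quad d_{15} \leftarrow d_0;\qquad \text{so } C_{15} = C_0.
$$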
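The derivation can be checked exhaustively with a short Python sketch (our own, not from the source). It writes J2, J4 and J8 exactly as in the equations above, uses the one-clock LFSR update as J1, and confirms that selecting stages by offset bits always matches clocking the LFSR offset times, including C15 = C0. The helper names are hypothetical.

```python
def lfsr_step(s: int) -> int:
    """J1: one clock of the Figure 5 LFSR (bit 0 = a, bit 3 = d)."""
    feedback = (s >> 3) & 1
    s = (s << 1) & 0xF
    return s ^ 0b0011 if feedback else s


def bits(s):
    return s & 1, (s >> 1) & 1, (s >> 2) & 1, (s >> 3) & 1   # a, b, c, d


def pack(a, b, c, d):
    return a | (b << 1) | (c << 2) | (d << 3)


def j2(s):
    a, b, c, d = bits(s)
    return pack(c, c ^ d, a ^ d, b)


def j4(s):
    a, b, c, d = bits(s)
    return pack(a ^ d, (a ^ d) ^ b, b ^ c, c ^ d)


def j8(s):
    a, b, c, d = bits(s)
    return pack(a ^ c, (b ^ d) ^ c, (a ^ c) ^ d, b ^ d)


STAGES = [lfsr_step, j2, j4, j8]       # J1, J2, J4, J8, enabled by offset bits 0..3


def transform(c0: int, offset: int) -> int:
    """Code transformation logic 63: apply Jx when the matching offset bit is high."""
    code = c0
    for bit, stage in enumerate(STAGES):
        if (offset >> bit) & 1:        # the multiplexor selects the transformed value
            code = stage(code)
    return code


def clock(code: int, count: int) -> int:
    for _ in range(count):
        code = lfsr_step(code)
    return code


print(transform(0b0001, 5), clock(0b0001, 5))  # 6 6
```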
[00040] In the case where the offset may be larger than the size of the non-repeating numeric sequence of the LFSR, it may be possible to reduce the logic of the higher order transformation functions. Reference is now made to Figure 7, a diagram of a code transformation logic example with a six-bit offset. In this case, because the LFSR shown in Figure 5 has a repeating sequence of fifteen numbers, the transformation function equivalent to clocking the LFSR sixteen times is the same as clocking the LFSR once. Therefore, the transformation function 71 may be controlled by the offset bit 72 corresponding to 2^4 = 16, but 16 - 15 = 1, so it is the J1 (one clock) transformation function. Similarly, the fifth bit 73 corresponds to 2^5 = 32, but 32 - 2 × 15 = 2, so it is the J2 (two clock) transformation. The same type of modulo calculation may be applied to generate code transformation logic for any LFSR with any size offset.
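As a sketch, the reduction for the six-bit offset of Figure 7 amounts to taking each offset bit's weight modulo the LFSR period. `stage_clocks` is a hypothetical helper, and repeated clocking stands in for each reduced transformation function; the check confirms that the reduced stages still match clocking by the full offset.

```python
PERIOD = 15  # length of the Figure 5 LFSR's non-repeating sequence


def lfsr_step(state: int) -> int:
    """One clock of the four-bit Galois LFSR of Figure 5 (bit 0 = flip-flop 51)."""
    feedback = (state >> 3) & 1
    state = (state << 1) & 0xF
    return state ^ 0b0011 if feedback else state


def clock(code: int, count: int) -> int:
    for _ in range(count):
        code = lfsr_step(code)
    return code


def stage_clocks(offset_bits: int, period: int = PERIOD):
    """Clock count each offset bit's function must implement once whole
    periods are discarded: 2**i mod period."""
    return [pow(2, i, period) for i in range(offset_bits)]


def transform(c0: int, offset: int, offset_bits: int = 6) -> int:
    code = c0
    for bit, clocks in enumerate(stage_clocks(offset_bits)):
        if (offset >> bit) & 1:
            code = clock(code, clocks)  # stands in for the reduced Jx function
    return code


print(stage_clocks(6))  # [1, 2, 4, 8, 1, 2]
```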
[00041] While the above techniques may provide reasonably strong encryption when using large LFSRs, the encryption may be weaker for smaller LFSRs. One solution may be to expand the number of potential repeating sequences by making the LFSRs and code transformation logic programmable. Reference is now made to Figure 8, a diagram of an example of a programmable Galois LFSR. The LFSR 80 may contain any number of flip-flops 81 serially coupled in a ring, where each flip-flop, except the last flip-flop 82, may drive an exclusive-OR gate 83, and all may be driven by a multiplexor 84, which may select between loading a code 85 into the LFSR or sequencing the LFSR. Each exclusive-OR gate 83 may also be driven by an AND gate 86, which may be used to enable the signal on the feedback line 87 with a bit from an LFSR mask register 88. The LFSR mask register 88 may contain N-1 bits for an N flip-flop LFSR. The LFSR mask register bits may be loaded with any Galois LFSR configuration. The LFSR may be clocked on all increments of the instruction address register, thereby stepping through the LFSR states. In any one of the ideal configurations, the LFSR may repeat every state possible except zero. Loading a zero code into the LFSR may be equivalent to no decryption, given that no amount of clocking may change the state of the LFSR and that no bits are changed when exclusive-ORed with zero.
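A Python sketch of the programmable Galois LFSR, under the assumption that mask bit i gates the feedback into the exclusive-OR gate between flip-flop i and flip-flop i+1, written as bit i of an integer (so the mask [1 0 0] used later in the text is `0b001`). The period counts show that mask [1 0 0] reproduces the Figure 5 LFSR, that some other masks are also maximal-length, that some are not "ideal", and that a zero code never changes. Function names are hypothetical.

```python
def programmable_step(state: int, mask: int, n: int) -> int:
    """One clock of an n-flip-flop programmable Galois LFSR.
    Bit i of 'mask' (n-1 bits) enables the AND gate feeding the XOR gate that
    sits between flip-flop i and flip-flop i+1."""
    feedback = (state >> (n - 1)) & 1          # last flip-flop drives the feedback line
    state = (state << 1) & ((1 << n) - 1)      # shift along the ring
    if feedback:
        state ^= 1 | (mask << 1)               # into the first flip-flop and every enabled XOR
    return state


def period(mask: int, n: int, start: int = 1) -> int:
    """Number of clocks before the LFSR returns to 'start'. Feedback always
    reaches the first flip-flop, so every state lies on a cycle."""
    state, steps = programmable_step(start, mask, n), 1
    while state != start:
        state = programmable_step(state, mask, n)
        steps += 1
    return steps


print(period(0b001, 4))                  # 15: mask [1 0 0], the Figure 5 LFSR
print(period(0b100, 4))                  # 15: another maximal-length configuration
print(period(0b010, 4))                  # 6: a valid but non-ideal configuration
print(programmable_step(0, 0b001, 4))    # 0: a zero code never changes
```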
[00042] Therefore, in another embodiment, the LFSRs and all decryption may be disabled by loading translation codes of zero. This may be performed, e.g., when exiting an encrypted application.
[00043] Reference is now made to Figure 9, a diagram of an example of programmable offset translation logic. Each respective transformation function 90 may include N transform mask registers 91 of N bits each, where N is the number of flip-flops in the corresponding LFSR. There may be one transform mask register for each output of the transformation function. Each of the bits in the transform mask registers may be used to select, via an AND gate 92, a corresponding input to the transformation function. The selected inputs may be exclusive-ORed, through a tree of exclusive-OR gates 93, to form the output 94 selected by a bit 95 from the offset 96. For an LFSR with N flip-flops, programmable code transformation logic for an N-bit offset may require up to N³ programming bits. It should be noted that the transformation functions selected by all offset bits above the first N may be copies of the first N transformation functions, thereby requiring no additional programming bits.
[00044] The actual number of unique bits required to program the code transformation logic may be much less than N³. First, it should be noted that the transform mask register bits for the first transformation function, when viewed as an N×N matrix, may be generated by rotating an identity matrix down one row after ORing the N-1 LFSR mask register bits into the first bits of the last column, in a manner that properly simulates one clock shift of the associated LFSR. The second transformation function's matrix may be generated by multiplying modulo 2 the first transformation function's matrix by itself; the third transformation function's matrix may be generated by multiplying modulo 2 the second matrix by itself; and each successive transformation function's matrix may be generated from the matrix of the previous transformation function in the same manner. As such, the N³ programming bits of programmable code transformation logic may be generated with as few as N² programmable bits, or may, with appropriate logic, only require N-1 programmable bits.
[00045] Assuming a programmable version of the LFSR in Figure 5 and the code transformation logic in Figures 6a and 6b, the process to generate programming bits for the code transformation logic may be as follows:
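Given the 3 LFSR mask bits are [1 0 0], the single shift matrix [J1] may be:

$$
[J1]\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}=
\begin{bmatrix}0&0&0&1\\ 1&0&0&1\\ 0&1&0&0\\ 0&0&1&0\end{bmatrix}
\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}=
\begin{bmatrix}d\\ a+d\\ b\\ c\end{bmatrix}
$$

The matrix for two shifts [J2] may be:

$$
[J2]=[J1]\times[J1]=
\begin{bmatrix}0&0&1&0\\ 0&0&1&1\\ 1&0&0&1\\ 0&1&0&0\end{bmatrix},\qquad
[J2]\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}=\begin{bmatrix}c\\ c+d\\ a+d\\ b\end{bmatrix}
$$

The matrix for four shifts [J4] may be:

$$
[J4]=[J2]\times[J2]=
\begin{bmatrix}1&0&0&1\\ 1&1&0&1\\ 0&1&1&0\\ 0&0&1&1\end{bmatrix},\qquad
[J4]\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}=\begin{bmatrix}a+d\\ a+b+d\\ b+c\\ c+d\end{bmatrix}
$$

And the matrix for eight shifts [J8] may be:

$$
[J8]\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}=
\left(\begin{bmatrix}1&0&0&1\\ 1&1&0&1\\ 0&1&1&0\\ 0&0&1&1\end{bmatrix}\times
\begin{bmatrix}1&0&0&1\\ 1&1&0&1\\ 0&1&1&0\\ 0&0&1&1\end{bmatrix}\right)\bmod 2
\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}
=\left(\begin{bmatrix}1&0&1&2\\ 2&1&1&3\\ 1&2&1&1\\ 0&1&2&1\end{bmatrix}\bmod 2\right)
\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}
=\begin{bmatrix}1&0&1&0\\ 0&1&1&1\\ 1&0&1&1\\ 0&1&0&1\end{bmatrix}
\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix}
=\begin{bmatrix}a+c\\ b+c+d\\ a+c+d\\ b+d\end{bmatrix}
$$

In the above equations, "+" is an XOR function, and the modulo 2 operation is explicitly shown only for the last equation.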
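The matrix construction described above (rotate the identity after ORing in the mask bits, then square modulo 2) can be sketched in Python. Matrices are lists of 0/1 rows, with row r holding transform mask register r; `first_transform`, `matmul_mod2`, `transform_masks` and `apply` are hypothetical names. Run with the mask bits [1 0 0], it regenerates the [J1] through [J8] matrices of the four-bit example, applying the modulo-2 reduction at every step rather than only at the last.

```python
def first_transform(mask_bits):
    """[J1] from the N-1 LFSR mask register bits: OR them into the first bits
    of the identity matrix's last column, then rotate down one row."""
    n = len(mask_bits) + 1
    m = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    for r, bit in enumerate(mask_bits):
        m[r][n - 1] |= bit
    return [m[-1]] + m[:-1]


def matmul_mod2(x, y):
    n = len(x)
    return [[sum(x[r][k] & y[k][c] for k in range(n)) % 2 for c in range(n)]
            for r in range(n)]


def transform_masks(mask_bits, count):
    """Matrices for J1, J2, J4, ...: each is the previous one squared modulo 2."""
    mats = [first_transform(mask_bits)]
    while len(mats) < count:
        mats.append(matmul_mod2(mats[-1], mats[-1]))
    return mats


def apply(matrix, inputs):
    """One transformation function: each transform mask register row ANDs the
    inputs (gates 92) and XORs the results (tree 93) into one output bit."""
    return [sum(row[c] & inputs[c] for c in range(len(inputs))) % 2 for row in matrix]


j1, j2, j4, j8 = transform_masks([1, 0, 0], 4)
for row in j8:
    print(row)   # [1, 0, 1, 0] / [0, 1, 1, 1] / [1, 0, 1, 1] / [0, 1, 0, 1]
```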
[00046] It is further contemplated that the LFSR mask register bits needed for programming the LFSR may not be the bits used to program the transformation functions, thereby providing different encryption algorithms for the instructions and data. Such additional mask register bits may also be included with the initial translation code.
[00047] It is also contemplated that the mask register bits may be encrypted with the initial translation code, and prior to executing the encrypted program, the mask register data may be decrypted by loading the initial translation code into the LFSR, using the initial translation code to decode the mask register data without clocking the LFSR, and then loading the LFSR's decrypted mask register data.
[00048] It is also contemplated that instructions to generate the data for the transform mask registers from the LFSR's mask register bits may be encrypted, appended in front of the encoded application, and may be executed following the loading of the LFSR mask register and initial translation code. It should be noted that this code may not address data memory, which may require the use of the yet-to-be-programmed code transformation logic. As such, all transform mask registers may be directly addressable by instructions, and all generation of the transform mask register data may be done in situ, thereby avoiding use of addressed data memory.
[00049] Furthermore, it is contemplated that the processor's legal instruction codes may be a small fraction of the possible values in the opcode field of an instruction. Upon incorrect decryption, the execution of an illegal instruction may cause an operating system interrupt, thereby allowing the operating system to detect instruction tampering. Similarly, by maintaining legal memory space or spaces that are small relative to the full address space, illegal addresses may also cause operating system interrupts, thereby allowing the operating system to detect data tampering.
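The detection argument rests on simple probability. Assuming, purely for illustration, that a wrong decryption produces opcode values that are independent and uniformly distributed over the opcode field, the chance that tampering escapes the illegal-instruction check shrinks geometrically with the number of instructions executed. The opcode width and legal-opcode count below are hypothetical, not figures from the source.

```python
from fractions import Fraction


def undetected_probability(legal_opcodes, opcode_bits, instructions):
    """Probability that every one of `instructions` mis-decrypted opcodes is legal.

    ASSUMPTION: each mis-decrypted opcode is independent and uniformly
    distributed over all 2**opcode_bits values of the opcode field.
    """
    field_size = 2 ** opcode_bits
    if not 0 <= legal_opcodes <= field_size:
        raise ValueError("legal_opcodes must fit in the opcode field")
    return Fraction(legal_opcodes, field_size) ** instructions


# Hypothetical processor: 6-bit opcode field with 16 legal opcodes.
p = undetected_probability(legal_opcodes=16, opcode_bits=6, instructions=4)
print(p, float(p))  # 1/256 0.00390625
```

Under the same assumption, the address check works the same way: each mis-decrypted address lands in legal memory with a probability equal to the legal fraction of the address space.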
A Practical Example
[00050] Small examples, such as those above, may be useful for illustrating the detailed logic, but in current, more realistic multi-processor environments, a practical example may be a 32-bit RISC processor with 20-bit offset address fields in the instructions and multiple levels of cache. In this example, the instructions and data may remain encrypted within their respective caches, the LFSR may be 32 bits long, and the LFSR mask register may be 31 bits long, both manageable sizes for separately encrypted initial codes. Once loaded, the longest path between flip-flops on the programmable LFSR may be an AND gate followed by an XOR gate, and loading the LFSR may also take only one clock cycle; hence, the decryption of the instructions may easily occur during the instruction unit's fetch cycle. For branch look-ahead techniques or intermediate loop instruction storage, the proper decrypted translation codes for each stream may be stored with the branch predictions or loop instructions.
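The point that each flip-flop of the programmable LFSR sits behind only one AND gate and one XOR gate can be illustrated with a bit-level model. This sketch assumes a Galois-style structure (32-bit state, 31-bit mask, feedback taken from the top bit) as a working hypothesis; Figure 5 may wire the taps differently, and the 32-bit mask value used below is arbitrary and not claimed to give a maximal-length sequence.

```python
WIDTH = 32  # LFSR length in the practical example; the mask has WIDTH - 1 bits


def lfsr_step(state, mask, width=WIDTH):
    """One clock of an ASSUMED Galois-style programmable LFSR.

    The feedback is the old top bit. Bit 0 always takes the feedback; every
    other bit i takes old bit i-1 XOR (mask bit i-1 AND feedback), which is
    one AND gate followed by one XOR gate in front of each flip-flop.
    Mask bit 0 corresponds to the first listed programmable bit.
    """
    full = (1 << width) - 1
    feedback = (state >> (width - 1)) & 1
    taps = 1 | ((mask & (full >> 1)) << 1)  # AND-gate enables, shifted into place
    return ((state << 1) & full) ^ (taps if feedback else 0)


def period(seed, mask, width):
    """Clocks needed to return to `seed`; practical only for small widths."""
    state, clocks = lfsr_step(seed, mask, width), 1
    while state != seed:
        state, clocks = lfsr_step(state, mask, width), clocks + 1
    return clocks


# 32-bit state with an arbitrary (hypothetical) 31-bit mask value.
print(hex(lfsr_step(0x80000001, 0x00000003)))  # 0x5
# 4-bit check: programmable bits [1 0 0] give the 15-state cycle of x^4 + x + 1.
print(period(1, 0b001, 4))  # 15
```

In this model, loading the LFSR is just an assignment to `state`, and the single AND/XOR level per bit is what lets the text expect instruction decryption to fit in the fetch cycle.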
[00051] The data code transformation logic may be much larger. The offset address field may contain a 20-bit offset, which may result in 20 transformation functions, each of which may have 32 bits of 32 AND gates masking the input signals to a 6-level tree of 31 XOR gates. Each of the 20 transformation functions may then contain eight levels of logic (1 AND, 6 XORs and 1 multiplexor), for a total, per function, of 1,024 AND gates, 992 XOR gates, 32 multiplexors, and 32 32-bit transform mask registers. The worst-case path in such a structure may be up to 160 gate levels long. This may be reduced where the terms are not needed, but the result may still require many clock cycles. Still, the time needed to calculate the proper cache line translation code may overlap with the time required to process a cache line miss request to either an L2 cache or main memory, which also may take many clock cycles. Upon receiving the externally requested cache line, the translation code may be stored in the L1 data cache with the encrypted cache line. Upon a subsequent cache hit, the translation code may be retrieved to decrypt the data retrieved from the cache or to encrypt the data written to the cache, as shown in Figure 4. To save space, the translation code stored in the cache may be only applicable to the first word in a 2^M word cache line. An M-bit code transformation logic block may then be used to create the translation code for the proper word out of the cache line, or a combination of a K-M bit code transformation logic block and 2** cycles of an appropriately loaded LFSR may be used. It should be noted that, because of the short path within the LFSR, the LFSR may also be clocked at a multiple of the processor clock.
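The code transformation logic can be read as a chain of conditional jumps: stage i holds the matrix for 2^i shifts (the [J1], [J2], [J4], [J8] progression described earlier), and a multiplexor selects the transformed or untransformed code according to bit i of the offset. That reading fits the per-function logic above (AND-masked inputs, an XOR tree, one multiplexor) and the 20 × 8 = 160 gate-level worst case, but the sketch below is only an illustrative model. It assumes a Galois-style LFSR and assumes consecutive words are one LFSR clock apart; the 32-bit mask value is hypothetical.

```python
# Sketch: offset-driven code transformation modelled as chained GF(2) jump stages.
# ASSUMPTIONS (not from the source): a Galois-style LFSR whose top bit feeds
# back into bit 0 and into the bits enabled by `mask`, and one LFSR clock
# between the translation codes of consecutive words.


def lfsr_step(state, mask, width):
    """Reference: one clock of the assumed LFSR."""
    full = (1 << width) - 1
    feedback = (state >> (width - 1)) & 1
    taps = 1 | ((mask & (full >> 1)) << 1)
    return ((state << 1) & full) ^ (taps if feedback else 0)


def shift_matrix(mask, width):
    """[J1] as row bitmasks: row r marks the old bits XORed into new bit r."""
    top = 1 << (width - 1)
    return [top] + [(1 << (i - 1)) | (top if (mask >> (i - 1)) & 1 else 0)
                    for i in range(1, width)]


def mat_mul(a, b):
    """GF(2) product a x b: row r is the XOR of the rows of b selected by a[r]."""
    out = []
    for row in a:
        acc = 0
        for k, b_row in enumerate(b):
            if (row >> k) & 1:
                acc ^= b_row
        out.append(acc)
    return out


def mat_vec(m, state):
    """Apply m to state: new bit r is the parity of (m[r] AND state)."""
    out = 0
    for r, row in enumerate(m):
        out |= (bin(row & state).count("1") & 1) << r
    return out


def program_stages(mask, width, offset_bits):
    """Stage i holds [J^(2**i)], i.e. the programming of transformation function i."""
    stages = [shift_matrix(mask, width)]
    while len(stages) < offset_bits:
        stages.append(mat_mul(stages[-1], stages[-1]))
    return stages


def transform(code, offset, stages):
    """Advance `code` by `offset` clocks; offset bit i acts as stage i's multiplexor select."""
    for i, m in enumerate(stages):
        if (offset >> i) & 1:
            code = mat_vec(m, code)
    return code


# 32-bit LFSR and 20-bit offset field as in the practical example; the mask is hypothetical.
MASK, WIDTH = 0x00000003, 32
stages = program_stages(MASK, WIDTH, 20)
base = 0x12345678
stepped = base
for _ in range(1000):
    stepped = lfsr_step(stepped, MASK, WIDTH)
print(transform(base, 1000, stages) == stepped)  # True
```

For a cache line of 2**M words, passing only `stages[:M]` would give the translation code of word w from the stored first-word code, under the same one-clock-per-word assumption.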
[00052] In yet another embodiment of the present invention, the mask register and code transformation logic may be reduced by limiting the programming to a subset of the bits.
Debug and Test
[00053] In another embodiment, debugging of applications may be performed without recompiling the application or altering its cycle-by-cycle operation. Unencrypted applications may also be modified before the final load module creation, e.g., by creating a zero initial translation code and appending a zero translation code to the selected instructions. Execution of the unencrypted application may then be performed with all the transparent debug facilities available in the processor, and with the translation logic enabled. Furthermore, the unencrypted code may then perform in the same cycle-by-cycle manner as the encrypted code. Similarly, when subsequently encrypting or re-encrypting the application, its size and cycle-by-cycle operation may not change.
[00054] In another embodiment, the LFSR, code transformation logic, and checksum logic may be used to generate random instructions and data to test the processor prior to normal operation. Reference is now made to Figure 10, another diagram of an example of a processor 100 with checksum logic 107 coupled to the output of the instruction unit 101 and checksum logic 108 coupled to the output of the execution unit 102. To initiate LFSR-generated processor BIST, the L1 caches 103 and 104 may be initialized to zero. Reference is now made to Figure 11, a diagram of an example of checksum logic. The checksum register 111 may be cleared or loaded with an initial code. On each clock cycle, the input data 112 may be combined with the current contents of the checksum register 111 through exclusive-OR (XOR) gates 113 to update the checksum register 111. The input data 112 may be instructions or control signals from the instruction unit, or data and control signals from the execution unit. Testing may proceed by:
a) Loading an LFSR translation code and an initial instruction address,
b) Disabling cache misses by loading just the translation codes and addresses,
c) Clearing and disabling interrupts,
d) Clearing and enabling the checksums,
e) Executing for a prescribed number of cycles, and
f) Reading and comparing the contents of the checksum registers with predetermined results.
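Steps e) and f) can be modelled in software. The sketch below is a hypothetical stand-in, not the Figure 10 hardware: an LFSR of an assumed Galois form supplies one pseudo-random word per cycle, a caller-supplied function plays the part of the instruction or execution unit, and the checksum register is updated by XOR as Figure 11 describes. The signature of a known-good model serves as the predetermined result.

```python
def lfsr_step(state, mask, width):
    """Assumed Galois-style programmable LFSR, used here only as a stimulus source."""
    full = (1 << width) - 1
    feedback = (state >> (width - 1)) & 1
    taps = 1 | ((mask & (full >> 1)) << 1)
    return ((state << 1) & full) ^ (taps if feedback else 0)


def checksum_update(register, data, width):
    """Checksum logic: XOR the input data into the checksum register each clock."""
    return (register ^ data) & ((1 << width) - 1)


def bist_signature(unit, seed, mask, cycles, width, initial=0):
    """Run `cycles` clocks and return the final checksum register contents.

    `unit` is a hypothetical model of the logic under test: it maps each
    pseudo-random stimulus word to the word that reaches the checksum input.
    """
    state, register = seed, initial
    for _ in range(cycles):
        state = lfsr_step(state, mask, width)
        register = checksum_update(register, unit(state), width)
    return register


def good_unit(word):
    return word  # pass-through stand-in for fault-free logic


def faulty_unit(word):
    return word ^ 0b0100 if word == 0b0011 else word  # one corrupted output


expected = bist_signature(good_unit, seed=1, mask=0b001, cycles=15, width=4)
observed = bist_signature(faulty_unit, seed=1, mask=0b001, cycles=15, width=4)
print(expected, observed, observed == expected)  # 0 4 False
```

Because this update is a plain XOR with no shifting, two errors that flip the same checksum bit cancel each other; the single fault above is caught because that bit flips only once.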
[00055] The control signals may include interrupt signals, instruction addresses, and/or other signals generated by the execution of the test and captured by the checksum prior to being disabled. Alternatively, some amount of encoded instructions may be loaded into the I-cache, and encoded data into the D-cache, to perform partial or full diagnostic tests. In this manner, the LFSR, transformation logic and checksums may be used to perform processor BIST or to aid in processor diagnostic tests.
[00056] It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of various features described hereinabove as well as modifications and variations which would occur to persons skilled in the art upon reading the foregoing description and which are not in the prior art.

Claims

1. A multi-processor system having two or more processors, each processor including:
an instruction unit for processing instructions;
an execution unit for operating on data;
at least one cache memory;
at least one interface to a system bus;
logic configured to translate instructions accessed by the instruction unit from a cache memory; and
logic between the execution unit and a cache for translating data;
wherein translating the instructions, the data, or both uses pseudo-random numbers.
2. The multi-processor system as in claim 1, wherein the logic configured to translate instructions includes logic configured to decrypt encoded instructions, and the logic configured to translate data includes logic configured to translate data being accessed by the execution unit and logic configured to encrypt data being written to the cache.
3. The multi-processor system as in claim 2, wherein the logic configured to translate instructions includes a linear feedback shift register (LFSR), and wherein the logic configured to translate data includes code transformation logic.
4. The multi-processor system as in claim 3, wherein the LFSR includes a programmable LFSR.
5. The multi-processor system as in claim 2, wherein the logic configured to translate data includes logic configured to selectively encrypt data written to the system bus.
6. The multi-processor system as in claim 2, wherein the at least one cache memory includes an instruction cache and a data cache.
7. The multi-processor system as in claim 6, wherein the logic configured to translate instructions is configured to access the instruction cache, and wherein the logic configured to translate data is configured to access the data cache.
8. An integrated circuit including the multi-processor system according to claim 1.
9. A multi-processor system having two or more processors, each processor including:
an instruction unit for processing instructions;
an execution unit for operating on data;
at least one cache memory;
at least one interface to a system bus; and
logic configured to translate data and instructions transferred between the system and a further cache memory.
10. The multi-processor system as in claim 9, wherein the logic configured to translate instructions and data includes logic configured to translate instructions that include logic configured to decrypt encoded instructions and logic configured to translate data that includes logic configured to decrypt data being accessed by the execution unit and to encrypt data being written to the cache.
11. An integrated circuit including the multi-processor system as in claim 9.
12. A method for encrypting instructions and data of a program of a processing system, the method including:
creating initial codes and loading a linear feedback shift register (LFSR) with one of the initial codes; for a respective instruction, incrementing an LFSR function to obtain a translation code for the respective instruction;
for a respective data space, defining a translation code, loading an LFSR with the translation code, and incrementing the LFSR to obtain a translation code for a respective predefined data element corresponding to the respective data space; for a respective selected instruction, appending to the selected instruction a translation code corresponding to a value in an address field of the selected instruction;
encoding a respective instruction, data and appended translation code with a translation code associated with an address of the respective instruction; and separately encrypting the initial codes.
13. The method as in claim 12, wherein the instructions include:
instructions for loading and storing registers containing addresses;
branch instructions; and
instructions for calling and returning from subroutines.
14. The method as in claim 12, wherein the LFSR is a programmable LFSR, and wherein creating initial codes includes programming the LFSR with one of the initial codes.

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP14784683.6A EP2987086B1 (en) 2013-04-17 2014-03-21 Secure computing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/864,688 2013-04-17
US13/864,688 US9280490B2 (en) 2013-04-17 2013-04-17 Secure computing

Publications (2)

Publication Number Publication Date
WO2014172062A2 true WO2014172062A2 (en) 2014-10-23
WO2014172062A3 WO2014172062A3 (en) 2015-11-26

Family

ID=51729960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/031396 WO2014172062A2 (en) 2013-04-17 2014-03-21 Secure computing

Country Status (3)

Country Link
US (1) US9280490B2 (en)
EP (1) EP2987086B1 (en)
WO (1) WO2014172062A2 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9846656B2 (en) 2013-04-17 2017-12-19 Laurence H. Cooke Secure computing
US9454418B1 (en) * 2014-08-21 2016-09-27 Rockwell Collins, Inc. Method for testing capability of dissimilar processors to achieve identical computations
WO2016100506A1 (en) * 2014-12-16 2016-06-23 Kyndi, Inc. Method and apparatus for randomizing computer instruction sets, memory registers and pointers
US9525457B1 (en) * 2015-07-01 2016-12-20 Honeywell International Inc. Spread spectrum clock generation using a tapped delay line and entropy injection
WO2017139010A2 (en) 2015-12-07 2017-08-17 Cooke Laurence H Secure computing
US10210323B2 (en) * 2016-05-06 2019-02-19 The Boeing Company Information assurance system for secure program execution
CN107292135A (en) * 2017-06-06 2017-10-24 网易(杭州)网络有限公司 A kind of program code guard method and device
EP3460709B1 (en) * 2017-09-26 2022-02-09 Secure-IC SAS Devices and methods for secured processors
US11954360B2 (en) * 2020-09-01 2024-04-09 Intel Corporation Technology to provide accurate training and per-bit deskew capability for high bandwidth memory input/output links

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2011149986A1 (en) 2010-05-27 2011-12-01 Cisco Technology, Inc. Virtual machine memory compartmentalization in multi-core architectures

Family Cites Families (15)

Publication number Priority date Publication date Assignee Title
US5224166A (en) * 1992-08-11 1993-06-29 International Business Machines Corporation System for seamless processing of encrypted and non-encrypted data and instructions
US7270193B2 (en) * 2000-02-14 2007-09-18 Kabushiki Kaisha Toshiba Method and system for distributing programs using tamper resistant processor
US20020107903A1 (en) * 2000-11-07 2002-08-08 Richter Roger K. Methods and systems for the order serialization of information in a network processing environment
US6678707B1 (en) 2000-10-30 2004-01-13 Hewlett-Packard Development Company, L.P. Generation of cryptographically strong random numbers using MISRs
JP4263976B2 (en) * 2003-09-24 2009-05-13 株式会社東芝 On-chip multi-core tamper resistant processor
US7734932B2 (en) 2003-11-10 2010-06-08 Broadcom Corporation System and method for securing executable code
JP4447977B2 (en) 2004-06-30 2010-04-07 富士通マイクロエレクトロニクス株式会社 Secure processor and program for secure processor.
US7657756B2 (en) 2004-10-08 2010-02-02 International Business Machines Corporaiton Secure memory caching structures for data, integrity and version values
US20090319673A1 (en) * 2008-04-24 2009-12-24 International Business Machines Corporation Automated Wireless Device Pairing
US7752369B2 (en) * 2008-05-09 2010-07-06 International Business Machines Corporation Bounded starvation checking of an arbiter using formal verification
US8819839B2 (en) 2008-05-24 2014-08-26 Via Technologies, Inc. Microprocessor having a secure execution mode with provisions for monitoring, indicating, and managing security levels
US8024616B2 (en) 2009-01-26 2011-09-20 International Business Machines Corporation Pseudo random process state register for fast random process test generation
US8671285B2 (en) 2010-05-25 2014-03-11 Via Technologies, Inc. Microprocessor that fetches and decrypts encrypted instructions in same time as plain text instructions
US20120079281A1 (en) 2010-06-28 2012-03-29 Lionstone Capital Corporation Systems and methods for diversification of encryption algorithms and obfuscation symbols, symbol spaces and/or schemas
EP2756438B1 (en) 2011-09-13 2020-11-11 Facebook, Inc. Software cryptoprocessor

Also Published As

Publication number Publication date
EP2987086A4 (en) 2016-12-14
EP2987086A2 (en) 2016-02-24
US20140317419A1 (en) 2014-10-23
US9280490B2 (en) 2016-03-08
WO2014172062A3 (en) 2015-11-26
EP2987086B1 (en) 2022-03-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14784683

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2014784683

Country of ref document: EP