WO2020146285A1 - Protection of cryptographic operations by intermediate randomization - Google Patents
 Publication number
 WO2020146285A1 (application PCT/US2020/012419)
 Authority
 WO
 WIPO (PCT)
 Prior art keywords
 vector
 value
 processing device
 arithmetic operation
 random number
Classifications

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
 H04L9/30—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
 H04L9/3006—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy underlying computational problems or public-key parameters
 H04L9/3013—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy underlying computational problems or public-key parameters involving the discrete logarithm problem, e.g. ElGamal or Diffie-Hellman systems

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
 G06F21/60—Protecting data
 G06F21/602—Providing cryptographic facilities or services

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
 G06F17/10—Complex mathematical operations
 G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
 G06F21/70—Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
 G06F21/78—Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
 G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
 G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
 G06F7/52—Multiplying; Dividing
 G06F7/523—Multiplying only

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
 G06F7/58—Random or pseudorandom number generators
 G06F7/588—Random number generators, i.e. based on natural stochastic processes

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
 G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
 G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
 G06F7/724—Finite field arithmetic
 G06F7/725—Finite field arithmetic over elliptic curves

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
 H04L9/30—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
 H04L9/3006—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy underlying computational problems or public-key parameters
 H04L9/302—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy underlying computational problems or public-key parameters involving the integer factorization problem, e.g. RSA or quadratic sieve [QS] schemes

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
 H04L9/30—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
 H04L9/3066—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyperelliptic curves

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
 G06F2207/72—Indexing scheme relating to groups G06F7/72 - G06F7/729
 G06F2207/7219—Countermeasures against side channel or fault attacks
 G06F2207/7223—Randomisation as countermeasure against side channel attacks

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
 H04L2209/08—Randomization, e.g. dummy operations or using noise

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
 H04L2209/12—Details relating to cryptographic hardware or logic circuitry
 H04L2209/122—Hardware reduction or efficient architectures
Definitions
 The disclosure pertains to cryptographic computing applications, more specifically to protection of cryptographic operations from side-channel attacks.
 FIG. 1 is an exemplary block diagram of the components of a processing device capable of protecting cryptographic operations performed therein with intermediate randomization.
 FIG. 2A illustrates an exemplary operation of the Montgomery ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure.
 FIG. 2B illustrates intermediate randomization operations, in accordance with one or more aspects of the present disclosure, that may be implemented to protect execution of the Montgomery ladder algorithm from side-channel attacks.
 FIG. 3A illustrates an exemplary operation of the Double-Add ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure.
 FIG. 3B illustrates intermediate randomization operations, in accordance with one or more aspects of the present disclosure, that may be implemented to protect execution of the Double-Add ladder algorithm from side-channel attacks.
 FIG. 4 depicts a flow diagram of an illustrative example of a method of protecting cryptographic operations by intermediate randomization, in accordance with one or more aspects of the present disclosure.
 FIG. 5 depicts a block diagram of an example computer system operating in accordance with one or more aspects of the present disclosure.
 Aspects of the present disclosure are directed to protection of arithmetic operations by intermediate randomization that may be used in applications employing cryptographic algorithms, for safeguarding inputs and outputs of cryptographic computations against side-channel attacks.
 A processing device may have various components/modules used for cryptographic operations on input messages.
 Input messages used in such operations are often large binary numbers whose processing is sometimes performed on low-bit microprocessors, such as those in smart card readers, wireless sensor nodes, and so on.
 Examples of cryptographic operations include, but are not limited to, operations involving Rivest-Shamir-Adleman (RSA) and Diffie-Hellman (DH) keys, digital signature algorithms (DSA) used to authenticate messages transmitted between nodes of a public-key cryptography system, various elliptic curve cryptography schemes, etc.
 RSA: Rivest-Shamir-Adleman
 DH: Diffie-Hellman
 DSA: digital signature algorithm
 Cryptographic algorithms often involve modular arithmetic operations with modulus N, in which the set of all integers $\mathbb{Z}$ is wrapped around a circle of length N (the set $\mathbb{Z}_N$), so that any two numbers that differ by N (or by any integer multiple of N) are treated as the same number.
 As a result, a modular (modulo N) multiplication operation $AB \bmod N$ may produce the same result for many more different pairs of the multiplicand A and the multiplier B than conventional multiplication does. For example, if it is known that the product of a conventional multiplication of two positive integers is 6, it may be determined that the two factors (the multiplicand and the multiplier, or vice versa) must necessarily be 2 and 3 (excluding the trivial product of 1 and the number itself, 6). Modulo N = 10, by contrast, the same result 6 is also produced by $4 \cdot 9 = 36$ and $7 \cdot 8 = 56$.
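The contrast between ordinary and modular multiplication can be checked directly. The short Python check below enumerates unordered factor pairs that give the product 6, once in ordinary integer arithmetic and once modulo N = 10; the modulus and the target value are arbitrary illustrative choices.

```python
# Unordered factor pairs (a, b), a <= b, whose product is TARGET:
# first with ordinary integer multiplication, then modulo N.
N = 10
TARGET = 6

integer_pairs = [(a, b) for a in range(1, TARGET + 1)
                 for b in range(a, TARGET + 1) if a * b == TARGET]

modular_pairs = [(a, b) for a in range(1, N)
                 for b in range(a, N) if (a * b) % N == TARGET]

print(integer_pairs)  # [(1, 6), (2, 3)]
print(modular_pairs)  # [(1, 6), (2, 3), (2, 8), (4, 4), (4, 9), (6, 6), (7, 8)]
```

Knowing only the residue of a modular product leaves far more candidate factor pairs than knowing an ordinary product does.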
 A cryptographic operation on an elliptic curve may involve selecting a base point P (which may be a public key) and multiplying P by an integer number k (which may be a private key): $Q = kP$.
 The elliptic curve multiplication may be defined via a set of specific rules for point doubling ($2P$), point addition ($P_1 + P_2$), the zero (infinity) point, and so on.
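The doubling and addition rules can be illustrated with a minimal sketch of affine point arithmetic on a toy short Weierstrass curve $y^2 = x^3 + 7$ over GF(17). The curve, the point G = (1, 5), and the helper names are assumptions chosen for readability. Production code uses standardized curves, constant-time field arithmetic, and usually projective coordinates. The modular inverse relies on Python 3.8+ `pow(x, -1, m)`.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); parameters are illustrative only.
P_MOD = 17
A, B = 0, 7
INF = None  # the zero (point-at-infinity) element


def on_curve(pt):
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def point_add(p1, p2):
    """Affine point addition; also covers doubling and the point at infinity."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P + (-P) = O
    if p1 == p2:
        # Doubling: slope of the tangent line.
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Addition: slope of the chord through p1 and p2.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


G = (1, 5)
G2 = point_add(G, G)    # doubling: 2G
G3 = point_add(G2, G)   # addition: 3G = 2G + G
print(G2, G3, on_curve(G2), on_curve(G3))  # (2, 10) (5, 9) True True
```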
 The strength of elliptic curve cryptography is rooted in the fact that for large values of k, the resulting point Q can be practically anywhere on the elliptic curve.
 The inverse operation, determining an unknown value of the private key k from a known value Q (referred to as the discrete logarithm of Q to base P, $k = \log_P Q$), can be a prohibitively difficult computational operation.
 To compute $Q = kP$ efficiently, a number of ladder-type algorithms may be used, which require a significantly reduced number of loop iterations (generally, about $\log_2 k$ iterations, compared with $k - 1$ successive point additions).
 Two registers, e.g., R(0) and R(1), may be used to store the accumulator value A and an auxiliary value B, with one doubling and one addition operation performed at each iteration.
 Initially, the accumulator value may be set to zero, $A\,[R(0)] \leftarrow 0$, and the auxiliary value may be set to P, $B\,[R(1)] \leftarrow P$.
 In the Double-Add algorithm, the second register R(1) may store the same auxiliary value P across all loop iterations.
 In the illustrated example, the Double-Add algorithm gives rise to the following six iterations.
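The specific six-iteration listing is not reproduced here. As a hedged stand-in, the sketch below traces a left-to-right Double-Add run for a hypothetical six-bit key k = 45 (binary 101101). To keep it short, the additive group of integers replaces elliptic-curve points, so "double" is 2·A and "add" is A + P; only the control flow corresponds to the ladder. The function name is hypothetical.

```python
def double_add_trace(k_bits, p):
    """Left-to-right Double-Add over the integers (a stand-in for curve points)."""
    acc = 0          # R(0): accumulator A, initially zero
    aux = p          # R(1): auxiliary value P, fixed for all iterations
    trace = []
    for bit in k_bits:            # most significant bit first
        acc = 2 * acc             # "double" on every iteration
        op = "D"
        if bit == 1:
            acc = acc + aux       # "add" only when the key bit is 1
            op = "DA"
        trace.append((bit, op, acc))
    return acc, trace


k_bits = [1, 0, 1, 1, 0, 1]       # hypothetical six-bit key, k = 45
result, trace = double_add_trace(k_bits, p=7)
for j, (bit, op, acc) in enumerate(trace):
    print(f"iteration {j}: key bit = {bit}, ops = {op:2}, A = {acc}")
# final A = 315 = 45 * 7
```

The auxiliary register never changes, which matches the statement that R(1) holds P across all loop iterations. The operation sequence, however, mirrors the key bits, which is the weakness discussed further below.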
 The Montgomery ladder algorithm has the advantage that the doubling and addition operations at each iteration (ladder step) can be performed independently, e.g., by two separate parallel processors.
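A minimal Montgomery ladder sketch follows, again using integers in place of elliptic-curve points, which is an illustrative assumption. In each branch, both new register values are computed only from the old pair, so the doubling and the addition have no data dependency on each other. The difference between the registers stays equal to P throughout.

```python
def montgomery_ladder(k, p, nbits):
    """Montgomery ladder for Q = k*P over the integers (a stand-in for curve points)."""
    r0, r1 = 0, p                      # R(0) = A = 0, R(1) = B = P
    for i in reversed(range(nbits)):   # most significant bit first
        bit = (k >> i) & 1
        if bit == 0:
            r0, r1 = 2 * r0, r0 + r1   # A <- 2A,    B <- A + B
        else:
            r0, r1 = r0 + r1, 2 * r1   # A <- A + B, B <- 2B
        # Both right-hand sides read only the old (r0, r1), so the doubling
        # and the addition could be dispatched to two parallel units.
        assert r1 - r0 == p            # ladder invariant: B - A = P
    return r0


print(montgomery_ladder(45, 7, 6))     # 315
```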
 The iterations may also be performed in the reverse order, from right to left, starting from the least significant bit.
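For comparison, the classical right-to-left binary method processes the key starting from its least significant bit. It keeps a running multiple $2^i P$ that is doubled on every step and added into the accumulator when the current bit is 1. The sketch assumes the same integer stand-in for points.

```python
def right_to_left_binary(k, p):
    """Right-to-left binary method over the integers (a stand-in for curve points)."""
    acc = 0            # accumulator, initially the zero element
    addend = p         # holds (2**i) * P during iteration i
    while k:
        if k & 1:      # current least significant bit of the remaining key
            acc += addend
        addend *= 2    # "double" for the next, more significant bit
        k >>= 1
    return acc


print(right_to_left_binary(45, 7))  # 315
```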
 Various other algorithms may be used, such as the right-to-left binary method, the conjugate co-Z addition method, left-to-right scalar multiplication, the Goundar-Joye-Miyaji ladder, and so on.
 Three (or more) registers may be used, with one register to store an accumulator value and two (or more) registers to store two (or more) auxiliary values.
 A side-channel attack may be performed by monitoring emissions (signals) produced by electronic circuits of the target’s (victim’s) computer. Such signals may be acoustic, electric, magnetic, optical, thermal, and so on.
 A hardware trojan and/or malicious software may be capable of correlating specific processor (and/or memory) activity with operations carried out by the processor. For example, a trojan may be capable of identifying that an elliptic curve cryptographic application has m iterations.
 An attacker employing the trojan may infer from this that the private key number satisfies $k \le 2^m - 1$ (or make an even more definitive prediction, $2^{m-1} \le k \le 2^m - 1$, if the algorithm starts with the iteration that corresponds to the most significant nonzero bit of the key).
 The trojan may further identify a difference between emissions corresponding to a doubling operation and emissions corresponding to an addition operation. This may be sufficient for the trojan to determine the entire sequence of bits representing the private key number k.
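The leak can be modeled in a few lines. The sketch below idealizes the attacker: it assumes the trojan can label each iteration of an unprotected left-to-right Double-Add as "D" (doubling only) or "DA" (doubling followed by addition). Under that assumption, the iteration count bounds the key and the label sequence spells out every key bit. The key value and the function names are hypothetical.

```python
def unprotected_double_add_ops(k):
    """Operations a naive left-to-right Double-Add executes, one label per key bit."""
    return ["DA" if bit == "1" else "D" for bit in bin(k)[2:]]


def recover_key(ops):
    """An observer who can tell 'D' from 'DA' reads off every key bit."""
    return int("".join("1" if op == "DA" else "0" for op in ops), 2)


secret_k = 0b110010111          # hypothetical private key (407)
observed = unprotected_double_add_ops(secret_k)
m = len(observed)               # the iteration count leaks the key length
assert 2 ** (m - 1) <= secret_k <= 2 ** m - 1
print(m, recover_key(observed) == secret_k)  # 9 True
```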
 A processing device performing randomization protection may implement random projective scaling of numbers encountered during various iterations, so that the digital representation of these numbers is modified without modifying the objects (e.g., respective points on elliptic curves) that these numbers identify.
 The processing device may perform randomized storage of intermediate outputs (such as the accumulator value and the auxiliary value) and control the subsequent read/load operations so that the correct dataflow is preserved.
 Such randomized protective measures improve the security of cryptographic operations by making it more difficult for side-channel attackers to correlate the signals emitted by the processing device with the computations being performed.
 FIG. 1 is an exemplary block diagram of the components of a processing device 100 capable of protecting cryptographic operations performed therein with intermediate randomization.
 “Processing device” herein refers to a device capable of executing instructions encoding arithmetic, logical, or I/O operations.
 A processing device may follow the von Neumann architectural model and may include an arithmetic logic unit (ALU), a control unit, and a plurality of registers.
 ALU: arithmetic logic unit
 A processing device may be a single-core processor, which is typically capable of executing one instruction at a time (or processing a single pipeline of instructions), or a multi-core processor, which may simultaneously execute multiple instructions.
 A processing device may be implemented as a single integrated circuit or as two or more integrated circuits, or may be a component of a multi-chip module.
 “Memory device” herein refers to a volatile or non-volatile memory, such as random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, flip-flop memory, or any other device capable of storing data.
 RAM: random-access memory
 ROM: read-only memory
 EEPROM: electrically erasable programmable read-only memory
 The processing device 100 may include, among other components, an ALU 110.
 The ALU 110 may be any digital electronic circuit capable of performing arithmetic and bitwise operations on integer binary numbers.
 The ALU 110 may be a component of a bigger computing device, such as a central processing unit (CPU), which in turn may be a part of any server, desktop, laptop, tablet, phone, or any other type of computing device.
 The computing device may include multiple ALUs 110 and CPUs.
 The ALU 110 may receive input in the form of data operands from one or more memory devices, such as the memory devices 120, 130, 150, and 160.
 The ALU 110 may also receive code/instruction input, such as the algorithm instructions 140.
 The algorithm instructions 140 may identify the computational algorithm to be implemented (e.g., the Montgomery ladder, the Double-Add ladder, etc.) and indicate the nature and order of operations to be performed on input data operands.
 The ALU 110 may further receive randomization instructions 142.
 The randomization instructions 142 may indicate how various randomization measures are to be performed (e.g., random projective scaling, random storage of intermediate outputs, readout procedures for retrieving randomly stored intermediate outputs, and so on).
 The algorithm instructions 140 and/or the randomization instructions 142 may also indicate which memory devices are to store the output of the ALU operations, and so on.
 In some implementations, the algorithm instructions 140 and the randomization instructions 142 may be combined in a single set of instructions.
 In other implementations, the algorithm instructions 140 and the randomization instructions 142 may be stored separately on separate memory devices.
 The numbers A and B may be stored in a first memory device 120, which may be a RAM (e.g., SRAM or DRAM) device in one implementation.
 In other implementations, the first memory device 120 may be a flash memory device (NAND, NOR, 3DXP, or another type of flash memory) or any other type of memory.
 The first memory device 120 may have one input/output port and may be capable of receiving (via a write operation) or providing (via a read operation) a single operand to the ALU 110 per clock cycle. In such implementations, performing both a read operation and a write operation involving the first memory device 120 may require a minimum of two clock cycles.
 A second memory device 130 may be a scratchpad memory device, in one implementation.
 The scratchpad may be any type of high-speed memory circuit that may be used for temporary storage of data capable of being retrieved rapidly.
 The second memory device 130 may be equipped with multiple ports, e.g., a write port 132 and a read port 134, in one implementation. Each port may facilitate one operation per clock cycle.
 The numbers A and B may be represented by $n \cdot W$ bits grouped into n words with W bits in each word.
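A large operand split into n words of w bits each can be handled as sketched below. This is an illustrative sketch only: the least-significant-word-first (little-endian) order, the example width w = 16, and the helper names are assumptions, not details taken from the disclosure.

```python
def to_words(value, n, w):
    """Split a non-negative integer into n words of w bits, least significant word first."""
    if not 0 <= value < (1 << (n * w)):
        raise ValueError("value does not fit in n*w bits")
    mask = (1 << w) - 1
    return [(value >> (i * w)) & mask for i in range(n)]


def from_words(words, w):
    """Reassemble the integer from its little-endian w-bit words."""
    return sum(word << (i * w) for i, word in enumerate(words))


words = to_words(0x0123456789ABCDEF, n=4, w=16)
print([hex(x) for x in words])   # ['0xcdef', '0x89ab', '0x4567', '0x123']
```

With a single-ported memory, each of these words would cost one read or write cycle, which is why word-level access patterns matter for both performance and leakage.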
 The ALU 110 may load one word from the second memory device 130 (via the read port 134) and may output one word to the second memory device 130 (via the write port 132).
 The second memory device 130 may be used for storing accumulators during execution of various arithmetic operations, such as addition, subtraction, and multiplication, including Montgomery reduction.
 The processing device 100 may have an additional memory device, which may be a flip-flop memory device 150.
 The flip-flop memory device 150 may be any electronic circuit having stable states to store binary data, which may be changed by appropriate input signals.
 The flip-flop memory device 150 may be used for storing carries during execution of addition, subtraction, and/or multiplication, in some implementations.
 The processing device 100 may optionally have a third memory device 160, which may be any of the aforementioned types of memory device.
 The third memory device 160 may be used to store results of intermediate steps of arithmetic operations and/or final results of such operations, in one implementation.
 In another implementation, the third memory device 160 may be absent, and the intermediate/final results may be stored in the second memory device 130 (e.g., the scratchpad memory) or written to the first memory device 120.
 The first memory device 120 and/or the third memory device 160 may store randomization instructions 142 (and/or algorithm instructions 140, not shown) for the ALU 110, as depicted in FIG. 1.
 The accumulator A may be stored in the second memory device 130 to allow the fastest write/read access.
 The auxiliary number B may be stored in the flip-flop memory device 150 and may be overwritten after every iteration of the algorithm (as in the case of the Montgomery ladder) or remain fixed (as in the case of the Double-Add ladder).
 Random numbers may be stored in the flip-flop memory and may remain there until the next read operation.
 The successive bits of the key number k may be stored in the flip-flop memory and may be overwritten at the beginning of the next iteration.
 Alternatively, the bits of the key number k may be stored in the second memory device.
 Any or all of the accumulator, the auxiliary number, the random numbers, and the key number k may be stored in the first memory device.
 FIG. 2A illustrates an exemplary operation 200 of the Montgomery ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure.
 The exemplary operation 200 may be performed by one or more processing devices 100, in some implementations.
 The input of the exemplary operation 200 may include a number k and a number P.
 The number k may be a private key represented by a sequence of bits $(k_0 k_1 k_2 k_3)$.
 The number P may be a public number.
 The number P may represent a point on an elliptic curve that may be identified by affine coordinates (x, y).
 The point (x, y) on the elliptic curve may also be identified with projective (e.g., Jacobian) coordinates, of which there are more than two.
 The third projective coordinate Z may be chosen to be an arbitrary (non-zero) number. This may allow projective scaling of the projective coordinates with an arbitrary value Z at various stages of the algorithm that uses intermediate randomization.
 For example, the point P may initially be represented as $P = (x, y, 1)$, but at a later stage of the algorithm the projective coordinates may be scaled by an arbitrary number Z such that $(x, y, 1) \rightarrow (xZ^2, yZ^3, Z)$.
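The Jacobian scaling $(x, y, 1) \rightarrow (xZ^2, yZ^3, Z)$ can be checked numerically. The minimal sketch below uses the toy curve $y^2 = x^3 + 7$ over GF(17) and the point (1, 5), both assumptions chosen for readability. It scales the point by a random non-zero Z, confirms that the scaled triple still satisfies the Jacobian curve equation $Y^2 = X^3 + aXZ^4 + bZ^6$, and maps it back to the same affine point. The helper names are hypothetical. Python 3.8+ is assumed for `pow(x, -1, m)`.

```python
import secrets

P_MOD = 17          # toy field; real curves use ~256-bit primes
A, B = 0, 7         # toy curve y^2 = x^3 + A*x + B


def to_jacobian(x, y):
    return (x, y, 1)


def scale(pt, z):
    """Projective scaling (X, Y, Z) -> (z^2 X, z^3 Y, z Z); the point is unchanged."""
    X, Y, Z = pt
    return (z * z * X % P_MOD, z ** 3 * Y % P_MOD, z * Z % P_MOD)


def to_affine(pt):
    X, Y, Z = pt
    zinv = pow(Z, -1, P_MOD)
    return (X * zinv * zinv % P_MOD, Y * zinv ** 3 % P_MOD)


def on_curve_jacobian(pt):
    X, Y, Z = pt
    return (Y * Y - (X ** 3 + A * X * Z ** 4 + B * Z ** 6)) % P_MOD == 0


point = to_jacobian(1, 5)
z = 1 + secrets.randbelow(P_MOD - 1)   # random non-zero scaling factor
q = scale(point, z)
print(q, on_curve_jacobian(q), to_affine(q))  # the triple varies; the affine point is always (1, 5)
```

Each run yields a different digital representation of the same curve point, which is the property the randomization relies on.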
 In other implementations, a different projective scaling may be used.
 The multiplication $Q = kP$ may be performed using a number of iterations determined by the number of bits in the binary representation of k.
 The iterations may be performed by a processing device (e.g., the ALU 110) having access to two (or more) memory registers, e.g., registers R(0) and R(1).
 The registers R(0) and R(1) may be separate physical memory devices.
 Alternatively, the registers R(0) and R(1) may be virtual registers implemented in the first memory device 120, the second memory device 130, the third memory device 160, the flip-flop memory device 150, and so on.
 The registers R(0) and R(1) may also be memory addresses accessible to the processing device.
 One of the registers, e.g., R(0), may be used to store the accumulator value A (which may initially be set to zero).
 The register R(1) may store the base point P or some other value.
 Additional registers may store additional values, as may be required by, or optional for, a given algorithm being implemented.
 The values stored in the two registers may be swapped (shuffled), so that the register R(1) may store the accumulator value A whereas the register R(0) stores the auxiliary value B.
 The registers R(0) and R(1) may include multiple subregisters (virtual subregisters, memory addresses, etc.), with each of the multiple subregisters storing one of the affine (x, y) or projective (X, Y, Z) coordinates corresponding to the respective point (A or B) on the elliptic curve.
 The processing device performing the cryptographic operation 200 may implement additional steps to protect the operation from side-channel attacks by using intermediate randomization.
 The processing device may use a random number generator to generate a random number.
 The random number may be a one-bit number $b_j$, with the subscript j indicating the iteration of the loop.
 For example, when the random bit $b_j$ calls for swapping, the processing device may store the accumulator value A+B in register R(1) (block 224) and store the new auxiliary value 2B in register R(0) (block 226).
 This randomization of outputs makes it harder for an adversary attempting a side-channel attack to reliably determine the value of the key bit $k_j$.
 The processing device may perform additional randomization of the values stored in R(0) and R(1) by performing random projective scaling.
 The value stored in register R(0) may be projectively scaled with a random number $Z_{R(0)}$ that may be produced by the random number generator, as schematically shown by blocks 230 and 236.
 The value stored in R(1) may be projectively scaled with a random number $Z_{R(1)}$ that may be produced by the random number generator, as schematically shown by blocks 232 and 234.
 The random numbers $Z_{R(0)}$ and $Z_{R(1)}$ may be different in some implementations.
 In other implementations, the numbers $Z_{R(0)}$ and $Z_{R(1)}$ may be the same, so that the random number generator has to be invoked only once.
 The numbers $Z_{R(0)}$ and $Z_{R(1)}$ may be short numbers, e.g., single-word-long numbers, so that the additional computations required to perform projective scaling in blocks 230, 232, 234, and 236 are minimized while still serving the purpose of randomizing the dataflow of the operation 200.
 In some implementations, only one of the numbers $Z_{R(0)}$ and $Z_{R(1)}$ may be generated, and only one of the values stored in R(0) or R(1) may be projectively scaled.
 The decision as to which value is to be scaled may be based on generation of an additional random number $C_j$, which may be independent of the random bit $b_j$ that controls the swapping.
 For example, the random number generator may generate a random (single-word or multi-word) number Z and a random number $C_j$ that determines to which of the two registers, R(0) or R(1), the random number Z is to be applied.
 During the next, (j+1)-th, iteration, the processing device may retrieve the current accumulator value and auxiliary value stored in R(0) and R(1).
 The processing device may have to account for the possibility that the shuffle operation during the previous j-th iteration may have resulted in the accumulator value being stored in R(1) and the auxiliary value being stored in R(0).
 In the illustrated case, where the key bit $k_{j+1} = 0$, the processing device may compute the value 2R(1) (twice the register now holding the accumulator) and identify it as the new value of the accumulator 240 (equal to 2A+2B, in the current illustration), as indicated by the thick dashed line in FIG. 2A.
 The processing device may further compute the value R(0)+R(1) (equal to A+3B) and determine it to be the new auxiliary number 242, as illustrated by the thin dashed lines in FIG. 2A.
 If, instead, $k_{j+1} = 1$, the processing device may compute the value R(0)+R(1) and identify this value as the new accumulator value regardless of the value of the random bit $b_j$, since the sum does not depend on which register holds which operand.
 The determined values of the accumulator 240 and the auxiliary number 242 may then be stored in the manner described above in relation to the j-th iteration.
 For example, the accumulator 240 may be stored in register R(1) and the auxiliary number 242 may be stored in register R(0).
 FIG. 2B illustrates these intermediate randomization operations 250, in accordance with one or more aspects of the present disclosure, which may be implemented to protect execution of the Montgomery ladder algorithm from side-channel attacks.
 The operations 250 may include: adjustment of the read operations to compensate for the randomization (swapping) that may have been performed at the end of the previous j-th iteration; selection of a correct input register for the “double” operation of the (j+1)-th iteration; selection of correct registers to store the output values of the (j+1)-th iteration based on the key bit value $k_{j+1}$; and conditional swapping of the output values of the (j+1)-th iteration based on the value of a random bit $b_{j+1}$.
 FIG. 2B also shows projective scaling performed at the end of the previous j-th iteration. It should be noted, however, that in other possible implementations projective scaling may be performed in a different order than that shown in FIG. 2B, since projective scaling does not change the location of the corresponding point(s) on the elliptic curve.
 For example, projective scaling may be performed after the input values are read from the registers R(0) and R(1) during the (j+1)-th iteration but before the “double” and/or “add” operations are performed.
 Alternatively, projective scaling may be implemented after the “double” and/or “add” operations are performed (but prior to storing the output values in the registers), and so on. Additional projective scaling may be performed at the end of all iterations of the algorithm.
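The read adjustment, the key-bit-dependent update, the random output swap, and the random projective scaling described for FIG. 2A and FIG. 2B can be combined in one sketch. This is a hedged illustration, not the disclosed implementation. Elliptic-curve points are replaced by a toy group, the integers modulo the prime q = 2^61 - 1 under addition. Each element v is kept in a projective form (X, Z) with v = X/Z, so scaling (X, Z) -> (λX, λZ) changes the representation but not the element. All helper names are hypothetical. The branch on the key bit is not constant-time, so the code shows dataflow only. Python 3.8+ is assumed for `pow(x, -1, m)`.

```python
import secrets

Q_MOD = 2**61 - 1  # prime modulus of the toy group (Z_q, +); an assumption


def rand_nonzero():
    return 1 + secrets.randbelow(Q_MOD - 1)


def scale(pt, lam):            # (X, Z) -> (lam*X, lam*Z): same element X/Z
    return (lam * pt[0] % Q_MOD, lam * pt[1] % Q_MOD)


def dbl(pt):                   # 2 * (X/Z) = (2X)/Z
    return (2 * pt[0] % Q_MOD, pt[1])


def add(p1, p2):               # X1/Z1 + X2/Z2 = (X1*Z2 + X2*Z1)/(Z1*Z2)
    return ((p1[0] * p2[1] + p2[0] * p1[1]) % Q_MOD, p1[1] * p2[1] % Q_MOD)


def value(pt):
    return pt[0] * pow(pt[1], -1, Q_MOD) % Q_MOD


def randomized_montgomery_ladder(k, base, nbits):
    reg = [(0, 1), (base % Q_MOD, 1)]  # R(0) = A = 0, R(1) = B = P
    swapped = 0                        # b_j of the previous iteration
    for i in reversed(range(nbits)):   # most significant bit first
        # Read: undo the previous random swap to locate A and B.
        acc, aux = reg[swapped], reg[1 - swapped]
        if (k >> i) & 1 == 0:
            acc, aux = dbl(acc), add(acc, aux)   # A <- 2A,    B <- A + B
        else:
            acc, aux = add(acc, aux), dbl(aux)   # A <- A + B, B <- 2B
        # Write: a fresh random bit b_{j+1} decides which register gets which value.
        swapped = secrets.randbelow(2)
        reg[swapped], reg[1 - swapped] = acc, aux
        # Random projective scaling of both registers (Z_R(0), Z_R(1)).
        reg = [scale(reg[0], rand_nonzero()), scale(reg[1], rand_nonzero())]
    return value(reg[swapped])


print(randomized_montgomery_ladder(45, 7, 6))  # 315
```

Repeated runs with the same k store different encodings in different registers at every step, yet always return the same result, because each read is steered by the swap bit drawn in the previous iteration.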
 FIG. 3A illustrates an exemplary operation 300 of the Double-Add ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure.
 The exemplary operation 300 may be performed by one or more processing devices 100, in some implementations.
 The number P may be a public number.
 The number P may represent a point on an elliptic curve that may be identified by affine coordinates (x, y) and/or a set of projective (e.g., Jacobian) coordinates.
 the iterations of the DoubleAdd ladder algorithm may be performed by the processing device (e.g., ALU 110) having access to two (or more) memory registers, e.g., registers R(0) and R(l), which may be similar (in implementation and function) to the registers R(0) and R(l) described in relation to the Montgomery ladder algorithm.
 One of the registers, e.g., R(0) may be used to store the accumulator value A (which may initially be set to zero).
 the other register, e.g. R(l) may be used to store an auxiliary value, which in the Double Add algorithm may be a value, such as the value of base point P, that is to remain fixed for all iterations of the algorithm.
 the values stored in the two registers may be shuffled, so that the register R(l) may, at times, store the accumulator value A whereas the register R(0) may store the fixed auxiliary value P.
 the auxiliary value P (and/or the accumulator value) may be projectively scaled at various stages of the Double Add algorithm.
 the projective coordinates (e.g., X, Y, Z) representing the value P (or, similarly, the accumulator value) may be changed provided that they correspond to the unchanged set of the affine coordinates (x,y) on the elliptic curve.
 Montgomery ladder various multiplication (e.g., doubling, scaling) or addition operations shall be understood to refer to multiple components (e.g., various projective and/or affine coordinates) of the value P (and the accumulator value), if applicable.
 projective coordinates may have 4 or more components (e.g. X, Y, Z, W, ...), with the additional coordinates describing, for example, a slope of a line connecting the point identified by the coordinates to some reference point (e.g., the base point P), or other values, as may be prescribed by the specific algorithm being implemented.
The processing device may use a random number generator to generate a random number b_j (e.g., a one-bit number).
The processing device may perform additional randomization of the values stored in R(0) and R(1) by projective scaling using random numbers Z_{R(0)} and Z_{R(1)}, as shown by blocks 330, 332, 334, and 336, which may be performed similarly to blocks 230, 232, 234, and 236 of the Montgomery ladder algorithm.
The processing device may retrieve the current value of the accumulator and the value P stored in R(0) and R(1).
The processing device may have to account for the possibility that the shuffle operation during the previous j-th iteration may have resulted in the accumulator value being stored in R(1) and the value P being stored in R(0).
The processing device may therefore compute the value 2R(0) and identify it as the new value of the accumulator 340, equal to 2(2A+P), as indicated by the left dashed line at the bottom of FIG. 3A.
The processing device may also identify the value stored in R(0) as the value P, as illustrated by the right solid line leading to block 342 in FIG.
The determined values of the accumulator 340 and the auxiliary number 342 may be stored using conditional swapping (shuffling), as described above.
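The register shuffling in the Double-Add ladder can be sketched as follows. To keep the example short and testable, this sketch assumes a stand-in group (integers modulo a hypothetical N under addition, so that "doubling" is 2a mod N and "adding" is a + c mod N) instead of real elliptic-curve points, and it omits projective scaling. The bit `b` records which register holds the accumulator; each iteration reads according to the previous `b` before drawing a fresh one. Python branches and list indexing are not constant-time, so this illustrates only the data flow, not a hardened implementation.

```python
import secrets

N = 1_000_003  # hypothetical order of the stand-in additive group Z_N

def dbl(a):
    """Stand-in for point doubling."""
    return (2 * a) % N

def add(a, c):
    """Stand-in for point addition."""
    return (a + c) % N

def double_add_randomized(k, P):
    """Left-to-right Double-Add ladder with random placement of the outputs.

    b == 0: accumulator in R[0], auxiliary value P in R[1];
    b == 1: accumulator in R[1], auxiliary value P in R[0].
    """
    R = [0, P]  # accumulator starts at the neutral element, auxiliary value is P
    b = 0
    for bit in bin(k)[2:]:  # most significant bit first
        acc, aux = R[b], R[1 - b]  # compensate for the previous placement
        acc = dbl(acc)
        if bit == "1":
            acc = add(acc, aux)  # "double-and-add" when the key bit is set
        b = secrets.randbits(1)  # fresh random bit for this iteration
        R[b], R[1 - b] = acc, aux  # conditional swap of the outputs
    return R[b]

assert double_add_randomized(41, 5) == 205
```

Whatever sequence of random bits is drawn, the returned value is the same; only the register holding each intermediate value changes.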
FIG. 3B illustrates these intermediate randomization operations 350, in accordance with one or more aspects of the present disclosure, which may be implemented to protect execution of the Double-Add ladder algorithm from side-channel attacks.
These operations may include: adjustment of the read operations to compensate for the randomization (swapping) that may have been performed at the end of the previous j-th iteration; determination of whether the “add” operation is to be performed in addition to the “double” operation, based on the value k_{j+1}; and conditional swapping of the outputs based on the value of the random bit b_{j+1}. Also shown in FIG. 3B is the projective scaling operation, which may be performed at the end of the previous j-th iteration (as indicated), or after the input values are read from the registers R(0) and R(1) during the (j+1)-th iteration but before the “double” or “double-add” operations are performed, or after the “double” or “double-add” operations are performed, and so on.
Protection of cryptographic operations by intermediate randomization may be performed for other multiplication algorithms in a manner similar to the one described in relation to the Montgomery ladder and the Double-Add ladder shown in FIGs. 2A-2B and 3A-3B, respectively.
The operations performed by the processing device during the (j+1)-th iteration of the Joye Double-Add ladder may be summarized as follows, in one exemplary implementation (wherein iterations are performed in right-to-left order, so that k_0 is the least significant bit):
One of the registers retains its stored value, while the other register stores the result of the Double-and-Add operation, depending on the current bit value k_{j+1}.
The random bit value b_j of the previous iteration controls which input value is stored in which register prior to the (j+1)-th iteration, while the random bit value b_{j+1} indicates where the output values are to be stored.
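A minimal sketch of the Joye Double-Add ladder with randomized output placement, under the same simplifying assumptions (a stand-in additive group modulo a hypothetical N instead of curve points, no projective scaling, not constant-time). Bits are consumed from k_0 upward; the random bit of the previous iteration selects how the inputs are read, and a fresh bit selects where the outputs go.

```python
import secrets

N = 1_000_003  # hypothetical order of the stand-in additive group Z_N

def joye_randomized(k, P, nbits):
    """Right-to-left Joye Double-Add ladder with random register placement.

    b == 0: accumulator A in R[0], auxiliary B in R[1]; b == 1: swapped.
    """
    R = [0, P]  # A = 0, B = P
    b = 0
    for j in range(nbits):  # k_0 (least significant bit) first
        A, B = R[b], R[1 - b]  # read according to the previous random bit
        if (k >> j) & 1:
            A = (2 * A + B) % N  # double-and-add into A; B keeps its value
        else:
            B = (2 * B + A) % N  # double-and-add into B; A keeps its value
        b = secrets.randbits(1)  # new random bit for this iteration's outputs
        R[b], R[1 - b] = A, B
    return R[b]  # the accumulator holds kP

assert joye_randomized(41, 1, 6) == 41
```

After processing bits k_0, ..., k_j, the accumulator equals (k mod 2^(j+1))P and A + B equals 2^(j+1)P, which is why the accumulator ends up holding kP.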
More than two memory registers R(0), R(1), ..., R(N-1) may store N intermediate values A(0), A(1), A(2), ... that may be used in successive iterations of these algorithms.
The protection by intermediate randomization may be used in N-value algorithms similarly to the Montgomery ladder and the Double-Add ladder algorithms described above.
The processing device may depart from a standard storing procedure, e.g., one in which the value A(i) is stored in the register R(i).
The processing device may generate a random number s, which may be an integer from 0 to N-1, and assign the value A(1) to the register R(s).
The processing device may generate another random number t, which may be an integer from 0 to N-1 excluding s, and store the value A(2) in the register R(t), and so on.
The random numbers s, t, ... may be multi-bit numbers represented by log_2 N bits (or an integer number of bits not less than log_2 N, if N is not a power of 2).
Other procedures for randomly distributing the N values A(0), A(1), A(2), ... among the N registers R(0), R(1), ..., R(N-1) may alternatively be implemented.
The processing device may determine which output distribution procedure was implemented during the preceding iteration (e.g., the values of the random numbers s, t, ...) and which registers currently store the values A(0), A(1), A(2), ..., and retrieve these values therefrom.
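For the N-value case, one possible sketch (an assumption, not the only procedure contemplated above) draws the register indices s, t, ... one at a time from the registers not yet used, records the resulting placement, and uses it to read the values back in the next iteration. The function names are hypothetical.

```python
import secrets

def randomized_store(values):
    """Place values A(0), ..., A(N-1) into N registers in random order.

    Returns (regs, perm), where perm[i] is the index of the register holding A(i).
    """
    n = len(values)
    free = list(range(n))  # registers not yet assigned
    perm = []
    for _ in values:
        # draw a random unused register index (s, t, ... in the text)
        perm.append(free.pop(secrets.randbelow(len(free))))
    regs = [None] * n
    for i, r in enumerate(perm):
        regs[r] = values[i]
    return regs, perm

def randomized_load(regs, perm):
    """Recover A(0), ..., A(N-1) using the placement from the previous iteration."""
    return [regs[r] for r in perm]

regs, perm = randomized_store([10, 20, 30, 40])
assert randomized_load(regs, perm) == [10, 20, 30, 40]
```

Drawing each index uniformly from the shrinking list of free registers yields a uniformly random permutation, so every assignment of values to registers is equally likely.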
The processing device may perform projective scaling of the values A(0), A(1), A(2), ..., using random multipliers RR(0), RR(1), RR(2), ..., as described above in relation to the Montgomery and the Double-Add ladder algorithms. Some or all of the random multipliers may be the same.
The projective scaling randomization may alternatively (or additionally) be performed at any other time during execution of an algorithm iteration.
The randomizations (random projective scaling and random distribution of the intermediate outputs) may be performed during each iteration of the algorithm, in some implementations.
The randomizations may be performed in a fixed order for each iteration, e.g., the random projective scaling may be performed at the beginning of each iteration before the registers are read out, or after the computations of the iteration are completed but before the outputs are stored.
The instances of randomization may be predetermined before the algorithm is applied to a specific multiplication task. For example, it may be predetermined that random projective scaling is to be performed at the beginning of iterations 0, 4, 6, and prior to storing outputs in iterations 1, 2, and so on.
The exact instances of randomization may themselves be determined randomly. For example, prior to a particular iteration of the algorithm, the random number generator may indicate whether an output randomization is to be performed during the iteration. Similarly, the random number generator may indicate whether the projective scaling randomization is to be performed during the iteration. The two determinations may be independent of each other. The random number generator may also indicate where exactly, within the iteration, the projective scaling is to be performed.
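The two scheduling options above (a schedule fixed before the multiplication task versus one drawn at random for each iteration) can be sketched as plain data. The `IterationPlan` fields and both helper functions are hypothetical names; in this sketch the random decisions are drawn independently of each other, as the text permits.

```python
import secrets
from dataclasses import dataclass

@dataclass
class IterationPlan:
    shuffle_outputs: bool  # randomly distribute the outputs among registers
    scale: bool            # apply random projective scaling in this iteration
    scale_at_start: bool   # True: scale before reading inputs; False: before storing outputs

def random_schedule(num_iterations):
    """Draw the randomization plan of each iteration independently at random."""
    return [
        IterationPlan(
            shuffle_outputs=bool(secrets.randbits(1)),
            scale=bool(secrets.randbits(1)),
            scale_at_start=bool(secrets.randbits(1)),
        )
        for _ in range(num_iterations)
    ]

def predetermined_schedule(num_iterations, scale_at_start, scale_before_store):
    """Fixed plan chosen before the multiplication task (outputs always shuffled here)."""
    plans = []
    for j in range(num_iterations):
        at_start = j in scale_at_start
        plans.append(IterationPlan(
            shuffle_outputs=True,
            scale=at_start or j in scale_before_store,
            scale_at_start=at_start,
        ))
    return plans

plan = predetermined_schedule(8, scale_at_start={0, 4, 6}, scale_before_store={1, 2})
```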
 FIG. 4 depicts a flow diagram of an illustrative example of method 400 of protecting cryptographic operations by intermediate randomization, in accordance with one or more aspects of the present disclosure.
 Method 400 and/or each of its individual functions, routines, subroutines, or operations may be performed by one or more processing units of the computing system implementing the methods, e.g., a processor containing the ALU 110.
Method 400 may be performed by a single processing thread. Alternatively, method 400 may be performed by two or more processing threads, each thread executing one or more individual functions, routines, subroutines, or operations of the method. In an illustrative example, the processing threads implementing method 400 may be synchronized (e.g., using semaphores, critical sections, and/or other thread synchronization mechanisms). Alternatively, the processing threads implementing method 400 may be executed asynchronously with respect to each other. Various blocks of the method 400 may be performed in a different order compared to the order shown in FIG. 4. Some blocks may be performed concurrently with other blocks. Some blocks may be optional.
The method 400 may be implemented by the processor/ALU performing a cryptographic operation, which may involve a public key number and a private key number, two private key numbers, and so on.
The cryptographic operation may be a part of a larger computational operation involving multiple private key numbers and/or multiple public key numbers.
The cryptographic operation may involve points in a cryptographic space.
The cryptographic space may be a space of points belonging to an elliptic curve or to any other object (a line, a surface, a volume, etc.) for which rules are defined that specify how doubling and addition operations are to be performed.
A point in the cryptographic space may be identified by a vector having a plurality of vector components.
The number of vector components may be more than three, in some implementations.
One of the vectors may be an accumulator (e.g., A) and the other vector(s) may be auxiliary vector(s).
The auxiliary vector may be used to improve the efficiency of the cryptographic operation.
The second vector may represent the accumulator and the first vector may represent the auxiliary vector.
The auxiliary vector may represent a public key P (e.g., as in the Double-Add ladder algorithm) or be a combination A+P of the accumulator value and the public key P (as in the Montgomery ladder algorithm), or any other number that may be used by a specific cryptographic algorithm.
The first and/or the second vector may change between successive iterations of the algorithm (e.g., as both the accumulator and the auxiliary vector change in the Montgomery ladder algorithm).
The first or the second vector may remain fixed between successive iterations of the algorithm (e.g., as the auxiliary vector remains fixed in the Double-Add ladder algorithm).
Any of the vectors (representing working points, base points, auxiliary points, etc.) may have only one component, in which case a single number may represent the corresponding vector.
A state of the algorithm at a particular iteration, S(P, A, B, ..., u, w, z, ...), may be characterized by a number of vectors (such as vectors P, A, B) and a number of additional parameters u, w, z, which may be one-component numbers or multi-component vectors.
For example, u may be the slope of a line that connects a particular point (e.g., B) with some other point (e.g., A or P); z may be an additional scaling factor; and so on.
Some of the components of the vectors may be elided.
For example, a given point A may be uniquely identified by its X_A, Z_A components (or Y_A, Z_A, or X_A, Y_A), so that the third component, which carries redundant information, may be omitted.
A state of the algorithm may be represented with the difference of some vectors, S = (P, A - P, B - ...
Some of the vector components may be shared by some vectors.
For example, some or all of the components Z_P, Z_A, Z_B may be the same and may further coincide with the “global” parameter z of the state S of the algorithm (at its particular iteration).
The processing device performing method 400 may load a first vector and a second vector, such that the first vector includes a plurality of first vector components identifying a first point in a cryptographic space and the second vector includes a plurality of second vector components identifying a second point in the cryptographic space.
The processing device may then obtain a scaled first vector by modifying at least some of the plurality of first vector components so that the scaled first vector identifies the same first point in the cryptographic space.
The processing device may also obtain a scaled second vector by modifying at least some of the plurality of second vector components so that the scaled second vector identifies the same second point in the cryptographic space.
Scaling of the first vector may be projective scaling and may include modifying at least some of the plurality of vector components so that the modified plurality of vector components identifies the same point in the cryptographic space (e.g., elliptic curve).
Modifying the plurality of vector components may include multiplying some or all vector components by an integer power of a random factor.
For example, modifying the plurality of vector components may include multiplying a first vector component by a random factor R, multiplying a second vector component by the square of the random factor, R^2, and/or multiplying a third vector component by the cube of the random factor, R^3.
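The R, R^2, R^3 pattern above matches, for example, Jacobian projective coordinates, where (X, Y, Z) represents the affine point (X/Z^2, Y/Z^3). Mapping the "first, second, third component" of the text to Z, X, Y is an assumption made for this illustration, as are the modulus and function names. Multiplying Z by R, X by R^2 and Y by R^3 leaves the affine point unchanged:

```python
import secrets

P_MOD = 2**127 - 1  # illustrative prime modulus (an assumption)

def jacobian_to_affine(X, Y, Z, p=P_MOD):
    """Jacobian (X, Y, Z) -> affine (X / Z^2, Y / Z^3) mod p."""
    z_inv = pow(Z, -1, p)
    z_inv2 = z_inv * z_inv % p
    return (X * z_inv2 % p, Y * z_inv2 * z_inv % p)

def jacobian_scale(X, Y, Z, p=P_MOD):
    """Randomize the representation: Z -> R*Z, X -> R^2*X, Y -> R^3*Y."""
    R = secrets.randbelow(p - 1) + 1  # random nonzero factor
    R2 = R * R % p
    return (X * R2 % p, Y * R2 * R % p, Z * R % p)

pt = (11, 22, 3)
assert jacobian_to_affine(*jacobian_scale(*pt)) == jacobian_to_affine(*pt)
```

The check works because (R^2 X)/(R Z)^2 = X/Z^2 and (R^3 Y)/(R Z)^3 = Y/Z^3 for any nonzero R.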
Scaling of the first vector may also include updating auxiliary information, which, together with the first vector components and the second vector components, may identify a current arithmetic state of the ladder.
The auxiliary information may identify the correspondence between the first vector components and the first point in the cryptographic space (e.g., elliptic curve) and similarly identify the correspondence between the second vector components and the second point in the cryptographic space.
The updated auxiliary information may identify the correspondence between the modified first and second vector components and the respective points in the cryptographic space.
The auxiliary information may be stored in additional registers different from the registers used to store the first vector components and the second vector components.
The auxiliary information may include the random factor R (for one or both vectors, if the respective random factors are different from each other), the running value Z (for one or both vectors) of the z-coordinate (e.g., the previous value of the z-coordinate multiplied by the random factor R), the X and/or Y coordinates of the base point P (possibly scaled with the running value Z or some other value), the slope of the line connecting the base point with the first and the second points in the cryptographic space, and so on.
If some of the components of the first vector and/or the second vector (e.g., the X or Y components) are elided from the respective vectors, some of the elided component(s) may be stored in the auxiliary information.
The processing device performing method 400 may projectively scale the first vector (at block 410), multiplying it by some random number. Projective scaling may modify the components of the first vector without changing the point in the cryptographic space identified by the vector components. In some implementations, both the first and the second vectors may be projectively scaled by multiplying the first and the second vectors by the same or different random numbers.
Computations that are to be performed during various iterations of the cryptographic operation may depend on the value of a key bit k_j (e.g., of the private key k) that corresponds to the current iteration being executed.
The key bit value may determine whether the “double” arithmetic operation or the “double-and-add” arithmetic operation is to be performed.
The key bit value may determine whether the “double” operation is to be performed on the accumulator or on the auxiliary vector.
The processing device may determine that the key bit k_j has a first key bit value (which may be 1 or 0).
The method 400 may continue with identifying, responsive to determining that the key bit has the first key bit value, a first arithmetic operation to be performed on the scaled first vector and the (scaled) second vector (430).
The first arithmetic operation may be an add operation (where the scaled first vector is added to the (scaled) second vector), a double-and-add operation (where the (scaled) second vector is added to the double of the scaled first vector), or some other operation defined by the specific algorithm implemented by the processing device.
The processing device may perform (execute) the identified operation on the scaled first vector and the (scaled) second vector to obtain a third vector (430).
The method 400 may continue with generating a random number b (440) to determine where in a memory device the third vector is to be stored.
The random number b may be a one-bit number if there are two possible memory locations (registers) in the memory device where the third vector may be stored. Alternatively, the random number b may be a multi-bit number if there are more than two possible memory locations where the third vector may be stored.
The processing device may store the third vector in a first memory location, responsive to the random number having a first value (e.g., 0 or 1), or in a second memory location, responsive to the random number having a second value (e.g., 1 or 0).
The processing device may also perform additional arithmetic operations (successively or in parallel with the first arithmetic operation) on the scaled first vector and/or the (scaled) second vector and obtain additional outputs, e.g., a fourth vector.
For example, if the first arithmetic operation to determine the third vector is the “add” operation of the Montgomery ladder, the additional operation may be the “double” operation to be performed on the scaled first or the (scaled) second vector to obtain the fourth vector.
The fourth vector may be stored in the first memory location, responsive to the random number having the second value (e.g., 1 or 0), or in the second memory location, responsive to the random number having the first value (e.g., 0 or 1).
The first arithmetic operation and the second arithmetic operation may be modular arithmetic operations.
The processing device may read out the vectors stored in the first memory location and/or the second memory location and use these vectors as inputs for a second arithmetic operation.
The second arithmetic operation may be identified based on the value of the key bit k_{j+1} (which corresponds to the (j+1)-th iteration). For example, responsive to determining that the key bit k_{j+1} has the first key bit value (e.g., 0 or 1), the processing device may identify that the second arithmetic operation is the same as the first arithmetic operation.
Otherwise, the processing device may identify the second arithmetic operation as different from the first arithmetic operation.
For example, the first arithmetic operation may be an “add” operation and the second arithmetic operation may be the “double” operation (or vice versa).
The processing device may also access the value b used during the preceding iteration for output distribution and use it in the decision-making block 465.
The processing device may select a first input and a second input for the second arithmetic operation based on whether the random number b has the first value (e.g., 0 or 1) or the second value (e.g., 1 or 0).
The first input may be the third vector stored in the first memory location and the second input may be the fourth vector stored in the second memory location (if b has the first value) (470).
The first input may be the third vector stored in the second memory location and the second input may be the fourth vector stored in the first memory location (if b has the second value) (480).
The processing device may perform the second arithmetic operation on the first input and the second input.
The outcome of the second arithmetic operation is to remain the same regardless of how the outputs of the first arithmetic operation were stored at the end of the previous (j-th) iteration.
The contingent loading of the inputs at the beginning of the (j+1)-th iteration reverses the contingent storing of the outputs at the end of the j-th iteration, while introducing randomization operations that make it more difficult for an adversary to correlate emissions from the processing device among various operations of the algorithm being performed. Accordingly, this makes it harder for the adversary to mount a successful side-channel attack.
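The storing and compensating-loading steps of method 400 can be sketched for the Montgomery ladder as follows. This is a minimal sketch under stated assumptions: points are replaced by a stand-in additive group modulo a hypothetical N, projective scaling (block 410) is omitted because the stand-in has no projective representation, and the code is not constant-time. Block numbers in the comments refer to the description above.

```python
import secrets

N = 1_000_003  # hypothetical order of the stand-in additive group Z_N

def dbl(a):
    """Stand-in for point doubling."""
    return (2 * a) % N

def add(a, c):
    """Stand-in for point addition."""
    return (a + c) % N

def montgomery_randomized(k, P, nbits):
    """Montgomery ladder whose outputs are stored at random register positions.

    b == 0: accumulator A in R[0], auxiliary B in R[1]; b == 1: swapped.
    """
    R = [0, P]
    b = 0
    for j in reversed(range(nbits)):  # most significant bit first
        # Select inputs according to the previous iteration's b (blocks 465-480).
        A, B = R[b], R[1 - b]
        if (k >> j) & 1:
            A, B = add(A, B), dbl(B)
        else:
            A, B = dbl(A), add(A, B)
        b = secrets.randbits(1)  # random number b (block 440)
        R[b], R[1 - b] = A, B  # store the two outputs according to b
    return R[b]

assert montgomery_randomized(41, 3, 6) == 123
```

Because every read undoes the previous random placement, the arithmetic outcome is independent of the drawn bits, while the register in which each intermediate value lands varies from run to run.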
 Any arithmetic operations described in reference to FIGs 2A, 2B, 3A, 3B, and 4 may be modular arithmetic operations.
 FIG. 5 depicts a block diagram of an example computer system 500 operating in accordance with one or more aspects of the present disclosure.
Computer system 500 may represent the processing device 100, illustrated in FIG. 1.
Example computer system 500 may be connected to other computer systems in a LAN, an intranet, an extranet, and/or the Internet.
Computer system 500 may operate in the capacity of a server in a client-server network environment.
Computer system 500 may be a personal computer (PC), a set-top box (STB), a server, a network router, switch or bridge, or any device capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that device.
Example computer system 500 may include a processing device 502 (also referred to as a processor or CPU), a main memory 504 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), etc.), a static memory 506 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory (e.g., a data storage device 518), which may communicate with each other via a bus 530.
Processing device 502 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, processing device 502 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 502 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field
Processing device 502 may be configured to execute instructions implementing method 400 of protecting cryptographic operations by intermediate randomization.
Example computer system 500 may further comprise a network interface device 508, which may be communicatively coupled to a network 520.
Example computer system 500 may further comprise a video display 510 (e.g., a liquid crystal display (LCD), a touch screen, or a cathode ray tube (CRT)), an alphanumeric input device 512 (e.g., a keyboard), a cursor control device 514 (e.g., a mouse), and an acoustic signal generation device 516 (e.g., a speaker).
Data storage device 518 may include a computer-readable storage medium (or, more specifically, a non-transitory computer-readable storage medium) 528 on which is stored one or more sets of executable instructions 522.
Executable instructions 522 may comprise executable instructions implementing method 400 of protecting cryptographic operations by intermediate randomization.
Executable instructions 522 may also reside, completely or at least partially, within main memory 504 and/or within processing device 502 during execution thereof by example computer system 500, main memory 504 and processing device 502 also constituting computer-readable storage media. Executable instructions 522 may further be transmitted or received over a network via network interface device 508.
The term “computer-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of operating instructions.
The term “computer-readable storage medium” shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine that cause the machine to perform any one or more of the methods described herein.
The term “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media.
 Examples of the present disclosure also relate to an apparatus for performing the methods described herein.
 This apparatus may be specially constructed for the required purposes, or it may be a general purpose computer system selectively programmed by a computer program stored in the computer system.
A computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic disk storage media, optical storage media, flash memory devices, other types of machine-accessible storage media, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
Abstract
Aspects of the present disclosure involve a method and a system to support execution of the method to perform a cryptographic operation involving a first vector and a second vector, by projectively scaling the first vector, performing a first operation involving the scaled first vector and the second vector to obtain a third vector, generating a random number, storing the third vector in a first location, responsive to the random number having a first value, or in a second location, responsive to the random number having a second value, and performing a second operation involving a first input and a second input, wherein, based on the random number having the first value or the second value, the first input is the third vector stored in the first location or the second location and the second input is a fourth vector stored in the second location or the first location.
Description
PROTECTION OF CRYPTOGRAPHIC OPERATIONS BY
INTERMEDIATE RANDOMIZATION
RELATED APPLICATIONS
[0001] This application relates to U.S. Provisional Application No. 62/789,103 filed on January 7, 2019, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
[0002] The disclosure pertains to cryptographic computing applications, more specifically to protection of cryptographic operations from side-channel attacks.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various implementations of the disclosure.
[0004] FIG. 1 is an exemplary block diagram of the components of a processing device capable of protecting cryptographic operations performed therein with intermediate randomization, in accordance with one or more aspects of the present disclosure.
[0005] FIG. 2A illustrates an exemplary operation of the Montgomery ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure.
[0006] FIG. 2B illustrates intermediate randomization operations, in accordance with one or more aspects of the present disclosure, that may be implemented to protect execution of the Montgomery ladder algorithm from side-channel attacks.
[0007] FIG. 3A illustrates an exemplary operation of the Double-Add ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure.
[0008] FIG. 3B illustrates intermediate randomization operations, in accordance with one or more aspects of the present disclosure, that may be implemented to protect execution of the Double-Add ladder algorithm from side-channel attacks.
[0009] FIG. 4 depicts a flow diagram of an illustrative example of method of protecting cryptographic operations by intermediate randomization, in accordance with one or more aspects of the present disclosure.
[0010] FIG. 5 depicts a block diagram of an example computer system operating in accordance with one or more aspects of the present disclosure.
DETAILED DESCRIPTION
[0011] Aspects of the present disclosure are directed to protection of arithmetic operations by intermediate randomizations that may be used in applications employing cryptographic algorithms, for safeguarding inputs and outputs of cryptographic computations against side channel attacks.
[0012] In public-key cryptography systems, a processing device may have various components/modules used for cryptographic operations on input messages. Input messages used in such operations are often large binary numbers whose processing is sometimes performed on low-bit microprocessors, such as smart card readers, wireless sensor nodes, and so on. Examples of cryptographic operations include, but are not limited to, operations involving Rivest-Shamir-Adleman (RSA) and Diffie-Hellman (DH) keys, digital signature algorithms (DSA) used to authenticate messages transmitted between nodes of the public-key cryptography system, various elliptic curve cryptography schemes, etc. Cryptographic algorithms often involve modular arithmetic operations with modulus N, in which the set of all integers Z is wrapped around a circle of length N (the set Z_{N}), so that any two numbers that differ by N (or by any other integer multiple of N) are treated as the same number. As a result, a modular (modulo N) multiplication operation, AB mod N, may produce the same result for many more different sets of the multiplicand A and the multiplier B than for conventional arithmetic operations. For example, if it is known that a product of conventional multiplication of two positive integers is 6, it may then be determined that the two factors (the multiplicand and the multiplier, or vice versa) must necessarily be 2 and 3 (excluding a trivial product of 1 and the number itself, 6). In modular arithmetic, however, this is no longer the case. For example, if N=12, the same product AB mod 12 = 6 may result from the pairs of factors 2 and 3, 3 and 6, 5 and 6, 6 and 7, 6 and 9, and so on. This happens because 6, 18, 30, 42, 54, etc., represent the same number modulo N=12, since all these numbers differ from each other by an integer multiple of N (in other words, when any of these integers is divided by N, the remainder of the division is the same, i.e., 6). Cryptographic applications exploit the fact that extracting the value of the private key A from a public key P = B^{A} mod N may be a prohibitively difficult operation even when B is known, provided that A and N are sufficiently large. Similarly, a digital signature can be generated using a modular exponentiation technique. For example, when such an algorithm is used as the basis of public-key cryptography, the signature S is computed in the form of the equation S = K^{d} mod P, where P is a public modulus and d is a private exponent.
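The modular-arithmetic example above can be checked directly. The snippet below lists the factor pairs whose product is 6 modulo 12, and computes B^A mod N with left-to-right square-and-multiply (a standard technique shown for illustration; `factor_pairs` and `mod_exp` are hypothetical names), comparing the result against Python's built-in three-argument `pow`.

```python
def factor_pairs(target, modulus):
    """All pairs (a, b) with 1 <= a <= b < modulus and a*b = target (mod modulus)."""
    return [(a, b) for a in range(1, modulus) for b in range(a, modulus)
            if (a * b) % modulus == target]

def mod_exp(base, exp, mod):
    """Left-to-right square-and-multiply: the multiplicative analogue of Double-Add."""
    result = 1 % mod
    for bit in bin(exp)[2:]:  # most significant bit first
        result = result * result % mod  # "double" step
        if bit == "1":
            result = result * base % mod  # "add" step
    return result

pairs = factor_pairs(6, 12)
# Contains (2, 3), (3, 6), (5, 6), (6, 7) and (6, 9), as stated in the text.
assert {(2, 3), (3, 6), (5, 6), (6, 7), (6, 9)} <= set(pairs)
assert mod_exp(7, 41, 1_000_003) == pow(7, 41, 1_000_003)
```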
[0013] Many cryptographic applications employ elliptic curve multiplication, which may involve operations with points (x, y) on an elliptic curve. For example, an elliptic curve f(x, y) = 0 may be a Weierstrass curve, where f(x, y) is a third-degree polynomial in x and a second-degree polynomial in y. A cryptographic operation on an elliptic curve may involve selecting a base point P (which may be a public key) and multiplying P by an integer number k (which may be a private key): Q = kP. The elliptic curve multiplication may be defined via a set of specific rules for point doubling, 2P, point addition (P1+P2), the zero (infinity) point, and so on. The strength of elliptic curve cryptography is rooted in the fact that for large values of k, the resulting point Q can be practically anywhere on the elliptic curve. As a result, the inverse operation to determine an unknown value of the private key k from a known value Q (referred to as the discrete logarithm of Q to base P: k = log_{P} Q) can be a prohibitively difficult computational operation.
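To make the point-doubling and point-addition rules concrete, here is a minimal sketch over a deliberately tiny, insecure Weierstrass curve y^2 = x^3 + 2x + 3 over the integers modulo 97 (parameters chosen only for illustration). Points are affine tuples, `None` stands for the zero (infinity) point, and Q = kP is computed by left-to-right double-and-add. A real implementation would use a standardized curve and constant-time code.

```python
# Toy curve parameters (hypothetical, NOT secure): y^2 = x^3 + a*x + b mod p.
p, a, b = 97, 2, 3
O = None  # the zero (infinity) point

def on_curve(P):
    if P is O:
        return True
    x, y = P
    return (y * y - (x**3 + a * x + b)) % p == 0

def ec_add(P, Q):
    """Affine point addition; also handles doubling and the zero point."""
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O  # P + (-P) = O, including doubling a point with y = 0
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p  # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k, P):
    """Q = kP by left-to-right double-and-add."""
    Q = O
    for bit in bin(k)[2:]:
        Q = ec_add(Q, Q)  # point doubling
        if bit == "1":
            Q = ec_add(Q, P)  # point addition
    return Q

def find_point():
    """Brute-force some affine point on the toy curve."""
    for x in range(p):
        rhs = (x**3 + a * x + b) % p
        for y in range(p):
            if y * y % p == rhs:
                return (x, y)

G = find_point()
assert on_curve(scalar_mult(41, G))
```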
[0014] To avoid implementing the multiplication Q = kP via k loop iterations, a number of ladder-type algorithms may be used, which require a significantly reduced number of loop iterations (generally, about log_{2} k iterations). For example, in the Montgomery ladder algorithm, two registers, e.g., R(0) and R(1), may be used to store the accumulator value A and an auxiliary value B, with one doubling and one addition operation performed at each iteration. Prior to the first iteration, the accumulator value may be set to zero, A [R(0)] = 0, and the auxiliary value B may be set to P: B [R(1)] = P. In each iteration j, starting from the most significant nonzero bit, the Montgomery ladder algorithm adds the auxiliary value B to the accumulator value A and doubles the auxiliary value B if the respective key bit is set, k_{j} = 1. If the key bit is zero, k_{j} = 0, the algorithm adds the accumulator value A to the auxiliary value B and doubles the accumulator value A:
$$A \leftarrow A + B, \quad B \leftarrow 2B, \quad \text{if } k_{j} = 1;$$
$$A \leftarrow 2A, \quad B \leftarrow A + B, \quad \text{if } k_{j} = 0.$$
After the final iteration, the algorithm returns the accumulator value A as the result of the multiplication Q=kP.
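The Montgomery ladder update rules above can be written generically over any group, given its addition and doubling functions. In this minimal sketch (function and parameter names are hypothetical), the check uses integers modulo a hypothetical N as a stand-in group, in which kP is simply k*P mod N.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def montgomery_ladder(k: int, P: T, zero: T,
                      add: Callable[[T, T], T], dbl: Callable[[T], T]) -> T:
    """Compute kP with one addition and one doubling per key bit."""
    A, B = zero, P  # R(0) <- 0, R(1) <- P
    for bit in bin(k)[2:]:  # most significant bit first
        if bit == "1":
            A, B = add(A, B), dbl(B)  # A <- A + B, B <- 2B
        else:
            A, B = dbl(A), add(A, B)  # A <- 2A, B <- A + B
    return A

N = 1_000_003  # stand-in group order (an assumption)
q = montgomery_ladder(41, 5, 0, lambda x, y: (x + y) % N, lambda x: 2 * x % N)
assert q == 205
```

Both branches perform exactly one addition and one doubling, and the two operations within a step do not depend on each other's results, which is the property that allows them to run in parallel.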
[0015] For example, if the key is k = 41, represented with six bits, k = (101001), the multiplication Q = kP may give rise to six iterations summarized in the following table.
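The six iterations for k = 41 can be regenerated with the minimal Python sketch below of the ladder in [0014]. It is an illustration, not the patented implementation: the function name and its `zero`, `add`, and `dbl` parameters are hypothetical, and in the usage plain integers stand in for multiples of P (the integer n represents the point nP) instead of real elliptic-curve coordinates.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def montgomery_ladder(
    k: int,
    P: T,
    zero: T,
    add: Callable[[T, T], T],
    dbl: Callable[[T], T],
) -> Tuple[T, List[Tuple[int, T, T]]]:
    """Left-to-right Montgomery ladder computing kP over a generic additive group.

    Returns kP and a trace of (k_j, A, B) recorded after each iteration.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    A, B = zero, P  # R(0) holds the accumulator A, R(1) the auxiliary value B
    trace: List[Tuple[int, T, T]] = []
    for bit_char in bin(k)[2:]:  # most significant nonzero bit first
        k_j = int(bit_char)
        if k_j == 1:
            A, B = add(A, B), dbl(B)  # A <- A + B, B <- 2B
        else:
            A, B = dbl(A), add(A, B)  # A <- 2A, B <- A + B
        trace.append((k_j, A, B))
    return A, trace


# Toy group: integers under addition, so the value n stands for the point nP.
result, trace = montgomery_ladder(41, 1, 0, lambda a, b: a + b, lambda a: 2 * a)
print(trace)  # [(1, 1, 2), (0, 2, 3), (1, 5, 6), (0, 10, 11), (0, 20, 21), (1, 41, 42)]
```

Every row of the trace satisfies B - A = P, and the last accumulator value is 41P.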
[0016] Because the iterations of the Montgomery ladder algorithm keep the difference between A and B invariant (B - A = P), elliptic curve multiplications may be performed using only one of the coordinates (e.g., x) while eliding the other coordinate (e.g., y).
[0017] As another example, in the Double-Add ladder algorithm, the second register R(1) may store the same auxiliary value B = P across all loop iterations. The first register may store the accumulator value A, which is doubled at each iteration. If the key bit is set, k_{j} = 1, the constant value B stored in the second register is also added to the new accumulator value:
$$A \leftarrow 2A + B, \qquad \text{if } k_j = 1;$$
$$A \leftarrow 2A, \qquad \text{if } k_j = 0.$$
After the final iteration, the algorithm returns the accumulator value A, which represents the result of the multiplication Q = kP. For the same example of k = 41, the Double-Add algorithm gives rise to the following six iterations.
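A minimal Python sketch of the Double-Add iterations for the same key follows. As before, the function name and the `zero`/`add`/`dbl` parameters are hypothetical, and integers stand in for multiples of P rather than curve points.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def double_add_ladder(
    k: int,
    P: T,
    zero: T,
    add: Callable[[T, T], T],
    dbl: Callable[[T], T],
) -> Tuple[T, List[Tuple[int, T]]]:
    """Left-to-right Double-Add computing kP; returns kP and a (k_j, A) trace."""
    if k < 0:
        raise ValueError("k must be non-negative")
    A = zero  # R(0): accumulator; R(1) keeps the fixed value P
    trace: List[Tuple[int, T]] = []
    for bit_char in bin(k)[2:]:  # most significant nonzero bit first
        k_j = int(bit_char)
        A = dbl(A)  # a doubling happens in every iteration
        if k_j == 1:
            A = add(A, P)  # the addition happens only when the key bit is set
        trace.append((k_j, A))
    return A, trace


result, trace = double_add_ladder(41, 1, 0, lambda a, b: a + b, lambda a: 2 * a)
print([A for _, A in trace])  # [1, 2, 5, 10, 20, 41]
```

Note that the addition is executed only for set key bits; this data-dependent pattern is what the side-channel discussion in [0021] exploits.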
[0018] Compared to the Double-Add algorithm, the Montgomery ladder algorithm has the advantage that the doubling and addition operations at each iteration (ladder step) can be performed independently, e.g., by two separate parallel processors.
[0019] As another example, in the Joye Double-Add ladder algorithm, the iterations may be performed in the reverse order, from right to left, starting from the least significant bit. The register R(0) may store the accumulator value A (initially set to zero) and the register R(1) may store the auxiliary value B (initially set to P). If the key bit is set, k_{j} = 1, the double-and-add operation is performed on the value A, but if the key bit is clear, the double-and-add operation is performed on the value B:
$$A \leftarrow 2A + B, \quad B \leftarrow B, \qquad \text{if } k_j = 1;$$
$$A \leftarrow A, \quad B \leftarrow 2B + A, \qquad \text{if } k_j = 0.$$
After the final iteration, the algorithm returns the accumulator value A, which represents the result of the multiplication Q = kP. For the example of k = 42, represented with six bits, k = (101010), the Joye Double-Add algorithm gives rise to the following six iterations (to be performed from the bottom up):
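The right-to-left iterations for k = 42 can be reproduced with the minimal Python sketch below. The function and parameter names are hypothetical, integers again stand in for multiples of P, and the trace is recorded in execution order (k_{0} first), i.e., the reverse of a table read from the bottom up.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def joye_double_add(
    k: int,
    P: T,
    zero: T,
    add: Callable[[T, T], T],
    dbl: Callable[[T], T],
) -> Tuple[T, List[Tuple[int, T, T]]]:
    """Right-to-left Joye Double-Add computing kP; returns kP and a (k_j, A, B) trace."""
    if k < 0:
        raise ValueError("k must be non-negative")
    A, B = zero, P  # R(0): accumulator A, R(1): auxiliary value B
    trace: List[Tuple[int, T, T]] = []
    for j in range(k.bit_length()):  # j = 0 is the least significant bit k_0
        k_j = (k >> j) & 1
        if k_j == 1:
            A = add(dbl(A), B)  # A <- 2A + B, B unchanged
        else:
            B = add(dbl(B), A)  # B <- 2B + A, A unchanged
        trace.append((k_j, A, B))
    return A, trace


result, trace = joye_double_add(42, 1, 0, lambda a, b: a + b, lambda a: 2 * a)
print(trace)  # [(0, 0, 2), (1, 2, 2), (0, 2, 6), (1, 10, 6), (0, 10, 22), (1, 42, 22)]
```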
[0020] In some implementations, various other algorithms may be used, such as the right-to-left binary method, the conjugate co-Z addition method, left-to-right scalar multiplication, the Goundar-Joye-Miyaji ladder, and so on. In some algorithms, three (or more) registers may be used, with one register to store an accumulator value and two (or more) registers to store two (or more) auxiliary values.
[0021] Even though solving a discrete logarithm problem may be a prohibitively difficult task, elliptic curve cryptography operations may be vulnerable to side-channel attacks. A side-channel attack may be performed by monitoring emissions (signals) produced by electronic circuits of the target’s (victim’s) computer. Such signals may be acoustic, electric, magnetic, optical, thermal, and so on. By recording emissions, a hardware trojan and/or malicious software may be capable of correlating specific processor (and/or memory) activity with operations carried out by the processor. For example, a trojan may be capable of identifying that an elliptic curve cryptographic application performs m iterations. An attacker employing the trojan may infer from this that the private key satisfies k ≤ 2^{m} - 1 (or make an even more definitive prediction that the private key lies within the interval 2^{m-1} ≤ k ≤ 2^{m} - 1, if the algorithm starts with the iteration that corresponds to the most significant nonzero bit of the key). Within each iteration, the trojan may further identify a difference between emissions corresponding to a doubling operation and emissions corresponding to an addition operation. This may be sufficient for the trojan to determine the entire sequence of bits representing the private key k.
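To make the last step concrete, the sketch below assumes an idealized attacker who has already labeled every operation of an unprotected left-to-right Double-Add ladder as a doubling ("D") or an addition ("A") from its emissions. Both function names are hypothetical; real traces are noisy, so this only illustrates why a clean doubling/addition distinction suffices.

```python
def leaky_double_add_ops(k: int) -> str:
    """Operation sequence of an unprotected Double-Add ladder:
    'D' for every doubling, 'A' for each addition (done only when k_j = 1)."""
    ops = []
    for bit_char in bin(k)[2:]:
        ops.append("D")
        if bit_char == "1":
            ops.append("A")
    return "".join(ops)


def recover_key_from_ops(ops: str) -> int:
    """Idealized attacker: rebuild k from a perfectly labeled D/A sequence."""
    k = 0
    for op in ops:
        if op == "D":
            k <<= 1  # a doubling starts a new iteration: append a 0 bit
        elif op == "A":
            k |= 1  # an addition in this iteration means the key bit was 1
        else:
            raise ValueError(f"unexpected operation label {op!r}")
    return k


observed = leaky_double_add_ops(41)
print(observed)                        # DADDADDDA
print(recover_key_from_ops(observed))  # 41
```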
[0022] Aspects of the present disclosure address this and other shortcomings of conventional cryptographic operations by implementing intermediate randomization during iterations of the computational algorithm being used. For example, a processing device performing randomization protection may implement random projective scaling of various numbers encountered during various iterations, so that the digital representation of these numbers is modified without modifying the objects (e.g., respective points on elliptic curves) that these numbers identify. Additionally, the processing device may perform randomized storage of intermediate outputs (such as the accumulator value and the auxiliary value) and control the subsequent read/load operations so that the correct dataflow is preserved. Such randomized protective measures improve the security of cryptographic operations by making it more difficult for side-channel attackers to correlate the signals emitted by the processing device with the operations performed during computation.
[0023] FIG. 1 is an exemplary block diagram of the components of a processing device 100 capable of protecting cryptographic operations performed therein with intermediate randomization, in accordance with one or more aspects of the present disclosure. “Processing device” refers to a device capable of executing instructions encoding arithmetic, logical, or I/O operations. In one illustrative example, a processing device may follow the von Neumann architectural model and may include an arithmetic logic unit (ALU), a control unit, and a plurality of registers. In a further aspect, a processing device may be a single-core processor, which is typically capable of executing one instruction at a time (or processing a single pipeline of instructions), or a multi-core processor, which may simultaneously execute multiple instructions. In another aspect, a processing device may be implemented as a single integrated circuit, two or more integrated circuits, or may be a component of a multi-chip module. “Memory device” herein refers to a volatile or non-volatile memory, such as random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, flip-flop memory, or any other device capable of storing data.
[0024] As shown in FIG. 1, the processing device 100 may include, among other components, an ALU 110. The ALU 110 may be any digital electronic circuit capable of performing arithmetic and bitwise operations on integer binary numbers. The ALU 110 may be a component of a larger computing device, such as a central processing unit (CPU), which in turn may be a part of any server, desktop, laptop, tablet, phone, or any other type of computing device. The computing device may include multiple ALUs 110 and CPUs. The ALU 110 may receive input in the form of data operands from one or more memory devices, such as the memory devices 120, 130, 150, and 160. The ALU 110 may also receive code/instructions input, such as the algorithm instructions 140. The algorithm instructions 140 may identify the computational algorithm to be implemented (e.g., the Montgomery ladder, the Double-Add ladder, etc.) and indicate the nature and order of operations to be performed on input data operands. The ALU 110 may further receive randomization instructions 142. The randomization instructions 142 may indicate how various randomization measures are to be performed (e.g., random projective scaling, random storage of intermediate outputs, readout procedures for retrieving randomly stored intermediate outputs, and so on). The algorithm instructions 140 and/or the randomization instructions 142 may also indicate which memory devices are to store the output of the ALU operations, and so on. In some implementations, the algorithm instructions 140 and the randomization instructions 142 may be combined in a single set of instructions. In some implementations, the algorithm instructions 140 and the randomization instructions 142 may be stored on separate memory devices.
[0025] In one exemplary implementation, the numbers A and B may be stored in a first memory device 120, which may be a RAM (e.g., SRAM or DRAM) device. In other implementations, the first memory device 120 may be a flash memory device (NAND, NOR, 3DXP, or other type of flash memory) or any other type of memory. In one implementation, the first memory device 120 may have one input/output port and may be capable of receiving a single operand from the ALU 110 (via a write operation) or providing a single operand to the ALU 110 (via a read operation) per clock cycle. In such implementations, a minimum of two clock cycles may be required to perform both a read operation and a write operation involving the first memory device 120.
[0026] A second memory device 130 may be a scratchpad memory device, in one implementation. The scratchpad may be any type of high-speed memory circuit that may be used for temporary storage of data that can be retrieved rapidly. To facilitate rapid exchange of data with the ALU 110, the second memory device 130 may be equipped with multiple ports, e.g., a write port 132 and a read port 134, in one implementation. Each port may facilitate one operation per clock cycle.
[0027] The numbers A and B may be represented by n*W bits grouped into n words with W bits in each word. The word size W may be determined by microarchitectural properties of a processor performing multiplication, e.g., by an arithmetic logic unit (ALU) of the processor. For example, in one implementation, a number may be represented with n = 8 words of W = 32 bits each, for a total of 256 bits in the number. In each clock cycle, the ALU 110 may load one word from the second memory device 130 (via the read port 134) and may output one word to the second memory device 130 (via the write port 132). In one implementation, the second memory device 130 may be used for storing accumulators during execution of various arithmetic operations, such as addition, subtraction, and multiplication, including Montgomery reduction.
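The n = 8, W = 32 word layout can be illustrated with the short Python sketch below. The least-significant-word-first ordering and the helper names `to_words`/`from_words` are assumptions made for illustration; the description does not fix a word order.

```python
from typing import List


def to_words(value: int, n: int = 8, w: int = 32) -> List[int]:
    """Split a non-negative integer into n words of w bits (least significant word first)."""
    if value < 0 or value >= 1 << (n * w):
        raise ValueError(f"value does not fit in {n * w} bits")
    mask = (1 << w) - 1
    return [(value >> (i * w)) & mask for i in range(n)]


def from_words(words: List[int], w: int = 32) -> int:
    """Reassemble the integer from its words (inverse of to_words)."""
    return sum(word << (i * w) for i, word in enumerate(words))


x = (1 << 255) + 0xDEADBEEF  # a 256-bit number
words = to_words(x)
print(len(words), hex(words[0]), hex(words[7]))  # 8 0xdeadbeef 0x80000000
assert from_words(words) == x
```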
[0028] In some implementations, the processing device 100 may have an additional memory device, which may be a flip-flop memory device 150. The flip-flop memory device 150 may be any electronic circuit having stable states to store binary data, which may be changed by appropriate input signals. The flip-flop memory device 150 may be used for storing carries during execution of addition, subtraction, and/or multiplication, in some implementations. In some implementations, the processing device 100 may optionally have a third memory device 160, which may be any of the aforementioned types of memory device. The third memory device 160 may be used to store results of intermediate steps of arithmetic operations and/or final results of such operations, in one implementation. In some implementations, the third memory device 160 may be absent, and the intermediate/final results may be stored in the second memory device 130 (e.g., the scratchpad memory) or written to the first memory device 120.
In some implementations, the first memory device 120 and/or the third memory device 160 may store randomization instructions 142 (and/or algorithm instructions 140, not shown) for the ALU 110, as depicted in FIG. 1.
[0029] In some implementations, the accumulator A may be stored in the second memory device 130 to allow the fastest write/read access. In some implementations, the auxiliary number B may be stored in the flip-flop memory device 150 and may be overwritten after every iteration of the algorithm (e.g., as in the case of the Montgomery ladder) or remain fixed (as in the case of the Double-Add ladder). In some implementations, random numbers (indicating how randomization operations are to be performed) may be stored in the flip-flop memory and may remain there until the next read operation. In some implementations, the successive bits of the key number k may be stored in the flip-flop memory and may be overwritten at the beginning of the next iteration. In some implementations, the bits of the key number k may be stored in the second memory device. In some implementations, any or all of the accumulator, the auxiliary number, the random numbers, and the key number k may be stored in the first memory device.
[0030] FIG. 2A illustrates an exemplary operation 200 of the Montgomery ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure. The exemplary operation 200 may be performed by one or more processing devices 100, in some implementations. The input of the exemplary operation 200 may include a number k and a number P. In some implementations, the number k may be a private key represented by a sequence of bits (k_{0}k_{1}k_{2}k_{3}...). The number P may be a public number. The number P may represent a point on an elliptic curve that may be identified by affine coordinates (x, y). In some implementations, the point (x, y) on the elliptic curve may be identified with projective (e.g., Jacobian) coordinates, which have more than two components. For example, the number P specifying the point (x, y) on the elliptic curve may have three components (X, Y, Z), with the corresponding affine coordinates determined as (x, y) = (X/Z^{2}, Y/Z^{3}). The value Z may be chosen to be an arbitrary (nonzero) number. This allows projective scaling of the projective coordinates with an arbitrary value Z at various stages of the algorithm that uses intermediate randomization. For example, at the start of the algorithm, the point P may be represented as P = (x, y, 1), but at a later stage of the algorithm the projective coordinates may be scaled by an arbitrary number Z such that (x, y, 1) → (xZ^{2}, yZ^{3}, Z). In other implementations based on other geometric curves, a different projective scaling may be used.
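The Jacobian scaling rule can be checked with the minimal Python sketch below. The modulus p = 2^255 - 19 is only an example prime chosen for illustration, the helper names are hypothetical, and the scaling factor `r` plays the role of the arbitrary value Z in the text. Because the affine map X/Z^2, Y/Z^3 cancels the factors r^2 and r^3 exactly, the represented point is unchanged whatever curve equation it satisfies.

```python
import secrets
from typing import Tuple

p = 2**255 - 19  # example prime modulus (assumption); any odd prime works here

Jacobian = Tuple[int, int, int]


def to_affine(P: Jacobian) -> Tuple[int, int]:
    """(X, Y, Z) -> (x, y) = (X / Z^2, Y / Z^3) mod p."""
    X, Y, Z = P
    z_inv = pow(Z, -1, p)  # modular inverse (Python 3.8+)
    return (X * z_inv * z_inv % p, Y * pow(z_inv, 3, p) % p)


def projective_scale(P: Jacobian, r: int) -> Jacobian:
    """Random projective scaling (X, Y, Z) -> (X r^2, Y r^3, Z r)."""
    if r % p == 0:
        raise ValueError("scaling factor must be nonzero mod p")
    X, Y, Z = P
    return (X * r * r % p, Y * pow(r, 3, p) % p, Z * r % p)


x, y = 9, 14  # arbitrary affine coordinates (not checked against a curve equation)
P0: Jacobian = (x, y, 1)
r = secrets.randbelow(2**32 - 2) + 2  # single-word random factor in [2, 2^32 - 1]
P1 = projective_scale(P0, r)
print(P1 != P0, to_affine(P1) == (x, y))  # True True
```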
[0031] As described above, the multiplication Q = kP may be performed using a number of iterations determined by the number of bits in the binary representation of k. The iterations may be performed by a processing device (e.g., ALU 110) having access to two (or more) memory registers, e.g., registers R(0) and R(1). In some implementations, the registers R(0) and R(1) may be separate physical memory devices. In some implementations, the registers R(0) and R(1) may be virtual registers implemented in the first memory device 120, the second memory device 130, the third memory device 160, the flip-flop memory device 150, and so on. In some implementations, the registers R(0) and R(1) may be memory addresses accessible to the processing device. One of the registers, e.g., R(0), may be used to store the accumulator value A (which may initially be set to zero). The other register, e.g., R(1), may be used to store the auxiliary value B, such as B = A + P in the Montgomery ladder algorithm. In some implementations, register R(1) can store a base point P or some other value. In some implementations, additional registers, R(2), R(3), ..., may store additional values, as may be required or optional for a given algorithm being implemented. In some implementations of the Montgomery ladder algorithm with intermediate randomization, the values stored in the two registers may be swapped (shuffled), so that the register R(1) may store the accumulator value A whereas the register R(0) stores the auxiliary value B. The registers R(0) and R(1) may include multiple subregisters (virtual subregisters, memory addresses, etc.), with each of the multiple subregisters storing one of the affine (x, y) or projective (X, Y, Z) coordinates corresponding to the respective point (A or B) on the elliptic curve. Herein, when reference is made (in the singular) to reading/storing/swapping/etc. of a value (number) A and/or a value (number) B, it shall be implied that the respective operations may be performed on multiple (e.g., all) components of the corresponding number(s). In the case of elliptic curve computations, a reference to an add operation (e.g., A+B) or a double operation (e.g., 2A or 2B) shall further imply that a set of specific “add” or “double” elliptic curve instructions may be followed to determine the coordinates (e.g., Jacobian or affine) of the output points from the coordinates of the input points. Such instructions may be standard elliptic curve instructions, where the coordinates of the result of an “add” or a “double” operation may differ from a simple sum or double of the coordinates of the corresponding input points.
[0032] Prior to the start of the jth iteration of the algorithm, the processing device may access the value of the bit k_{j} and perform the double and add operations on the values stored in R(0) and R(1), as described above. For example, assuming for concreteness that k_{j} = 1, the processing device may read the values stored in both registers (202 and 204), as indicated schematically by thin solid lines in FIG. 2A, and determine the sum A+B (210) and the double of the auxiliary value stored in R(1) (212), as indicated schematically by the thick solid line in FIG. 2A. Rather than directly storing the number A+B in R(0) as the new accumulator value and the number 2B in R(1) as the new auxiliary value, the processing device performing the cryptographic operation 200 may implement additional steps to protect the operation from side-channel attacks by using intermediate randomization.
[0033] For example, the processing device may use a random number generator to generate a random number. The random number may be a one-bit number b_{j}, with the subscript j indicating the iteration of the loop. In some implementations, the value of the random bit b_{j} = 1 may indicate that a swapping of the results of the double/add computation is to be performed, whereas the value of the random bit b_{j} = 0 may indicate to the processing device that no swapping is to be done. More specifically, if the processing device determines that b_{j} = 0, the processing device may store the accumulator value A+B in register R(0) (220) and store the new auxiliary value 2B in register R(1) (222). If, however, the processing device determines that b_{j} = 1, the processing device may store the accumulator value A+B in register R(1) (224) and store the new auxiliary value 2B in register R(0) (226). This randomization of outputs makes it harder for an adversary attempting a side-channel attack to reliably determine the value of the key bit k_{j}, because storing the outputs A+B and 2B in randomly chosen registers R(0) and R(1) makes it harder for the adversary to correlate emissions (e.g., power consumption) with the outcome of the computational operations 210 and 212.
[0034] Prior to starting the computational double/add block of the next iteration (key bit k_{j+1}), the processing device may perform additional randomization of the values stored in R(0) and R(1) by performing random projective scaling. For example, the value stored in register R(0) may be projectively scaled with a random number Z_{R(0)} that may be produced by the random number generator, as schematically shown by blocks 230 and 236. Similarly, the value stored in R(1) may be projectively scaled with a random number Z_{R(1)} that may be produced by the random number generator, as schematically shown by blocks 232 and 234. The random numbers Z_{R(0)} and Z_{R(1)} may be different in some implementations. In other implementations, the numbers Z_{R(0)} and Z_{R(1)} may be the same, so that the random number generator has to be invoked only once. The numbers Z_{R(0)} and Z_{R(1)} may be short numbers, e.g., single-word-long numbers, so that the additional computations required to perform projective scaling in blocks 230, 232, 234, and 236 are minimized while still serving the purpose of randomizing the data flow of the operation 200. In some implementations, only one of the numbers Z_{R(0)} and Z_{R(1)} may be generated, and only one of the values stored in R(0) or R(1) may be projectively scaled. In some implementations, the decision as to which value is to be scaled may be based on generation of an additional random number C_{j}, which may be independent of the random bit b_{j} that controls the swapping. In such implementations, the random number generator may generate a random (single-word or multi-word) number Z and a random number C_{j} that determines to which of the two registers R(0) or R(1) the random number Z is to be applied.
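The single-register variant, in which an independent random bit C_{j} selects which register receives the random factor Z, might look like the Python sketch below. It assumes Jacobian triples, an example prime p = 2^255 - 19, and 32-bit words; the function names are hypothetical.

```python
import secrets
from typing import List, Tuple

p = 2**255 - 19  # example prime modulus (assumption)
Jacobian = Tuple[int, int, int]


def jacobian_scale(P: Jacobian, z: int) -> Jacobian:
    """(X, Y, Z) -> (X z^2, Y z^3, Z z): same affine point, new representation."""
    X, Y, Z = P
    return (X * z * z % p, Y * pow(z, 3, p) % p, Z * z % p)


def randomize_one_register(R: List[Jacobian]) -> int:
    """Scale exactly one of R[0], R[1] by a fresh nonzero single-word random Z.

    The register is chosen by a random bit C_j drawn independently of the
    swap bit b_j; C_j is returned only so that the example can inspect it.
    """
    z = secrets.randbelow(2**32 - 1) + 1  # nonzero, fits in one 32-bit word
    c_j = secrets.randbits(1)             # selects R(0) or R(1)
    R[c_j] = jacobian_scale(R[c_j], z)
    return c_j


def to_affine(P: Jacobian) -> Tuple[int, int]:
    X, Y, Z = P
    z_inv = pow(Z, -1, p)
    return (X * z_inv * z_inv % p, Y * pow(z_inv, 3, p) % p)


R = [(9, 14, 1), (4, 7, 1)]  # toy register contents with affine (9, 14) and (4, 7)
before = [to_affine(P) for P in R]
c_j = randomize_one_register(R)
assert [to_affine(P) for P in R] == before  # the represented points are unchanged
```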
[0035] At the beginning of the next, (j+1)th, iteration of the algorithm, the processing device may retrieve the current accumulator value and auxiliary value stored in R(0) and R(1). The processing device may have to account for the possibility that the shuffle operation during the previous jth iteration may have resulted in the accumulator value being stored in R(1) and the auxiliary value being stored in R(0). To preserve the correct dataflow, the processing device may access the value of the random number b_{j} and load the numbers from R(0) and R(1) in a manner that depends on whether b_{j} = 0 or b_{j} = 1. For example, assume, for the sake of illustration, that k_{j+1} = 0 and that the processing device determines that during the previous iteration of the algorithm the random number had the value b_{j} = 0. The processing device may then compute the value 2R(0) and identify it as the new value of the accumulator 240 (equal to 2A+2B in the current illustration), as indicated by the thick dashed line in FIG. 2A. The processing device may further compute the value R(0)+R(1) (equal to A+3B) and determine it to be the new auxiliary number 242, as illustrated by the thin dashed lines in FIG. 2A. If, on the other hand, the processing device determines that during the previous iteration of the algorithm the random number had the value b_{j} = 1, the processing device may compute the value 2R(1) and identify it as the new accumulator 240, as indicated by the thick solid line in FIG. 2A. As in the scenario where b_{j} = 0, the processing device may compute the value R(0)+R(1) and identify this value as the new auxiliary value 242. Because in the Montgomery ladder algorithm the value R(0)+R(1) is computed at each iteration independently of the value of the random number b_{j}, R(0)+R(1) may be computed before (or in parallel with) the determination of the value b_{j}. Similarly, in a situation where k_{j+1} = 1, the processing device may compute the value R(0)+R(1) and identify this value as the new accumulator value regardless of the value of the random bit b_{j}. On the other hand, the new auxiliary value depends on the value of b_{j} and may be equal to 2R(1) for the unshuffled case of b_{j} = 0, and equal to 2R(0) for the shuffled case of b_{j} = 1.
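The randomized Montgomery ladder of [0032]-[0036] can be simulated end to end with the Python sketch below. To stay short and self-contained, it does not use real elliptic-curve arithmetic: as a labeled assumption, a pair (X, Z) represents the element X/Z of the additive group of integers modulo the toy prime p = 2^61 - 1, so that rescaling (X, Z) to (rX, rZ) changes the representation without changing the value, mimicking projective scaling. All function names are hypothetical.

```python
import secrets
from typing import List, Tuple

p = 2**61 - 1  # toy prime modulus (assumption); 2**61 - 1 is a Mersenne prime

# Toy stand-in for projective points: (X, Z) represents X / Z mod p.
Proj = Tuple[int, int]


def add(u: Proj, v: Proj) -> Proj:
    return ((u[0] * v[1] + v[0] * u[1]) % p, u[1] * v[1] % p)


def dbl(u: Proj) -> Proj:
    return (2 * u[0] % p, u[1])


def rescale(u: Proj, r: int) -> Proj:
    return (u[0] * r % p, u[1] * r % p)


def value(u: Proj) -> int:
    return u[0] * pow(u[1], -1, p) % p


def randomized_montgomery_ladder(k: int, P: Proj) -> Proj:
    R: List[Proj] = [(0, 1), P]  # R(0) = A = 0, R(1) = B = P
    b = 0  # swap bit of the previous iteration (b = 1 means A sits in R(1))
    for bit_char in bin(k)[2:]:
        k_j = int(bit_char)
        s = add(R[0], R[1])  # A + B: needed in every iteration, independent of b
        if k_j == 1:
            new_acc, new_aux = s, dbl(R[b ^ 1])  # A <- A + B, B <- 2B
        else:
            new_acc, new_aux = dbl(R[b]), s      # A <- 2A, B <- A + B
        b = secrets.randbits(1)  # fresh random bit: store swapped if b == 1
        R[b], R[b ^ 1] = new_acc, new_aux
        # Random projective scaling of both registers with single-word factors.
        R[0] = rescale(R[0], secrets.randbelow(2**32 - 1) + 1)
        R[1] = rescale(R[1], secrets.randbelow(2**32 - 1) + 1)
    return R[b]  # the accumulator, wherever the last swap bit placed it


P = (5, 1)  # the toy "point" 5
Q = randomized_montgomery_ladder(41, P)
print(value(Q))  # 205 on every run, although the representation of Q varies
```

The `if k_j == 1` branch is kept data-dependent for readability; a hardened implementation would typically select operands without key-dependent branches. Scaling only one register chosen by an independent random bit, as in the C_{j} variant of [0034], would replace the last two assignments in the loop.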
[0036] The determined values of the accumulator 240 and the auxiliary number 242 may then be stored in the manner described above in relation to the jth iteration. Specifically, the processing device may use the random number generator to generate a new random number b_{j+1} and determine, based on b_{j+1}, how the accumulator 240 and the auxiliary number 242 are to be stored in R(0) and R(1). If b_{j+1} = 0 (no swapping), the accumulator 240 may be stored in register R(0) and the auxiliary number 242 may be stored in register R(1). If b_{j+1} = 1 (swapping), the accumulator 240 may be stored in register R(1) and the auxiliary number 242 may be stored in register R(0).
[0037] The operations performed by the processing device during the (j+1)th iteration may be summarized as follows, in one exemplary implementation, wherein the notation R_{j}(n) stands for the content of the nth register after the jth iteration of the Montgomery ladder algorithm:
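One compact way to write the (j+1)th iteration, derived here from paragraphs [0035]-[0036] under the convention that b = 0 means no swap (a sketch consistent with that description, not necessarily the notation of the original summary), is:

$$\begin{aligned}
R_{j+1}(b_{j+1} \mathbin{\mathrm{XOR}} k_{j+1}) &\leftarrow 2\,R_{j}(b_{j} \mathbin{\mathrm{XOR}} k_{j+1}),\\
R_{j+1}(b_{j+1} \mathbin{\mathrm{XOR}} k_{j+1} \mathbin{\mathrm{XOR}} 1) &\leftarrow R_{j}(0) + R_{j}(1).
\end{aligned}$$

For k_{j+1} = 0, the first line writes the doubled accumulator 2R_{j}(b_{j}) into R_{j+1}(b_{j+1}), and the second writes A + B into the other register. For k_{j+1} = 1, the first line writes the doubled auxiliary value 2R_{j}(b_{j} XOR 1) into R_{j+1}(b_{j+1} XOR 1), and A + B becomes the accumulator in R_{j+1}(b_{j+1}).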
[0038] FIG. 2B illustrates these intermediate randomization operations 250, in accordance with one or more aspects of the present disclosure, which may be implemented to protect execution of the Montgomery ladder algorithm from side-channel attacks. The operations 250, as illustrated in FIG. 2B, may include: adjustment of the read operations to compensate for the randomization (swapping) that may have been performed at the end of the previous jth iteration; selection of a correct input register for the “double” operation of the (j+1)th iteration; selection of correct registers to store the output values of the (j+1)th iteration based on the key bit value k_{j+1}; and conditional swapping of the output values of the (j+1)th iteration based on the value of a random bit b_{j+1}. Also shown in FIG. 2B is the projective scaling performed at the end of the previous jth iteration. It shall be noted, however, that in other possible implementations projective scaling may be performed in a different order than that shown in FIG. 2B, since projective scaling does not change the location of the corresponding point(s) on the elliptic curve. For example, projective scaling may be performed after the input values are read from the registers R(0) and R(1) during the (j+1)th iteration but before the “double” and/or “add” operations are performed. As another example, projective scaling may be implemented after the “double” and/or “add” operations are performed (but prior to storing the output values in the registers), and so on. Additional projective scaling may be performed at the end of all iterations of the algorithm.
[0039] FIG. 3A illustrates an exemplary operation 300 of the Double-Add ladder multiplication algorithm with intermediate randomization, in accordance with one or more aspects of the present disclosure. The exemplary operation 300 may be performed by one or more processing devices 100, in some implementations. Similar to the Montgomery ladder algorithm of FIG. 2A, the input of the exemplary Double-Add ladder operation 300 may include a number P and a number k, which may be a private key represented by a sequence of bits k = (k_{0}k_{1}k_{2}k_{3}...). The number P may be a public number. The number P may represent a point on an elliptic curve that may be identified by affine coordinates (x, y) and/or a set of projective (e.g., Jacobian) coordinates. Everything described above in relation to the representation of the number P in the Montgomery ladder algorithm shall be understood to apply to the Double-Add ladder algorithm as well.
[0040] The iterations of the Double-Add ladder algorithm may be performed by the processing device (e.g., ALU 110) having access to two (or more) memory registers, e.g., registers R(0) and R(1), which may be similar (in implementation and function) to the registers R(0) and R(1) described in relation to the Montgomery ladder algorithm. One of the registers, e.g., R(0), may be used to store the accumulator value A (which may initially be set to zero). The other register, e.g., R(1), may be used to store an auxiliary value, which in the Double-Add algorithm may be a value, such as the value of the base point P, that is to remain fixed for all iterations of the algorithm. However, in implementations of the Double-Add ladder algorithm with intermediate randomization, as disclosed herein, the values stored in the two registers may be shuffled, so that the register R(1) may, at times, store the accumulator value A whereas the register R(0) may store the fixed auxiliary value P. Furthermore, the auxiliary value P (and/or the accumulator value) may be projectively scaled at various stages of the Double-Add algorithm.
As a result of projective scaling, the projective coordinates (e.g., X, Y, Z) representing the value P (or, similarly, the accumulator value) may change, provided that they still correspond to the same affine coordinates (x, y) on the elliptic curve. As in the case of the Montgomery ladder, various multiplication (e.g., doubling, scaling) or addition operations shall be understood to refer to multiple components (e.g., various projective and/or affine coordinates) of the value P (and the accumulator value), if applicable. In some implementations, projective coordinates may have four or more components (e.g., X, Y, Z, W, ...), with the additional coordinates describing, for example, a slope of a line connecting the point identified by the coordinates to some reference point (e.g., the base point P), or other values, as may be prescribed by the specific algorithm being implemented.
[0041] Prior to the start of the jth iteration of the algorithm, the processing device performing the exemplary cryptographic operation 300 may access the value of the bit k_{j} and perform the double or double-and-add operation on the value stored in one of R(0) or R(1). For example, assuming for concreteness that k_{j} = 1, the processing device may read the values stored in both registers, as indicated schematically by solid lines in FIG. 3A, and determine the sum 2A+P (310) while keeping the value P stored in R(1) unchanged (312). Rather than directly storing the number 2A+P in R(0) as the new accumulator value, the processing device performing the cryptographic operation 300 may implement additional steps to protect the operation from side-channel attacks by using intermediate randomization.
[0042] For example, the processing device may use a random number generator to generate a random number b_{j} (e.g., a one-bit number). In some implementations, the value of the random bit b_{j} = 1 may indicate that a shuffle of the results of the computations 310 and 312 is to be performed, while the value b_{j} = 0 may indicate no swapping. More specifically, if the processing device determines that b_{j} = 0, the processing device may store the accumulator value 2A+P in register R(0) (320) and keep the fixed value P in register R(1) (322). If, however, the processing device determines that b_{j} = 1, the processing device may store the accumulator value 2A+P in register R(1) (324) and store the fixed value P in register R(0) (326).
[0043] Prior to starting the next iteration, identified by the key bit k_{j+1}, the processing device may perform additional randomization of the values stored in R(0) and R(1) by projective scaling using random numbers Z_{R(0)} and Z_{R(1)}, as shown by blocks 330, 332, 334, and 336, which may be performed similarly to blocks 230, 232, 234, and 236 of the Montgomery ladder algorithm. At the beginning of the (j+1)th iteration of the Double-Add ladder algorithm, the processing device may retrieve the current value of the accumulator and the value P stored in R(0) and R(1). The processing device may have to account for the possibility that the shuffle operation during the previous jth iteration may have resulted in the accumulator value being stored in R(1) and the value P being stored in R(0). The processing device may access the value of the random number b_{j} and load the values from R(0) and R(1) differently depending on whether b_{j} = 0 or b_{j} = 1. For example, suppose that k_{j+1} = 0 and the processing device determines that during the previous jth iteration of the algorithm the random number had the value b_{j} = 0, so that the accumulator value is currently stored in R(0). The processing device may therefore compute the value 2R(0) and identify it as the new value of the accumulator 340, equal to 2(2A+P), as indicated by the left dashed line at the bottom of FIG. 3A. The processing device may identify the value stored in R(1) as the value P, as illustrated by the right dashed line at the bottom of FIG. 3A. If, on the other hand, the processing device determines that during the previous jth iteration of the algorithm the random number had the value b_{j} = 1, the processing device may compute the value 2R(1) and identify it as the new accumulator 340, as indicated by the left solid line at the bottom of FIG. 3A. The processing device may also identify the value stored in R(0) as the value P, as illustrated by the right solid line leading to block 342 in FIG. 3A. In an instance where the new key bit k_{j+1} = 1, the processing device may compute the value 2R(0)+R(1) and identify it as the new accumulator value if b_{j} = 0, or compute 2R(1)+R(0) and identify it as the new accumulator value if b_{j} = 1.
[0044] The determined values of the accumulator 340 and the auxiliary number 342 may be stored using conditional swapping (shuffling), as described above. The processing device may deploy the random number generator to generate a new random number b_{j+1} and determine, based on b_{j+1}, how the accumulator 340 and the number P 342 are to be stored in R(0) and R(1). If b_{j+1} = 0 (no swapping), the accumulator 340 may be stored in register R(0) and the number P 342 may be stored in register R(1). If b_{j+1} = 1 (swapping), the accumulator 340 may be stored in register R(1) and the number P 342 may be stored in register R(0).
[0045] The operations performed by the processing device during the (j+1)th iteration may be summarized as follows, in one exemplary implementation, wherein the notation R_{j}(n) stands for the content of the nth register after the jth iteration of the Double-Add ladder algorithm:
$$R_{j+1}(b_{j+1}) \leftarrow 2R_{j}(b_{j}) + k_{j+1}\,R_{j}(b_{j} \oplus 1),$$
$$R_{j+1}(b_{j+1} \oplus 1) \leftarrow R_{j}(b_{j} \oplus 1).$$
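The register bookkeeping summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the group is written multiplicatively (the integers modulo p stand in for the elliptic-curve group, so “double” becomes squaring and “add” becomes multiplication by the fixed value P = g), Python's `secrets` module plays the role of the random number generator, and projective scaling is omitted. The function name `double_add_ladder_randomized` is hypothetical.

```python
import secrets


def double_add_ladder_randomized(g: int, k: int, p: int) -> int:
    """Left-to-right Double-Add ladder computing g**k mod p with random register placement.

    Assumption for illustration: the group is written multiplicatively, so
    "double" is squaring and "add" is multiplication by the fixed base g.
    """
    R = [1, g]  # b_j = 0: accumulator in R[0], fixed base value P = g in R[1]
    b = 0       # placement bit drawn at the end of the previous iteration
    for bit in bin(k)[2:]:        # key bits, most significant first
        acc = R[b]                # read accumulator, compensating for the last swap
        base = R[b ^ 1]           # read P from the other register
        acc = acc * acc % p       # "double"
        if bit == "1":            # branch kept for readability; not constant-time
            acc = acc * base % p  # "add"
        b = secrets.randbits(1)   # fresh random bit b_{j+1}
        R[b] = acc                # accumulator goes to R(b_{j+1})
        R[b ^ 1] = base           # P goes to R(b_{j+1} XOR 1)
    return R[b]
```

Because every read uses the placement bit drawn in the previous iteration, the result equals `pow(g, k, p)` whichever random bits are drawn. A hardened implementation would additionally replace the key-bit branch with a constant-time selection.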
[0046] FIG. 3B illustrates these intermediate randomization operations 350, in accordance with one or more aspects of the present disclosure, which may be implemented to protect execution of the Double-Add ladder algorithm from side-channel attacks. These operations, as illustrated in FIG. 3B, may include: adjustment of the read operations to compensate for the randomization (swapping) that may have been performed at the end of the previous jth iteration; determination of whether the “add” operation is to be performed in addition to the “double” operation, based on the value k_{j+1}; and conditional swapping of the outputs based on the value of the random number b_{j+1}. Also shown in FIG. 3B is the projective scaling operation, which may be performed at the end of the previous jth iteration (as indicated), after the input values are read from the registers R(0) and R(1) during the (j+1)th iteration but before the “double” or “double-and-add” operations are performed, after the “double” or “double-and-add” operations are performed, and so on.
[0047] Protection of cryptographic operations by intermediate randomization may be performed for other multiplication algorithms in a manner similar to the one described in relation to the Montgomery ladder and the Double-Add ladder shown in FIGS. 2A-2B and 3A-3B, respectively. For example, the operations performed by the processing device during the (j+1)th iteration of the Joye Double-Add ladder may be summarized as follows, in one exemplary implementation (wherein iterations are performed in the right-to-left order, so that k_{0} is the least significant bit):
$$R_{j+1}(b_{j+1} \oplus k_{j+1} \oplus 1) \leftarrow 2R_{j}(b_{j} \oplus k_{j+1} \oplus 1) + R_{j}(b_{j} \oplus k_{j+1} \oplus 0),$$
$$R_{j+1}(b_{j+1} \oplus k_{j+1} \oplus 0) \leftarrow R_{j}(b_{j} \oplus k_{j+1} \oplus 0).$$
As described earlier, in each iteration of the Joye Double-Add algorithm, one of the registers retains its stored value while the other register stores the result of the double-and-add operation, depending on the current bit value k_{j+1}. The random bit value b_{j} of the previous iteration controls which input value is stored in which register prior to the (j+1)th iteration, while the random bit value b_{j+1} indicates where the output values are to be stored.
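A similar sketch for the right-to-left Joye Double-Add ladder follows the two update rules above directly: logical register L (0 or 1) is held in physical register R(b XOR L). As before, this is an illustration under stated assumptions (integers modulo p written multiplicatively stand in for the curve group, `secrets` serves as the random number generator, and the name `joye_ladder_randomized` is hypothetical), not the disclosed implementation.

```python
import secrets


def joye_ladder_randomized(g: int, k: int, p: int) -> int:
    """Right-to-left Joye Double-Add ladder computing g**k mod p (written multiplicatively).

    Logical register L (0 or 1) lives in physical register R[b ^ L], where b is the
    random placement bit chosen at the end of the previous iteration.
    """
    R = [1, g]  # logical R0 = identity, logical R1 = g; b = 0 means no swap yet
    b = 0
    for j in range(k.bit_length()):      # k_0 is the least significant bit
        kj = (k >> j) & 1
        upd = R[b ^ kj ^ 1]              # logical R(1 - k_j), the register to update
        kept = R[b ^ kj]                 # logical R(k_j), which keeps its value
        upd = upd * upd % p * kept % p   # "2R(1-k_j) + R(k_j)" in multiplicative notation
        b_new = secrets.randbits(1)      # fresh placement bit b_{j+1}
        R[b_new ^ kj ^ 1] = upd
        R[b_new ^ kj] = kept
        b = b_new
    return R[b]                          # logical R0 holds g**k
```

The invariant of the unrandomized ladder (logical R0 times logical R1 equals g raised to 2^j) is untouched by the placement bits, since they only change where each logical value is stored.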
[0048] In some implementations of the cryptographic ladder algorithms, more than two memory registers R(0), R(1), ..., R(N-1) may store N intermediate values A(0), A(1), A(2), ... that may be used in successive iterations of these algorithms. The protection by intermediate randomization may be used in N-value algorithms similarly to the Montgomery ladder and the Double-Add ladder algorithms described above. For example, at the end of an iteration of an N-value algorithm, after the processing device has computed the N values A(i), the processing device may depart from a standard storing procedure, e.g., one where the value A(i) is stored in the register R(i). Instead, in one implementation, the processing device may generate a random number s, which may be a number from 0 to N-1, and assign the value A(1) to the register R(s). Next, the processing device may generate another random number t, which may be a number from 0 to N-1 excluding s, and store the value A(2) in the register R(t), and so on. The random numbers s, t, ... may be multi-bit numbers represented by log_{2} N bits (or by an integer number of bits not less than log_{2} N, if N is not a power of 2). Other procedures of randomly distributing the N values A(0), A(1), A(2), ... to the N registers R(0), R(1), ..., R(N-1) may, alternatively, be implemented. At the beginning of the next iteration of the algorithm, the processing device may determine what output distribution procedure was implemented during the preceding iteration (e.g., the values of the random numbers s, t, ...) and which registers are currently storing the values A(0), A(1), A(2), ..., and retrieve these values therefrom. In the meantime, e.g., between storing the values A(0), A(1), A(2), ... and retrieving them, the processing device may perform projective scaling of the values A(0), A(1), A(2), ..., using random multipliers RR(0), RR(1), RR(2), ..., as described above in relation to the Montgomery and the Double-Add ladder algorithms. Some or all of the random multipliers may be the same. The projective scaling randomization may alternatively (or additionally) be performed at any other time during execution of an algorithm iteration.
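One way to realize the random distribution of N values described above is to draw each register index uniformly from the indices not yet used, record the draws (the s, t, ... of the text), and use that record to read the values back in the next iteration. The sketch below assumes Python's `secrets` module as the random number generator; the helper names `store_randomly` and `load_in_order` are hypothetical.

```python
import secrets
from typing import List, Sequence, Tuple


def store_randomly(values: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Place N values into N registers at random, distinct indices.

    Returns the register file and the placement record, where placement[i] is the
    register index now holding values[i] (the role of s, t, ... in the text).
    """
    n = len(values)
    free = list(range(n))          # register indices not yet assigned
    registers: List[int] = [0] * n
    placement: List[int] = []
    for v in values:
        idx = free.pop(secrets.randbelow(len(free)))  # uniform over the unused indices
        registers[idx] = v
        placement.append(idx)
    return registers, placement


def load_in_order(registers: Sequence[int], placement: Sequence[int]) -> List[int]:
    """Undo the random placement at the start of the next iteration."""
    return [registers[idx] for idx in placement]
```

Drawing from the shrinking list of free indices yields a uniformly random permutation, which matches the "generate s, then t excluding s, and so on" procedure.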
[0049] The randomizations (random projective scaling and random distribution of the intermediate outputs) may be performed during each iteration of the algorithm, in some implementations. In some implementations, the randomizations may be performed in a fixed order for each iteration, e.g., the random projective scaling may be performed at the beginning of each iteration before the registers are read out, or after the computations of the iteration are completed but before the outputs are stored. In some implementations, the order of randomizations may be predetermined before the algorithm is applied to a specific multiplication task. For example, it may be predetermined that random projective scaling is to be performed at the beginning of iterations 0, 4, and 6, and prior to storing outputs in iterations 1, 2, 3, and 5. In some implementations, to make side-channel attacks more difficult, the exact instances of randomizations may themselves be determined randomly. For example, prior to a particular iteration of the algorithm, the random number generator may indicate whether an output randomization is to be performed during the iteration. Similarly, the random number generator may indicate whether the projective scaling randomization is to be performed during the iteration. The two determinations may be independent of each other. The random number generator may also indicate where exactly, within the iteration, the projective scaling randomization is to be performed.
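The random scheduling of the randomizations themselves might be sketched as a per-iteration plan drawn from the random number generator, with the two decisions made independently. The labels in `SCALING_POINTS`, the `IterationPlan` type, and `plan_iteration` are hypothetical names chosen for illustration; a ladder implementation would consult such a plan at the corresponding points of each iteration.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for points within one ladder iteration where scaling may occur.
SCALING_POINTS = ("before_read", "after_read", "after_compute", "before_store")


@dataclass
class IterationPlan:
    shuffle_outputs: bool      # randomize which register receives which output
    scale_at: Optional[str]    # where to apply projective scaling, or None to skip it


def plan_iteration() -> IterationPlan:
    """Decide independently whether to shuffle outputs and whether/where to scale."""
    shuffle = secrets.randbits(1) == 1
    scale = secrets.randbits(1) == 1
    where = SCALING_POINTS[secrets.randbelow(len(SCALING_POINTS))] if scale else None
    return IterationPlan(shuffle_outputs=shuffle, scale_at=where)
```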
[0050] FIG. 4 depicts a flow diagram of an illustrative example of method 400 of protecting cryptographic operations by intermediate randomization, in accordance with one or more aspects of the present disclosure. Method 400 and/or each of its individual functions, routines, subroutines, or operations may be performed by one or more processing units of the computing system implementing the methods, e.g., a processor containing the ALU 110. In certain implementations, method 400 may be performed by a single processing thread. Alternatively, method 400 may be performed by two or more processing threads, each thread executing one or more individual functions, routines, subroutines, or operations of the method. In an illustrative example, the processing threads implementing method 400 may be synchronized (e.g., using semaphores, critical sections, and/or other thread synchronization mechanisms). Alternatively, the processing threads implementing method 400 may be executed asynchronously with respect to each other. Various blocks of the method 400 may be performed in a different order compared to the order shown in FIG. 4. Some blocks may be performed concurrently with other blocks. Some blocks may be optional.
[0051] The method 400 may be implemented by the processor/ALU performing a cryptographic operation, which may involve a public key number and a private key number, two private key numbers, and so on. The cryptographic operation may be a part of a larger computational operation involving multiple private key numbers and/or multiple public key numbers. The cryptographic operation may involve points in a cryptographic space. The cryptographic space may be a space of points belonging to an elliptic curve or any other object (a line, a surface, a volume, etc.) for which rules specify how doubling and addition operations are to be performed. A point in the cryptographic space may be identified by a vector having a plurality of vector components. For example, in a case where the cryptographic space is a line (e.g., an elliptic curve), a base point may be identified by vector components that are the affine coordinates P = (x_{P}, y_{P}) or the projective coordinates P = (X_{P}, Y_{P}, Z_{P}) of the base point P. Similarly, working points, e.g., A, B, etc., at each iteration of the algorithm being implemented, may be identified by vector components that may be the corresponding affine coordinates A = (x_{A}, y_{A}), B = (x_{B}, y_{B}) or the projective coordinates A = (X_{A}, Y_{A}, Z_{A}), B = (X_{B}, Y_{B}, Z_{B}) of the working points. The number of vector components may be more than three, in some implementations. One of the vectors may be an accumulator (e.g., A) and the other vector(s) (e.g., B) may be auxiliary vector(s). The auxiliary vector may be used to improve efficiency of the cryptographic operation. In some implementations, the second vector may represent the accumulator and the first vector may represent the auxiliary vector. The auxiliary vector may represent a public key P (e.g., as in the Double-Add ladder algorithm) or be a combination A+P of the accumulator value and the public key P (as in the Montgomery ladder algorithm), or any other number that may be used by a specific cryptographic algorithm. In some implementations, the first and/or the second vector may change between successive iterations of the algorithm (e.g., as both the accumulator and the auxiliary vector change in the Montgomery ladder algorithm). In some implementations, the first or the second vector may remain fixed between successive iterations of the algorithm (e.g., as the auxiliary vector remains fixed in the Double-Add ladder algorithm). In some implementations, any of the vectors (representing working points, base points, auxiliary points, etc.) may have only one component, in which case a single number may represent the corresponding vector.
[0052] A state of the algorithm, S = (P, A, B, ..., u, w, z, ...), at its particular iteration may be characterized by a number of vectors (such as vectors P, A, B) and a number of additional parameters u, w, z, which may be one-component numbers or multicomponent vectors. For example, u may be a slope of a line that connects a particular point (e.g., B) with some other point (e.g., A or P); z may be an additional scaling factor; and so on. In some implementations, some of the components of the vectors may be elided. For example, a given point A may be uniquely identified by its X_{A}, Z_{A} components (or Y_{A}, Z_{A}, or X_{A}, Y_{A}), so that the third component, which carries redundant information, may be omitted. In some implementations, a state of the algorithm may be represented with differences of some vectors, S = (P, A-P, B-P, ..., u, w, z, ...). In some implementations, some of the vector components may be shared by some vectors. For example, in various co-Z algorithms, some or all of the components Z_{P}, Z_{A}, Z_{B} may be the same (and may further coincide with the “global” parameter z of the state S of the algorithm at its particular iteration).
[0053] At block 410, the processing device performing method 400 may load a first vector and a second vector, such that the first vector includes a plurality of first vector components identifying a first point in a cryptographic space and the second vector includes a plurality of second vector components identifying a second point in the cryptographic space. The processing device may then obtain a scaled first vector by modifying at least some of the plurality of first vector components so that the scaled first vector identifies the same first point in the cryptographic space. Optionally, the processing device may also obtain a scaled second vector by modifying at least some of the plurality of second vector components so that the scaled second vector identifies the same second point in the cryptographic space.
[0054] Scaling of the first vector (and, similarly, the second vector, if applicable) may be projective scaling and may include modifying at least some of the plurality of vector components so that the modified plurality of vector components identifies the same point in the cryptographic space (e.g., an elliptic curve). In some implementations, where the elliptic curve is a Weierstrass curve, modifying the plurality of vector components may include multiplying some or all vector components by an integer power of a random factor. For example, modifying the plurality of vector components may include (i) multiplying a first vector component by a random factor R, (ii) multiplying a second vector component by the square of the random factor, R^{2}, and/or (iii) multiplying a third vector component by the cube of the random factor, R^{3}.
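The scaling rule above can be checked numerically. The sketch assumes Jacobian projective coordinates, one common choice for Weierstrass curves, in which (X : Y : Z) represents the affine point (X/Z^2, Y/Z^3); there the Z component is multiplied by R, X by R^2, and Y by R^3. The toy curve y^2 = x^3 + 2x + 3 over GF(97) and the point (3, 6) are hypothetical parameters picked only so the check is easy to follow, and the returned factor R is the kind of value that could be kept as auxiliary information.

```python
import secrets
from typing import Tuple

# Hypothetical toy curve y^2 = x^3 + 2x + 3 over GF(97); (3, 6) lies on it
# because 6^2 = 36 = 27 + 6 + 3.
P_MOD, A_COEF, B_COEF = 97, 2, 3


def on_curve_jacobian(X: int, Y: int, Z: int, p: int = P_MOD) -> bool:
    """Check Y^2 = X^3 + a*X*Z^4 + b*Z^6, the Jacobian form of the curve equation."""
    return (Y * Y - (X**3 + A_COEF * X * Z**4 + B_COEF * Z**6)) % p == 0


def to_affine(X: int, Y: int, Z: int, p: int = P_MOD) -> Tuple[int, int]:
    """Map Jacobian (X : Y : Z) to affine (X/Z^2, Y/Z^3)."""
    z_inv = pow(Z, -1, p)  # modular inverse (Python 3.8+)
    return X * z_inv**2 % p, Y * z_inv**3 % p


def random_projective_scale(X: int, Y: int, Z: int, p: int = P_MOD) -> Tuple[int, int, int, int]:
    """Return (R^2*X, R^3*Y, R*Z, R) for a random nonzero R; the point is unchanged."""
    r = 1 + secrets.randbelow(p - 1)  # random factor in [1, p-1]
    return X * r * r % p, Y * pow(r, 3, p) % p, Z * r % p, r
```

Every term of the Jacobian curve equation picks up the same factor R^6 under this scaling, which is why the scaled vector still satisfies it and still maps to the same affine point.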
[0055] Scaling of the first vector (and, optionally, the second vector) may also include updating auxiliary information, which, together with the first vector components and the second vector components, may identify a current arithmetic state of the ladder. For example, the auxiliary information may identify correspondence between the first vector components and the first point in the cryptographic space (e.g., elliptic curve) and similarly identify correspondence between the second vector components and the second point in the cryptographic space. The updated auxiliary information may identify correspondence between the modified first and second vector components and the respective points in the cryptographic space. The auxiliary information may be stored in additional registers different from the registers used to store the first vector components and the second vector components. In various implementations, the auxiliary information may include the random factor R (for one or both vectors, if the respective random factors are different from each other), the running value Z (for one or both vectors) of the z-coordinate (e.g., the previous value of the z-coordinate multiplied by the random factor R), the X and/or Y coordinates of the base point P (possibly scaled with the running value Z or some other value), the slope of the line connecting the base point with the first and the second points in the cryptographic space, and so on. In some implementations, where some of the components of the first vector and/or the second vector (e.g., X or Y components) are elided from the respective vectors, some of the elided component(s) may be stored in the auxiliary information.
[0056] To protect the cryptographic operation from potential side-channel attacks, the processing device performing method 400 may projectively scale the first vector (at block 410) by multiplying it by a random number. Projective scaling may modify the components of the first vector without changing the point in the cryptographic space identified by the vector components. In some implementations, both the first and the second vectors may be projectively scaled by multiplying the first and the second vectors by the same or different random numbers.
[0057] Computations that are to be performed during various iterations of the cryptographic operation may depend on the value of a key bit k_{j} (e.g., of the private key k) that corresponds to the current iteration being executed. For example, in the Double-Add ladder algorithm, the key bit value may determine whether the “double” arithmetic operation or the “double-and-add” arithmetic operation is to be performed. In the Montgomery ladder algorithm, the key bit value may determine whether the “double” operation is to be performed on the accumulator or on the auxiliary vector. At block 420, the processing device may determine that the key bit k_{j} has a first key bit value (which may be 1 or 0). The method 400 may continue with identifying, responsive to determining that the key bit has the first key bit value, a first arithmetic operation to be performed on the scaled first vector and the (scaled) second vector (430). For example, the first arithmetic operation may be an add operation (where the scaled first vector is added to the (scaled) second vector), a double-and-add operation (where the (scaled) second vector is added to a double of the scaled first vector), or some other operation defined by the specific algorithm implemented by the processing device. The processing device may perform (execute) the identified operation on the scaled first vector and the (scaled) second vector to obtain a third vector (430).
[0058] The method 400 may continue with generating a random number b (440) to determine where in a memory device the third vector is to be stored. The random number b may be a one-bit number if there are two possible memory locations (registers) in the memory device where the third vector may be stored. Alternatively, the random number b may be a multi-bit number if there are more than two possible memory locations where the third vector may be stored. At block 450, the processing device may store the third vector in a first memory location, responsive to the random number having a first value (e.g., 0 or 1), or in a second memory location, responsive to the random number having a second value (e.g., 1 or 0). The processing device may also perform additional arithmetic operations (successively or in parallel with the first arithmetic operation) on the scaled first vector and/or the (scaled) second vector and obtain additional outputs, e.g., a fourth vector. For example, if the first arithmetic operation to determine the third vector is the “add” operation of the Montgomery ladder, the additional operation may be the “double” operation to be performed on the scaled first or the (scaled) second vector to obtain the fourth vector. The fourth vector may be stored in the first memory location, responsive to the random number having the second value (e.g., 1 or 0), or in the second memory location, responsive to the random number having the first value (e.g., 0 or 1). The first arithmetic operation and the second arithmetic operation may be modular arithmetic operations.
[0059] At block 460, which may be performed during the next (e.g., (j+1)th) iteration of the algorithm, the processing device may read out the vectors stored in the first memory location and/or the second memory location and use these vectors as inputs for a second arithmetic operation. The second arithmetic operation may be identified based on the value of the key bit k_{j+1} (which corresponds to the (j+1)th iteration). For example, responsive to determining that the key bit k_{j+1} has the first key bit value (e.g., 0 or 1), the processing device may identify that the second arithmetic operation is the same as the first arithmetic operation. Alternatively, responsive to determining that the key bit k_{j+1} has a second key bit value (e.g., 1 or 0), the processing device may identify the second arithmetic operation as different from the first arithmetic operation. For example, in implementations of the Montgomery ladder, the first arithmetic operation may be an “add” operation, whereas the second arithmetic operation may be the “double” operation (or vice versa).
[0060] Having identified the second arithmetic operation to be performed based on the key bit value k_{j+1} and determined what types of inputs are associated with the second arithmetic operation, the processing device may also access the value b used during the preceding iteration for output distribution and use it in the decision-making block 465. The processing device may select a first input and a second input for the second arithmetic operation based on whether the random number b has the first value (e.g., 0 or 1) or the second value (e.g., 1 or 0). For example, the first input may be the third vector stored in the first memory location and the second input may be the fourth vector stored in the second memory location (if b has the first value) (470). Alternatively, the first input may be the third vector stored in the second memory location and the second input may be the fourth vector stored in the first memory location (if b has the second value) (480).
[0061] Upon loading the first and the second inputs as described, the processing device may perform the second arithmetic operation on the first input and the second input 4). As a result, the outcome of the second arithmetic operation remains the same regardless of how the outputs of the first arithmetic operation were stored at the end of the previous, jth, iteration. In essence, the contingent loading of the inputs at the beginning of the (j+1)th iteration reverses the contingent storing of the outputs at the end of the jth iteration, while introducing randomization operations that make it more difficult for an adversary to correlate emissions from the processing device among various operations of the algorithm being performed. Accordingly, this makes it harder for the adversary to mount a successful side-channel attack.
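Method 400 as a whole can be sketched end to end. To keep the example short and self-contained, it assumes a toy “cryptographic space”: the pair (X, Z) represents the nonzero field element X/Z modulo a prime p, so projective scaling multiplies both components by a random r, and a Montgomery ladder computes g^k mod p in multiplicative notation. The names `scale`, `mul`, `value`, and `montgomery_ladder_protected` are hypothetical, and the comments map each step to the blocks of FIG. 4 only as described in the paragraphs above.

```python
import secrets
from typing import List, Tuple

# Toy "projective" vector: (X, Z) represents the nonzero field element X/Z mod p.
Vec = Tuple[int, int]


def scale(v: Vec, p: int) -> Vec:
    """Projective scaling: (X, Z) -> (rX, rZ) still represents X/Z."""
    r = 1 + secrets.randbelow(p - 1)  # random nonzero factor
    return v[0] * r % p, v[1] * r % p


def mul(u: Vec, v: Vec, p: int) -> Vec:
    """Group operation; plays the role of point addition (doubling when u is v)."""
    return u[0] * v[0] % p, u[1] * v[1] % p


def value(v: Vec, p: int) -> int:
    """Recover the represented element X/Z mod p (p must be prime)."""
    return v[0] * pow(v[1], -1, p) % p


def montgomery_ladder_protected(g: int, k: int, p: int) -> int:
    """Montgomery ladder for g**k mod p with projective scaling and randomized storage."""
    mem: List[Vec] = [(1, 1), (g % p, 1)]  # accumulator A = 1 and auxiliary A*g
    b = 0  # placement bit drawn in the previous iteration (0 = no swap)
    for bit in bin(k)[2:]:                         # key bits, most significant first
        acc, aux = mem[b], mem[b ^ 1]              # blocks 465-480: undo previous placement
        acc, aux = scale(acc, p), scale(aux, p)    # block 410: projective scaling
        if bit == "0":                             # blocks 420-430 (not constant-time)
            acc, aux = mul(acc, acc, p), mul(acc, aux, p)
        else:
            acc, aux = mul(acc, aux, p), mul(aux, aux, p)
        b = secrets.randbits(1)                    # block 440: random number b
        mem[b], mem[b ^ 1] = acc, aux              # block 450: store at random locations
    return value(mem[b], p)
```

The scaled vectors never hold the same component values twice, yet `value` recovers the same result as `pow(g, k, p)`, which illustrates why the contingent loading and the projective scaling leave the outcome unchanged.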
[0062] Any arithmetic operations described in reference to FIGs 2A, 2B, 3A, 3B, and 4 may be modular arithmetic operations.
[0063] FIG. 5 depicts a block diagram of an example computer system 500 operating in accordance with one or more aspects of the present disclosure. In various illustrative examples, computer system 500 may represent the processing device 100, illustrated in FIG. 1.
[0064] Example computer system 500 may be connected to other computer systems in a LAN, an intranet, an extranet, and/or the Internet. Computer system 500 may operate in the capacity of a server in a client-server network environment. Computer system 500 may be a personal computer (PC), a set-top box (STB), a server, a network router, switch or bridge, or any device capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that device. Further, while only a single example computer system is illustrated, the term “computer” shall also be taken to include any collection of computers that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methods discussed herein.
[0065] Example computer system 500 may include a processing device 502 (also referred to as a processor or CPU), a main memory 504 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), etc.), a static memory 506 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory (e.g., a data storage device 518), which may communicate with each other via a bus 530.
[0066] Processing device 502 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, processing device 502 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 502 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In accordance with one or more aspects of the present disclosure, processing device 502 may be configured to execute instructions implementing method 400 of protecting cryptographic operations by intermediate randomization.
[0067] Example computer system 500 may further comprise a network interface device 508, which may be communicatively coupled to a network 520. Example computer system 500 may further comprise a video display 510 (e.g., a liquid crystal display (LCD), a touch screen, or a cathode ray tube (CRT)), an alphanumeric input device 512 (e.g., a keyboard), a cursor control device 514 (e.g., a mouse), and an acoustic signal generation device 516 (e.g., a speaker).
[0068] Data storage device 518 may include a computer-readable storage medium (or, more specifically, a non-transitory computer-readable storage medium) 528 on which are stored one or more sets of executable instructions 522. In accordance with one or more aspects of the present disclosure, executable instructions 522 may comprise executable instructions implementing method 400 of protecting cryptographic operations by intermediate randomization.
[0069] Executable instructions 522 may also reside, completely or at least partially, within main memory 504 and/or within processing device 502 during execution thereof by example computer system 500, main memory 504 and processing device 502 also constituting computer readable storage media. Executable instructions 522 may further be transmitted or received over a network via network interface device 508.
[0070] While the computer-readable storage medium 528 is shown in FIG. 5 as a single medium, the term “computer-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of operating instructions. The term “computer-readable storage medium” shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine that cause the machine to perform any one or more of the methods described herein. The term “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media.
[0071] Some portions of the detailed descriptions above are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
[0072] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “identifying,” “determining,” “storing,” “adjusting,” “causing,” “returning,” “comparing,” “creating,” “stopping,” “loading,” “copying,” “throwing,” “replacing,” “performing,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[0073] Examples of the present disclosure also relate to an apparatus for performing the methods described herein. This apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer system selectively programmed by a computer program stored in the computer system. Such a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic disk storage media, optical storage media, flash memory devices, other types of machine-accessible storage media, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
[0074] The methods and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear as set forth in the description below. In addition, the scope of the present disclosure is not limited to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure.
[0075] It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other implementation examples will be apparent to those of skill in the art upon reading and understanding the above description. Although the present disclosure describes specific examples, it will be recognized that the systems and methods of the present disclosure are not limited to the examples described herein, but may be practiced with modifications within the scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense. The scope of the present disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Claims
1. A method to perform a cryptographic operation, the method comprising:
loading, by a processing device, a first vector and a second vector, wherein the first vector identifies a first point in a cryptographic space and the second vector identifies a second point in the cryptographic space;
scaling, by the processing device, the first vector, wherein the scaled first vector identifies the same first point in the cryptographic space;
responsive to determining that a key bit has a first key bit value, identifying a first arithmetic operation to be performed involving the scaled first vector and the second vector;
performing, by the processing device, the first arithmetic operation involving the scaled first vector and the second vector to obtain a third vector;
generating a random number;
storing the third vector in (i) a first memory location, responsive to the random number having a first value, or (ii) a second memory location, responsive to the random number having a second value; and
performing, by the processing device, a second arithmetic operation involving a first input and a second input, wherein, based on the random number having the first value or the second value:
the first input is the third vector stored either in the first memory location or the second memory location; and
the second input is a fourth vector stored either in the second memory location or the first memory location.
2. The method of claim 1, further comprising:
scaling, by the processing device, the second vector, wherein the scaled second vector identifies the same second point in the cryptographic space.
3. The method of claim 1, wherein the first arithmetic operation comprises adding the scaled first vector to the second vector.
4. The method of claim 1, wherein the first arithmetic operation comprises adding the second vector to a double of the scaled first vector.
5. The method of claim 1, further comprising:
responsive to determining that the key bit has the first key bit value, performing an additional arithmetic operation involving the scaled first vector or the second vector to obtain the fourth vector.
6. The method of claim 5, further comprising:
storing the fourth vector in (i) the first memory location, responsive to the random number having the second value, or (ii) the second memory location, responsive to the random number having the first value.
7. The method of claim 5, wherein the fourth vector is a double of the scaled first vector or a double of the second vector.
8. The method of claim 1, wherein the first arithmetic operation and the second arithmetic operation are modular arithmetic operations.
9. The method of claim 1, wherein the cryptographic space is a space of points belonging to an elliptic curve.
10. The method of claim 1, wherein the first vector comprises a plurality of first vector components and wherein the second vector comprises a plurality of second vector components, and wherein scaling the first vector comprises:
modifying at least some of the plurality of first vector components; and
updating auxiliary information, wherein the auxiliary information identifies a correspondence between the modified first vector components and the first point in the cryptographic space.
11. The method of claim 10, wherein scaling the first vector comprises:
multiplying a first component of the plurality of first vector components by a random factor;
multiplying a second component of the plurality of first vector components by an integer power of the random factor; and
updating the auxiliary information with the random factor.
12. The method of claim 1, wherein the random number is a one-bit number.
13. The method of claim 1, wherein performing the second arithmetic operation is responsive to determining that a new key bit has the first key bit value, and wherein the second arithmetic operation is the same as the first arithmetic operation.
14. The method of claim 1, wherein performing the second arithmetic operation is responsive to determining that a new key bit has a second key bit value, and wherein the second arithmetic operation is different from the first arithmetic operation.
15. A system to perform a cryptographic operation, the system comprising:
a memory device to store a first vector and a second vector; and
a processor coupled to the memory device to:
load, by a processing device from the memory device, the first vector and the second vector, wherein the first vector identifies a first point in a cryptographic space and the second vector identifies a second point in the cryptographic space;
scale, by the processing device, the first vector, wherein the scaled first vector identifies the same first point in the cryptographic space;
responsive to determining that a key bit has a first key bit value, identify a first arithmetic operation to be performed involving the scaled first vector and the second vector;
perform, by the processing device, the first arithmetic operation involving the scaled first vector and the second vector to obtain a third vector;
generate a random number;
store the third vector in (i) a first memory location, responsive to the random number having a first value, or (ii) a second memory location, responsive to the random number having a second value; and
perform, by the processing device, a second arithmetic operation involving a first input and a second input, wherein, based on the random number having the first value or the second value:
the first input is the third vector stored either in the first memory location or the second memory location; and
the second input is a fourth vector stored either in the second memory location or the first memory location.
16. The system of claim 15, wherein to perform the second arithmetic operation the processor is to determine that a new key bit has the first key bit value, and wherein the second arithmetic operation is the same as the first arithmetic operation.
17. The system of claim 15, wherein to perform the second arithmetic operation the processor is to determine that a new key bit has a second key bit value, and wherein the second arithmetic operation is different from the first arithmetic operation.
18. A computer-readable medium storing instructions thereon, wherein the instructions, when executed by a processing device performing a cryptographic operation, cause the processing device to:
load, by the processing device from a memory device, a first vector and a second vector, wherein the first vector identifies a first point in a cryptographic space and the second vector identifies a second point in the cryptographic space;
scale, by the processing device, the first vector, wherein the scaled first vector identifies the same first point in the cryptographic space;
responsive to determining that a key bit has a first key bit value, identify a first arithmetic operation to be performed involving the scaled first vector and the second vector;
perform, by the processing device, the first arithmetic operation involving the scaled first vector and the second vector to obtain a third vector;
generate a random number;
store the third vector in (i) a first memory location, responsive to the random number having a first value, or (ii) a second memory location, responsive to the random number having a second value; and
perform, by the processing device, a second arithmetic operation involving a first input and a second input, wherein, based on the random number having the first value or the second value:
the first input is the third vector stored either in the first memory location or the second memory location; and
the second input is a fourth vector stored either in the second memory location or the first memory location.
19. The computer-readable medium of claim 18, wherein the first arithmetic operation comprises adding the scaled first vector to the second vector or adding the second vector to a double of the scaled first vector.
20. The computer-readable medium of claim 18, wherein the first vector comprises a plurality of first vector components and wherein the second vector comprises a plurality of second vector components, and wherein to scale the first vector the instructions are to cause the processing device to modify at least some of the plurality of first vector components.
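Claim 1 (with its system and medium counterparts, claims 15 and 18) describes a ladder-type step in which the sum of the two running values (the "third vector") and a doubled value (the "fourth vector", claims 5 to 7) are written to one of two memory locations selected by a one-bit random number (claim 12), after which the next step reads its inputs back from wherever they landed. The Python sketch below illustrates only that data flow, under several assumptions: a standard Montgomery ladder is used as the host algorithm, the group is supplied through hypothetical `add`/`dbl` callables, the scaling step recited in claim 1 is omitted, and the demo uses multiplication modulo a prime as a stand-in for point arithmetic. Python integers and branches are not constant-time, so this is a sketch of the bookkeeping, not a side-channel-hardened implementation.

```python
import secrets
from typing import Callable, List, TypeVar

T = TypeVar("T")


def randomized_ladder(k: int, base: T, identity: T,
                      add: Callable[[T, T], T],
                      dbl: Callable[[T], T]) -> T:
    """Compute k*base with a Montgomery ladder (invariant: R1 = R0 + base).

    Hypothetical helper: after every step the two results are written to
    slots chosen by a fresh random bit, so the slot holding R0 is not a
    fixed function of the key bit.
    """
    slots: List[T] = [identity, base]  # two "memory locations"
    m = 0                              # slots[m] holds R0, slots[1 - m] holds R1
    for i in reversed(range(k.bit_length())):
        bit = (k >> i) & 1
        r0, r1 = slots[m], slots[1 - m]
        s = add(r0, r1)                    # "third vector": R0 + R1
        d = dbl(r1) if bit else dbl(r0)    # "fourth vector": a doubled value
        r = secrets.randbits(1)            # one-bit random number
        slots[r], slots[1 - r] = s, d      # random placement of both results
        # New R0 is s when bit == 1 and d when bit == 0; record its slot.
        m = r ^ bit ^ 1
    return slots[m]


# Usage with a stand-in group: multiplication modulo a prime, so that
# "k*base" corresponds to base**k mod p.
p = 2**61 - 1
result = randomized_ladder(123456789, 5, 1,
                           lambda a, b: a * b % p,
                           lambda a: a * a % p)
print(result == pow(5, 123456789, p))  # True
```

Because a fresh random bit is drawn at every step, two runs with the same key generally leave their intermediates in different slots, while the final result is unchanged.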
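Claims 8 and 9 place the cryptographic space on an elliptic curve and make the arithmetic operations modular. As a toy illustration only, the sketch below implements affine point addition and doubling on the curve y^2 = x^3 + 2x + 3 over the prime field of size 97; these parameters are an assumption chosen for readability and offer no security. The names `ec_add`, `ec_dbl` and `on_curve` are hypothetical, and the point at infinity is represented as `None`.

```python
from typing import Optional, Tuple

P_MOD = 97      # toy prime field (real curves use primes of about 256 bits)
A, B = 2, 3     # toy curve: y^2 = x^3 + A*x + B (mod P_MOD)

Point = Optional[Tuple[int, int]]   # None stands for the point at infinity


def on_curve(pt: Point) -> bool:
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def ec_add(p1: Point, p2: Point) -> Point:
    """Affine point addition; every step is arithmetic modulo P_MOD."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = infinity
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD         # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_dbl(pt: Point) -> Point:
    return ec_add(pt, pt)


G = (3, 6)          # 6^2 = 36 = 3^3 + 2*3 + 3, so G lies on the curve
print(ec_dbl(G))    # (80, 10)
```

Coordinates are assumed to be already reduced modulo `P_MOD`; `pow(x, -1, m)` for the modular inverse requires Python 3.8 or later.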
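Claims 10 and 11 describe scaling as modifying vector components by a random factor and by an integer power of it while updating auxiliary information, so that the modified components still identify the same point. One common way to realize this kind of randomization, assumed here rather than taken from the claims, is Jacobian projective coordinates: (X, Y, Z) represents the affine point (X/Z^2, Y/Z^3), so replacing X by λ^2 X, Y by λ^3 Y and Z by λZ leaves the point unchanged. In this reading Z plays the role of the auxiliary information. The toy curve and all helper names below are hypothetical.

```python
import secrets
from typing import Tuple

P_MOD = 97      # toy prime field; not secure
A, B = 2, 3     # toy curve: y^2 = x^3 + A*x + B (mod P_MOD)

Jacobian = Tuple[int, int, int]


def to_jacobian(x: int, y: int) -> Jacobian:
    return (x % P_MOD, y % P_MOD, 1)


def to_affine(j: Jacobian) -> Tuple[int, int]:
    X, Y, Z = j
    z_inv = pow(Z, -1, P_MOD)
    z_inv2 = z_inv * z_inv % P_MOD
    return (X * z_inv2 % P_MOD, Y * z_inv2 * z_inv % P_MOD)


def randomize(j: Jacobian) -> Jacobian:
    """Rescale (X, Y, Z) -> (lam^2 X, lam^3 Y, lam Z) for a random nonzero lam."""
    X, Y, Z = j
    lam = secrets.randbelow(P_MOD - 1) + 1      # uniform in [1, P_MOD - 1]
    lam2 = lam * lam % P_MOD
    return (X * lam2 % P_MOD, Y * lam2 * lam % P_MOD, Z * lam % P_MOD)


def on_curve_jacobian(j: Jacobian) -> bool:
    # Jacobian form of the curve: Y^2 = X^3 + A*X*Z^4 + B*Z^6
    X, Y, Z = j
    z2 = Z * Z % P_MOD
    z4 = z2 * z2 % P_MOD
    return (Y * Y - (X ** 3 + A * X * z4 + B * z4 * z2)) % P_MOD == 0


J = to_jacobian(3, 6)       # the toy point (3, 6)
R = randomize(J)
print(R, to_affine(R))      # R varies per run; the affine part stays (3, 6)
```

Because the factor is redrawn on every call, the stored components differ between runs even though `to_affine` always recovers the same point.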
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
US17/309,937 (US20220075879A1) | 2019-01-07 | 2020-01-06 | Protection of cryptographic operations by intermediate randomization
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date
US201962789103P | 2019-01-07 | 2019-01-07
US62/789,103 | 2019-01-07
US201962912416P | 2019-10-08 | 2019-10-08
US62/912,416 | 2019-10-08
Publications (1)
Publication Number | Publication Date
WO2020146285A1 | 2020-07-16
Family ID: 71520892
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
PCT/US2020/012419 (WO2020146285A1) | Protection of cryptographic operations by intermediate randomization | 2019-01-07 | 2020-01-06
Country Status (2)
Country | Publication
US (1) | US20220075879A1
WO (1) | WO2020146285A1

2020
2020-01-06 | WO | PCT/US2020/012419 (WO2020146285A1) | Application filing, active
2020-01-06 | US | US17/309,937 (US20220075879A1) | Pending, active
Also Published As
Publication Number | Publication Date
US20220075879A1 | 2022-03-10
Legal Events
Code | Title | Description
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20738689; country of ref document: EP; kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.10.2021)
122 | EP: PCT application non-entry in European phase | Ref document number: 20738689; country of ref document: EP; kind code of ref document: A1