WO2003040911A2 - Carte d'acceleration de traitement cryptographique - Google Patents
Cryptographic processing accelerator board (Carte d'acceleration de traitement cryptographique)
- Publication number
- WO2003040911A2 (application PCT/FR2002/002036)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mod
- bits
- algorithm
- message
- stage
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
- G06F7/5052—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination using carry completion detection, either over all stages or at sample stages only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
- G06F7/506—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination with simultaneous carry generation for, or propagation over, two or more stages
Definitions
- the present invention relates to an acceleration card for cryptographic processing by the RSA public key algorithm, based on the acceleration of the calculation of exponentials in modular arithmetic.
- This implementation uses an original method and architecture for calculating the exponential, including: 1) an adder stage based on a system of shifts and additions.
- the invention can be realized in the form of an ASIC, for an acceleration card for the Handshake of the SSL protocol v3. This protocol is standardized in the field of e…
- the Handshake is the initial phase in the establishment of a secure connection. The execution speed of this phase is limited by the RSA, DH and DSA operations.
- the basic functions of these two operators being constituted by a modular exponentiation
- the object of the patent is to present successively the three stages and the upper layer Dispatcher for processing the exponentiator intended for the production of FPGAs, and of an autonomous ASIC.
- This ASIC is intended for Handshake acceleration cards, in which exponentiation determines the speed of the authentication operation.
- a modular adder (key length) of 512, 1024, 2048 or 4096 bits. It is original in its statistical treatment of the carries, its structure and its corresponding architecture. It achieves an acceleration by a factor of 16 compared with any other structure.
- the third stage is an exponentiation treatment of the results of the previous one.
- the speed is increased by a factor of 4 for processing at that level through the use of the CRT algorithm (Chinese Remainder Theorem).
- A fourth stage, called the Dispatcher, is used to manage the multicore structure of the exponentiator, to distribute resources and data so that the computation cores (addition, multiplication, exponentiation) work permanently, and to serve as an interface with the (software) drivers external to the chip.
- public key cryptography will be used for authentication and transmission of the "transfer key", which uses conventional cryptographic algorithms.
- the RSA public key encryption algorithm was invented in 1977 by Ron RIVEST, Adi SHAMIR and Leonard ADLEMAN.
- Let M be a message or a fragment of a message.
- the Diffie-Hellman algorithm is a simple key exchange algorithm. Let A and B be two people who have chosen, in a non-secure way, two numbers n (large, with (n-1)/2 required to be prime) and g such that n > g > 1 and g is primitive with respect to n.
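The exchange described above can be illustrated with a minimal sketch. The function names `dh_keypair` and `dh_shared` and the toy values n = 23, g = 5 are assumptions chosen for illustration only (23 is a small prime with (23-1)/2 = 11 prime, and 5 is primitive modulo 23); real parameters would be hundreds of digits long.

```python
import secrets

def dh_keypair(n, g):
    """Return (private, public) for public parameters n and g (hypothetical helper)."""
    private = secrets.randbelow(n - 3) + 2   # private value in [2, n-2]
    public = pow(g, private, n)              # modular exponentiation g^private mod n
    return private, public

def dh_shared(peer_public, private, n):
    """Derive the shared secret from the peer's public value."""
    return pow(peer_public, private, n)

# Toy parameters, for illustration only.
n, g = 23, 5
xa, ya = dh_keypair(n, g)   # person A
xb, yb = dh_keypair(n, g)   # person B
secret_a = dh_shared(yb, xa, n)
secret_b = dh_shared(ya, xb, n)
```

Both sides end with the same value because (g^xb)^xa and (g^xa)^xb are equal modulo n. Every step is a modular exponentiation, which is the operation the card accelerates.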
- the objective is to build a rapid modular exponentiator. However, this operation can be broken down into a succession of Additions, and modular multiplications [2].
- the algorithm that has been chosen is Interleaving multiplications and reductions, which is based on a series of doubling and subtraction.
- Operations 2a to 2f can be parallelized with operations 2g to 2j. These operations are included in an iterative algorithm.
- the first stage of the modular multiplier (Figure 1.3) is composed of a multiplication operator x2, an adder, and various selection modules.
- FIG. 1 represents an example of a modular multiplication algorithm.
- FIG. 2 shows an overview of the modular multiplier
- Figure 3 shows a view of the first stage of the modular multiplier
- the operator x2 is only a simple connection, since multiplying by 2 in base 2 amounts to shifting one bit to the left.
- a first multiplexer makes it possible to select the initial value of A 1 , which is the A of operation A.
- B mod n is the initial value of A 1 , which is the A of operation A.
- a second multiplexer chooses the value A 1 or A - n according to the sign of A - n. Then consider the following table:
- an "OR" function between the most significant bit of A and the carry of the operation A - n therefore makes it possible to choose between A and A - n.
- Second stage of the modular multiplier. Figure 4 shows a view of the second stage of the modular multiplier.
- the second stage of the modular multiplier mainly consists of two adders and a selection logic.
- a multiplexer chooses the starting value (always 0).
- the first adder performs the operation R + A·Bi.
- the multiplication between A and Bi is represented by an "AND" function (but it is physically carried out by a multiplexer).
- the second adder performs the operation R - n.
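The two stages described above can be modelled in software as follows. This is a sketch under assumptions: the function name `mod_mul_interleaved`, the LSB-first scan of B and the exact roles of A and B are illustrative choices, not a transcription of the figures.

```python
def mod_mul_interleaved(a, b, n, bits):
    """Compute a*b mod n using only doublings, additions and subtractions.

    Assumes 0 <= a < n and 0 <= b < 2**bits. B is scanned from its least
    significant bit (an assumption of this sketch).
    """
    A = a          # register of the first stage
    R = 0          # register of the second stage, starting value 0
    for i in range(bits):
        if (b >> i) & 1:         # Bi selects A or 0 (the "AND" / multiplexer)
            R = R + A            # first adder of the second stage: R + A*Bi
            if R - n >= 0:       # second adder: R - n, kept when non-negative
                R = R - n
        A = A << 1               # first stage: x2 is a one-bit left shift
        if A - n >= 0:           # first-stage adder: A - n, kept when non-negative
            A = A - n
    return R
```

Because A and R stay below n at every step, each sum or doubling is below 2n and a single conditional subtraction is enough to reduce it.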
- the purpose of the state machine is to perform an iterative algorithm (see Figure 5). This is accompanied by a counter indicating the maximum number of iterations.
- an iteration consists in waiting for the end of the calculation (the end of the calculation of all the adders);
- the elementary adders are of the carry-lookahead type [6, 7]. As a single block these are impractical for large numbers, because the buffer area would then be greater than that of the combinatorial processing logic [4].
- FIG. 6 represents an operating table of the modular multiplier automaton: actions of each state
- Figure 7 shows a view of the constitution of an adder
- Addition is handled in a particular way. Each adder takes as input two fragments of the numbers to be added, the fragments being of identical size. At each clock cycle, the carry is propagated to the next stage. A mechanism detects whether the carry is the same as in the previous cycle (an "exclusive OR" between the previous value and the current value). If all the carries are unchanged, there is no further carry propagation and the calculation is finished.
- FIG. 8 represents a table showing an example of carry propagation: unfavorable case. In an arbitrary case (as in table 1.
- FIG. 9 represents the table of carry propagation: arbitrary case. The probability of having Pc blocks is very low, and it has been established that the best compromise between the average propagation and the speed of an adder is to use 16-bit blocks for 1024-bit words with carry-lookahead adders [4]. An addition over 1024 bits (or more) is then performed in a few periods, each corresponding to the propagation time of a 16-bit adder.
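The block-wise addition with completion detection can be simulated as below. This is a behavioural sketch, not the RTL: the name `blockwise_add`, the cycle counting and the list-based registers are assumptions made for illustration.

```python
def blockwise_add(x, y, width=1024, block=16):
    """Add x and y using independent `block`-bit adders.

    Each cycle, every block adds its two fragments plus the carry its
    neighbour produced on the previous cycle. The loop stops when no carry
    changed between two cycles (XOR of old and new carries is zero).
    Returns (sum, number_of_cycles).
    """
    nblocks = width // block
    mask = (1 << block) - 1
    xs = [(x >> (i * block)) & mask for i in range(nblocks)]
    ys = [(y >> (i * block)) & mask for i in range(nblocks)]
    carries = [0] * (nblocks + 1)        # carries[i] is the carry into block i
    cycles = 0
    while True:
        cycles += 1
        sums = [xs[i] + ys[i] + carries[i] for i in range(nblocks)]
        new_carries = [0] + [s >> block for s in sums]
        stable = all((c ^ p) == 0 for c, p in zip(new_carries, carries))
        carries = new_carries
        if stable:                       # completion detected
            break
    total = 0
    for i in range(nblocks):
        total |= (sums[i] & mask) << (i * block)
    total |= carries[nblocks] << width   # carry out of the top block
    return total, cycles
```

In this model the number of cycles depends on the operands: two numbers with no inter-block carry finish in one cycle, while a carry rippling through every block needs one cycle per block plus one to confirm stability.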
- FIG. 10 represents the diagram of the constitution of an adder. A possible development [4] is presented in Figure 10.
- the operation of the adder allows us to introduce a faster stage between two adders.
- This stage is not intended to calculate the carry, but only to determine whether or not it needs to be calculated. The carries will in any case be calculated by the current adders. At this stage we are concerned only with the carries.
- This module takes as input the inputs of the current adder, as well as the carry of the current adder and that of the previous adder. Thus when a block is Pc, the carry of the previous adder is propagated directly into the next adder. Otherwise the carry calculated by the current adder is reinjected.
- Figure 12 shows an RTL view of the shift register
- each register takes a bit of the input word corresponding to its weight, otherwise each register takes the value of the previous register.
- FIG. 1.10 - The RL algorithm
- FIG. 1.11 Overview of the modular exponentiator
- the architecture (presented in figure 1.11) of the module is composed:
- the first stage of the modular exponentiator (see Figure 1.12) consists of a single multiplier and a register for saving P at each iteration.
- a multiplexer selects the value M or P.
- I instantiated the "modular multiplier" component so that it computes the operation P·P mod n = P² mod n.
- FIG. 1.12 - First stage of the modular exponentiator
- the second stage (in Figure 1.13) is constructed in a similar way to the first.
- the modular multiplier allows the calculation of C·P mod n, while a register contains the current value of C.
- a first multiplexer selects the initial value of the algorithm (which is always 0). Another selects whether to load the value C·P mod n into the register or to leave it at the current value of C.
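The two stages of the exponentiator correspond to the two updates of a right-to-left (RL) binary exponentiation. The sketch below is an assumption-laden software model: the name `mod_exp_rl` is hypothetical, Python's `%` stands in for the hardware modular multiplier, and the accumulator C is started at 1, which is a choice of this sketch.

```python
def mod_exp_rl(m, e, n):
    """Compute m**e mod n by scanning the exponent from its least significant bit."""
    P = m % n          # first-stage register: successive squares of M
    C = 1 % n          # second-stage register: running product (sketch starts at 1)
    while e > 0:
        if e & 1:              # exponent bit set: load C*P mod n into the register
            C = (C * P) % n    # second stage: C*P mod n
        P = (P * P) % n        # first stage: P*P mod n = P^2 mod n
        e >>= 1
    return C
```

The two multiplications of one iteration do not depend on each other's result, which is what allows a hardware implementation to run both stages side by side.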
- the exponentiator state machine (in Figure 1.14) has the same structure as that of the modular multiplier (section 1.2.4). In both cases, it is an iterative algorithm. The tests are always done by the "operative part". We thus reduce this automaton to its strict minimum: - we start the algorithm on command of a signal; - an iteration consists in awaiting the end of the calculation of one of the two multipliers (OK1 or OK2 signals indicating the end of the calculation of each multiplier);
- FIG. 1.14 - Modular exponentiator state machine. Table 1.4 indicates the actions taken during each step.
- INIT: initialization signal; loading of the initial values into the corresponding registers; resetting of the counter
- This module therefore aims to adapt between the internal buses of the exponentiator (or multiplier) and the environment that operates it.
- FIG. 2.1 Global view of the final multiplier
- the multiplier "Top" is very simple ( see Figure 2.1).
- a command signal indicates to the module when to start the calculation. In the same way, another signal indicates to it when the computation is finished. Indeed, it must be remembered that the calculation time cannot be known in advance for an arbitrary number.
- the characteristics of our final module are described in terms of surface and speed.
- the first parameter gives the cost in spoiling of the component, the second its performance.
- I n the peculiarity of our system that it takes between 65 "-i
- the duration of the calculation therefore depends on the nature of the message and statistical studies have shown that this time was on average 65 * je
- the "Top" of the exponentiator looks like that of the multiplier (see Figure 2.3).
- a control signal indicates to the module when to start the calculation and another signal indicates to it when the calculation is finished, the calculation time of the multiplier not being known in advance.
- Stability detection, which helps stop calculations in real time
- the functional timing diagram in Figure 2.4 illustrates the operation of the module. After a "reset", the calculation start request signal is applied during one clock period. After a succession of intermediate results, the value validated by the "ok" signal is taken into account.
- the operating frequency may vary, but it is always that of the slowest 16-bit adders of the FPGA library, and is faster in ASICs.
- the tools are therefore:
- Renoir for project administration and graphical translation of HDL code (block diagrams, organization, state machine, truth table);
- ModelSim for analysis and simulation of HDL code (visualization of signals and internal variables in the form of chronograms and data flows, execution of simulation scripts in TCL2 format);
- NIOS kit and FPGA Altera kit comprising a matrix of 200,000 gates as well as development tools (IP, compiler, debugger) of a "soft" processor core: the NIOS.
- matrices (1 million gates).
- This module can work as is, benefiting from a factor of 16 in processing. However, improvements are still being made to it.
- the following algorithm has been optimized for hardware implementation. It reduces the total number of additions by about 25%.
- CH2 - Figure 5: Architecture of the 512-bit multiplier
- CHAP-3) Original implementation of the CRT algorithm on the previous (Montgomery) version of the modular exponentiator (Claims: Implementation, Architecture).
- the Architecture is based on the Square & Multiply Algorithm.
- the multiplier and square components are ZENCOD Montgomery Multipliers (the square component is a simplified version)
- the following algorithm is the Square & Multiply method, as it was implemented in the ZENCOD exponentiation core.
- CH3- Figure 1 Square & Multiply Algorithm
- CH3- Figure 2 Modular Exponentiation Core Architecture - 512 bits
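A Square & Multiply loop built on Montgomery multiplication can be sketched as follows. This assumes the textbook Montgomery reduction with R = 2^k and an odd modulus n < 2^k; the internals of the ZENCOD multiplier are not described here, and the names `montgomery_setup`, `mont_mul` and `mod_exp_square_multiply` are hypothetical. `pow(n, -1, R)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Return (R, n_prime) with R = 2**k and n * n_prime = -1 (mod R). n must be odd."""
    R = 1 << k
    n_prime = (-pow(n, -1, R)) % R
    return R, n_prime

def mont_mul(a, b, n, k, n_prime):
    """Montgomery product a*b*R^-1 mod n, for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask    # makes t + m*n divisible by R
    u = (t + m * n) >> k                 # exact division by R is a shift
    return u - n if u >= n else u        # single conditional subtraction

def mod_exp_square_multiply(base, exp, n, k):
    """Left-to-right Square & Multiply: base**exp mod n, n odd and n < 2**k."""
    R, n_prime = montgomery_setup(n, k)
    r2 = (R * R) % n
    x = mont_mul(base % n, r2, n, k, n_prime)         # base in Montgomery form
    acc = R % n                                       # 1 in Montgomery form
    for i in reversed(range(exp.bit_length())):
        acc = mont_mul(acc, acc, n, k, n_prime)       # square step
        if (exp >> i) & 1:
            acc = mont_mul(acc, x, n, k, n_prime)     # multiply step
    return mont_mul(acc, 1, n, k, n_prime)            # leave Montgomery form
```

The square step always multiplies a value by itself, which is why a dedicated squarer can be a simplified version of the general multiplier.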
- H is a hashing function: for DSA the standard is SHA-1 (Secure Hash Standard)
- p is a prime number of length L between 512 and 1024 bits (L is a multiple of 64); q is a prime factor of p-1 (length 160 bits)
- Z_q is a field and the inverse of x is defined for x ≠ 0 mod q.
- DSA -> 2/ A hardware implementation of DSA should be configurable for at least the following two points:
- the host CPU may transmit directly H (M) or ask the chip to compute the hashing (SHA-1) by hardware.
- the host CPU may transmit the random number k to the chip or ask it to generate it by hardware.
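The two configuration points can be mirrored in a software model where the hash and the random number k are each either supplied by the caller or produced internally. This is a sketch only: the function names, the optional `digest` and `k` parameters, and the toy parameters in the usage lines (p = 23, q = 11, g = 4) are assumptions for illustration, far too small for real use and not the standard's bit-length handling.

```python
import hashlib
import secrets

def _hash_int(message):
    """SHA-1 of the message as an integer (the 'computed by the chip' option)."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")

def dsa_sign(message, p, q, g, x, digest=None, k=None):
    """Return a DSA signature (r, s). `digest` and `k` may be supplied by the host."""
    h = digest if digest is not None else _hash_int(message)
    while True:
        kk = k if k is not None else secrets.randbelow(q - 1) + 1   # kk in [1, q-1]
        r = pow(g, kk, p) % q
        s = (pow(kk, -1, q) * (h + x * r)) % q    # inverse exists since kk != 0 mod q
        if r != 0 and s != 0:
            return r, s
        if k is not None:
            raise ValueError("supplied k gives r = 0 or s = 0")

def dsa_verify(message, r, s, p, q, g, y, digest=None):
    """Check a signature against the public key y = g**x mod p."""
    if not (0 < r < q and 0 < s < q):
        return False
    h = digest if digest is not None else _hash_int(message)
    w = pow(s, -1, q)
    u1, u2 = (h * w) % q, (r * w) % q
    v = (pow(g, u1, p) * pow(y, u2, p)) % p % q
    return v == r

# Toy parameters, for illustration only: q = 11 divides p - 1 = 22, g has order 11.
p, q, g = 23, 11, 4
x = 3                    # private key
y = pow(g, x, p)         # public key
r, s = dsa_sign(b"hello", p, q, g, x)
valid = dsa_verify(b"hello", r, s, p, q, g, y)
```

Signing costs one modular exponentiation and one modular inverse, and verification costs two exponentiations, so the same exponentiation core serves both RSA and DSA.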
- This implementation method can be used in order to reduce the data transfer between the computation core and the host CPU.
- Step 1: The hardware has to execute the following instructions:
- the 1024-bit CRT core uses two 512-bit exponentiation cores. Each core instantiates two ZENCOD Montgomery Multipliers. (The SQ_MONT block is a simplified version of MUL_MONT.)
- the two 512-bit cores can be chained to execute the 1024-bit wide operation of the CRT computation.
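The benefit of CRT is that one full-width private-key exponentiation is replaced by two half-width ones that can run on separate cores, followed by a cheap recombination. The sketch below assumes the usual Garner recombination; the function name `rsa_crt_private` and the small key in the usage lines are illustrative assumptions. `pow(q, -1, p)` requires Python 3.8 or later.

```python
def rsa_crt_private(c, p, q, d):
    """Compute c**d mod (p*q) from two half-size exponentiations."""
    dp = d % (p - 1)             # reduced exponent for the first core
    dq = d % (q - 1)             # reduced exponent for the second core
    q_inv = pow(q, -1, p)        # precomputable constant q^-1 mod p
    m1 = pow(c, dp, p)           # half-width exponentiation modulo p
    m2 = pow(c, dq, q)           # half-width exponentiation modulo q
    h = (q_inv * (m1 - m2)) % p  # recombination step
    return m2 + h * q

# Small textbook key, for illustration only.
p, q = 61, 53
n, e, d = p * q, 17, 2753
message = 65
cipher = pow(message, e, n)
recovered = rsa_crt_private(cipher, p, q, d)
```

Each half-width exponentiation uses operands and an exponent of half the length, so the two together cost roughly a quarter of the full-width operation when multiplication cost grows quadratically with operand size.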
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Optimization (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Complex Calculations (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02747535A EP1417566A2 (fr) | 2001-06-13 | 2002-06-13 | Carte d'acceleration de traitement cryptographique |
AU2002317928A AU2002317928A1 (en) | 2001-06-13 | 2002-06-13 | Cryptographic processing accelerator board |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US29781801P | 2001-06-13 | 2001-06-13 | |
US60/297,818 | 2001-06-13 | ||
US17124302A | 2002-06-13 | 2002-06-13 | |
US10/171,243 | 2002-06-13 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2003040911A2 true WO2003040911A2 (fr) | 2003-05-15 |
WO2003040911A3 WO2003040911A3 (fr) | 2004-02-26 |
Family
ID=26866880
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2002/002036 WO2003040911A2 (fr) | 2001-06-13 | 2002-06-13 | Carte d'acceleration de traitement cryptographique |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1417566A2 (fr) |
AU (1) | AU2002317928A1 (fr) |
WO (1) | WO2003040911A2 (fr) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3947671A (en) * | 1974-07-06 | 1976-03-30 | International Business Machines Corporation | Binary parallel computing arrangement for additions or subtractions |
EP0098692A2 (fr) * | 1982-07-01 | 1984-01-18 | Hewlett-Packard Company | Dispositif d'addition d'un premier et d'un second opérandes binaires |
WO1998048345A1 (fr) * | 1997-04-18 | 1998-10-29 | Certicom Corp. | Processeur arithmetique |
US6088800A (en) * | 1998-02-27 | 2000-07-11 | Mosaid Technologies, Incorporated | Encryption processor with shared memory interconnect |
-
2002
- 2002-06-13 WO PCT/FR2002/002036 patent/WO2003040911A2/fr not_active Application Discontinuation
- 2002-06-13 EP EP02747535A patent/EP1417566A2/fr not_active Withdrawn
- 2002-06-13 AU AU2002317928A patent/AU2002317928A1/en not_active Abandoned
Non-Patent Citations (2)
Title |
---|
ORTON G A ET AL: "VLSI IMPLEMENTATION OF PUBLIC-KEY ENCRYPTION ALGORITHMS" ADVANCES IN CRYPTOLOGY. SANTA BARBARA, AUG. 11 - 15, 1986, PROCEEDINGS OF THE CONFERENCE ON THEORY AND APPLICATIONS OF CRYPTOGRAPHIC TECHNIQUES (CRYPTO), BERLIN, SPRINGER, DE, vol. CONF. 6, 1986, pages 277-301, XP000090670 * |
SHIH-LIEN L LU ET AL: "EVALUATION OF TWO-SUMMAND ADDERS IMPLEMENTED IN ECDL CMOS DIFFERENTIAL LOGIC" IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE INC. NEW YORK, US, vol. 26, no. 8, 1 August 1991 (1991-08-01), pages 1152-1160, XP000258583 ISSN: 0018-9200 * |
Also Published As
Publication number | Publication date |
---|---|
AU2002317928A8 (en) | 2003-05-19 |
AU2002317928A1 (en) | 2003-05-19 |
WO2003040911A3 (fr) | 2004-02-26 |
EP1417566A2 (fr) | 2004-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7908641B2 (en) | Modular exponentiation with randomized exponent | |
EP3188001B1 (fr) | Procédé et dispositif de multiplication modulaire | |
EP2296086B1 (fr) | Protection d'une génération de nombres premiers contre des attaques par canaux cachés | |
WO2013088065A1 (fr) | Procede de generation de nombres premiers prouves adapte aux cartes a puce | |
EP1368747B1 (fr) | Procede et dispositif pour reduire le temps de calcul d'un produit, d'une multiplication et d'une exponentiation modulaire selon la methode de montgomery | |
EP3115887B1 (fr) | Procédé, dispositif et support lisible par ordinateur non transitoire de calcul cryptographique | |
Parrilla et al. | Elliptic curve cryptography hardware accelerator for high-performance secure servers | |
EP3373509B1 (fr) | Procédé de signature électronique d'un document avec une clé secrète prédéterminée | |
Longa et al. | The cost to break SIKE: A comparative hardware-based analysis with AES and SHA-3 | |
Perin et al. | Montgomery modular multiplication on reconfigurable hardware: Systolic versus multiplexed implementation | |
Tahir | Design and Implementation of RSA Algorithm using FPGA | |
Issad et al. | Software/hardware co-design of modular exponentiation for efficient RSA cryptosystem | |
Nawari et al. | Fpga based implementation of elliptic curve cryptography | |
Paar et al. | The RSA cryptosystem | |
EP1804160B1 (fr) | Protection d'un calcul cryptographique effectué par un circuit intégré | |
EP1419434A1 (fr) | Procede securise de realisation d'une operation d'exponentiation modulaire | |
WO2003040911A2 (fr) | Carte d'acceleration de traitement cryptographique | |
Koppermann et al. | Automatic generation of high-performance modular multipliers for arbitrary mersenne primes on FPGAs | |
WO2003055134A1 (fr) | Procede cryptographique permettant de repartir la charge entre plusieurs entites et dispositifs pour mettre en oeuvre ce procede | |
FR2880149A1 (fr) | Procede de traitement de donnees et dispositif associe | |
KR20170113268A (ko) | 논-모듈러 승산기, 논-모듈러 승산 방법 및 계산 장치 | |
Judge et al. | A Hardware‐Accelerated ECDLP with High‐Performance Modular Multiplication | |
Aldaya et al. | Memory tampering attack on binary GCD based inversion algorithms | |
WO2004006497A1 (fr) | Procede et dispositifs cryptographiques permettant d'alleger les calculs au cours de transactions | |
JP5179933B2 (ja) | データ処理装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2002747535 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWP | Wipo information: published in national office |
Ref document number: 2002747535 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |