CN107665109A - Montgomery modular multiplication calculation method suitable for embedded systems - Google Patents

A Montgomery modular multiplication calculation method suitable for embedded systems

Info

Publication number
CN107665109A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610609265.0A
Other languages
Chinese (zh)
Other versions
CN107665109B (en)
Inventor
曾学文
李杨
叶晓舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Intellix Technologies Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Intellix Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Intellix Technologies Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201610609265.0A priority Critical patent/CN107665109B/en
Publication of CN107665109A publication Critical patent/CN107665109A/en
Application granted granted Critical
Publication of CN107665109B publication Critical patent/CN107665109B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/722Modular multiplication

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a Montgomery modular multiplication calculation method suitable for embedded systems. The method comprises two parts, multi-precision multiplication and Montgomery reduction. Both parts are computed by hybrid scanning: the inner loop uses operand scanning and the outer loop uses product scanning. The two parts are integrated in a coarse-grained manner, that is, they are computed in an interleaved fashion. The method reduces the number of memory accesses in an embedded system and improves the implementation efficiency of the Montgomery modular multiplication algorithm.

Description

A Montgomery modular multiplication calculation method suitable for embedded systems
Technical field
The present invention relates to the field of public-key cryptosystems, and in particular to a Montgomery modular multiplication calculation method suitable for embedded systems.
Background
In 1985, P. L. Montgomery proposed the Montgomery modular multiplication algorithm, which is currently one of the most widely used modular multiplication algorithms. Its basic idea is to replace time-consuming inversion and division operations with cheap additions and shifts. To compute the modular product $P = A \cdot B \bmod N$ (where A, B and N are large n-bit binary integers and 0 < A, B < N), the algorithm first chooses an integer R coprime to N (typically $R = 2^{n}$) and converts multiplication modulo N into operations modulo R. The Montgomery product is defined as $\mathrm{MonPro}(A, B) = A \cdot B \cdot R^{-1} \bmod N$.
The computation structure of the Montgomery modular multiplication algorithm is shown in Fig. 1 and consists of the following three steps:
(1) Compute the N-residues of A and B: $A' = A \cdot R \bmod N = A \cdot R^{2} \cdot R^{-1} \bmod N = \mathrm{MonPro}(A, R^{2})$,
$B' = B \cdot R \bmod N = B \cdot R^{2} \cdot R^{-1} \bmod N = \mathrm{MonPro}(B, R^{2})$;
(2) Compute the Montgomery product of A' and B': $P' = A' \cdot B' \cdot R^{-1} \bmod N = \mathrm{MonPro}(A', B')$;
(3) Convert P' into the modular product P: $P = A \cdot B \bmod N = A' \cdot R^{-1} \cdot B' \cdot R^{-1} \bmod N$
$= P' \cdot R^{-1} \bmod N = \mathrm{MonPro}(P', 1)$.
The core of the Montgomery modular multiplication algorithm is therefore the computation of the Montgomery product MonPro(A', B'), whose algorithm flow is shown in Fig. 2. As Fig. 2 shows, computing the Montgomery product consists of two key steps: the multi-precision multiplication T ← A*B and the Montgomery reduction P ← (T + M*N)/R.
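The three steps and the two key operations above can be illustrated with a minimal Python sketch on arbitrary-precision integers. The function names `monpro` and `modmul` and the parameter `r_bits` (with R = 2**r_bits) are assumptions made for this illustration; the sketch requires an odd N with N < R and Python 3.8 or later for the modular inverse.

```python
def monpro(a, b, n, r_bits):
    """Montgomery product a*b*R^-1 mod n, with R = 2**r_bits, n odd, n < R."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r   # N' = -N^-1 mod R
    t = a * b                        # multi-precision multiplication T <- A*B
    m = (t * n_prime) % r            # M = T*N' mod R
    p = (t + m * n) >> r_bits        # Montgomery reduction P <- (T + M*N)/R
    return p - n if p >= n else p    # final conditional subtraction


def modmul(a, b, n, r_bits):
    """A*B mod n computed through the three steps listed above."""
    r2 = pow(2, 2 * r_bits, n)               # R^2 mod N
    a_res = monpro(a, r2, n, r_bits)         # step (1): A' = MonPro(A, R^2)
    b_res = monpro(b, r2, n, r_bits)         #           B' = MonPro(B, R^2)
    p_res = monpro(a_res, b_res, n, r_bits)  # step (2): P' = MonPro(A', B')
    return monpro(p_res, 1, n, r_bits)       # step (3): P  = MonPro(P', 1)
```

The division by R is a plain right shift because T + M*N is a multiple of R by construction, which is what removes the costly division by N.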
In 1990, Dusse proposed a Montgomery modular multiplication algorithm operating on radix-r numbers, replacing N' with $n_0' = -n_0^{-1} \bmod r$ and adapting the algorithm accordingly. In 1996, Koc analyzed and summarized the implementation methods of the various Montgomery modular multiplication algorithms and identified five main improved variants: SOS, CIOS, FIOS, FIPS and CIHS. The last two letters (OS/PS/HS) denote the scanning mode used for the multiplication: OS stands for operand scanning, PS for product scanning and HS for hybrid scanning. The leading letters (S/CI/FI) denote how the multi-precision multiplication and the Montgomery reduction are integrated: S (separated) means that one part is computed completely before the other, CI (coarsely integrated) means that the two parts are interleaved at a coarse granularity, and FI (finely integrated) means that they are interleaved at a fine granularity. All of these algorithms can be implemented with a small set of operations: multiplication (mul), addition (add), load, store and so on. High-performance implementations therefore concentrate on optimizing these operations. In embedded systems the number of available registers is limited, so memory operations such as load and store are particularly important. In the methods currently in use the number of registers is fixed, and they generally fall into two categories: those using 5 registers and those using n+4 registers (n being the number of words in the modulus N). Methods that use few registers (5) require more memory accesses, while methods that use many registers (n+4) require fewer memory accesses but are often unusable because the required number of registers exceeds what the processor provides. The problem to be solved is therefore how to use registers dynamically, according to the number of registers the processor makes available and the size of n, so that the registers are fully exploited to reduce the number of memory accesses.
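As a small illustration of the constant $n_0'$ mentioned above, the following Python sketch derives it from the least significant word of N, assuming the radix is r = 2**w for a word length w. The function name `n0_prime` and the parameter `w` are assumptions made for this example, and the modulus must be odd.

```python
def n0_prime(modulus, w):
    """Return n0' = -n0^-1 mod 2**w, where n0 is the lowest w-bit word of modulus."""
    radix = 1 << w
    n0 = modulus % radix                    # least significant word of N
    return (-pow(n0, -1, radix)) % radix    # needs n0 odd (Python 3.8+)
```

Only one word of N enters the computation, which is why this constant is cheaper to obtain and to keep in a register than the full-length N'.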
Summary of the invention
The object of the invention is to overcome the defect that current Montgomery modular multiplication algorithms are not suitable for embedded systems with few resources. To improve the computational efficiency of the Montgomery modular multiplication algorithm, a method is proposed that computes the Montgomery modular multiplication by coarsely integrated hybrid scanning. The method chooses d dynamically so as to make full use of the processor's available registers. Within a block, operand scanning shares operands and so reduces the number of operand loads. Between blocks, product scanning keeps the accumulated result of each column in 2d+1 registers once all of its products have been computed, which reduces the number of accesses to intermediate results. Overall, the number of memory accesses is reduced and the implementation efficiency of the algorithm is improved.
To achieve the above object, the invention provides a Montgomery modular multiplication calculation method suitable for embedded systems. The method comprises multi-precision multiplication and Montgomery reduction. Both parts are computed by hybrid scanning: the inner loop uses operand scanning and the outer loop uses product scanning. The two parts are integrated in a coarse-grained manner, that is, they are computed in an interleaved fashion.
In the above technical solution, the method specifically comprises:
Step 1) Let the large number N be an m-bit prime and let the word length of the processor be W bits; the size of N in words is then $n = \lceil m/W \rceil$. A and B are two N-residues, i.e. 0 < A, B < N. The Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^{W}$, where $n_0$ is the least significant word of N (the modulus here is the word radix $2^{W}$, not the loop size r defined next). Choose d, the size in words of the inner loop; the size of the outer loop is then $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
Step 2) The modular multiplication of A and B is computed as C = A*B, M = C*n0' mod R, C = (C + M*N)/R. Every d words of the operands A, B, M, N and C are treated as one unit:
E = (E[r-1], ..., E[0]) = ({A[n-1], A[n-2], ..., A[n-d]}, ..., {A[d-1], ..., A[1], A[0]})
F = (F[r-1], ..., F[0]) = ({B[n-1], B[n-2], ..., B[n-d]}, ..., {B[d-1], ..., B[1], B[0]})
G = (G[r-1], ..., G[0]) = ({M[n-1], M[n-2], ..., M[n-d]}, ..., {M[d-1], ..., M[1], M[0]})
H = (H[r-1], ..., H[0]) = ({N[n-1], N[n-2], ..., N[n-d]}, ..., {N[d-1], ..., N[1], N[0]})
Q = (Q[2r-1], Q[2r-2], ..., Q[1], Q[0]) = ({C[2n-1], C[2n-2], ..., C[2n-d]}, ...,
{C[d-1], ..., C[1], C[0]})
All partial products of column q, 0 ≤ q ≤ 2r-1, are computed in turn:
E[k]*F[l] + G[k]*H[l] = (Q[q+1], Q[q]),
where k + l = q, until all columns have been computed, which yields C;
Step 3) Test whether C ≥ N holds; if so, set C = C - N. In either case, proceed to step 4);
Step 4) Output C, the Montgomery product of A and B.
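The parameter setup of step 1) and the grouping of step 2) can be sketched in Python as follows. The helper names `parameters`, `to_words` and `group_words` are assumptions made for this illustration; words are stored least significant first, so index 0 of each list corresponds to A[0] or E[0] in the notation above.

```python
def parameters(modulus, w, d):
    """Return (n, r, n0') for an odd modulus, word length w and inner size d."""
    m = modulus.bit_length()
    n = -(-m // w)                  # n = ceil(m / W), computed with integers
    r = -(-n // d)                  # r = ceil(n / d)
    radix = 1 << w
    n0p = (-pow(modulus % radix, -1, radix)) % radix
    return n, r, n0p


def to_words(x, n, w):
    """Split integer x into n words of w bits, least significant word first."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def group_words(words, d):
    """Group every d words into one unit; the last unit may be shorter."""
    return [words[i:i + d] for i in range(0, len(words), d)]
```

For a 255-bit modulus and W = 32 this gives n = 8, so d = 4 yields two units of four words, while d = 3 yields three units of sizes 3, 3 and 2.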
In the above technical solution, step 2) specifically comprises:
Step 2-1) Set q = 0;
Step 2-2) Denote by A the set of all index pairs (k, l) satisfying k + l = q: A = {(k, l) | k + l = q} (this index set is distinct from the operand A);
Step 2-3) Compute $(Q[q+1], Q[q]) = \sum_{(k,l) \in A} E[k] \cdot F[l]$,
where, written out for d = 4,
E[k]*F[l] = (A[kd+3], A[kd+2], A[kd+1], A[kd]) * (B[ld+3], B[ld+2], B[ld+1], B[ld]);
Step 2-4) Test whether q < r holds; if so, compute G[q] = Q[q]*n0'; otherwise, go to step 2-5);
Step 2-5) Compute $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{(k,l) \in A} G[k] \cdot H[l]$;
Step 2-6) Set q = q + 1; if q is less than or equal to 2r-2, set k = k + 1 and return to step 2-2); otherwise, go to step 2-7);
Step 2-7) Compute C = C/R; since $R = 2^{nW}$,
C = (C[2n-1], C[2n-2], ..., C[n+1], C[n]).
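The column-wise flow of steps 2-1) to 2-7) can be sketched in Python on arbitrary-precision integers. This is an illustrative sketch under stated assumptions, not the patented register-level implementation: the name `cipohs_sketch` is hypothetical, each block is held as one Python integer, and the block-level constant $-N^{-1} \bmod 2^{dW}$ is used in place of the word-level n0' so that a whole block G[q] can be derived in one step. The contribution G[q]*H[0] is added immediately after G[q] is known.

```python
def cipohs_sketch(a, b, modulus, w, d):
    """Return a*b*2^(-n*w) mod modulus, processed block column by block column."""
    n = -(-modulus.bit_length() // w)        # words in N
    r = -(-n // d)                           # blocks per operand
    bw = d * w                               # bits in a full block
    mask = (1 << bw) - 1

    def split(x):                            # blocks, least significant first
        return [(x >> (bw * i)) & mask for i in range(r)]

    E, F, H = split(a), split(b), split(modulus)
    G = [0] * r
    nprime = (-pow(modulus, -1, 1 << bw)) % (1 << bw)   # assumption, see lead-in
    c = 0
    for q in range(2 * r - 1):               # outer loop: one block column each
        pairs = [(k, q - k) for k in range(r) if 0 <= q - k < r]
        for k, l in pairs:                   # multiplication part: E[k]*F[l]
            c += (E[k] * F[l]) << (q * bw)
        for k, l in pairs:                   # reduction part for known G[k]
            if k < q:
                c += (G[k] * H[l]) << (q * bw)
        if q < r:                            # derive G[q] and clear block q
            bits = min(bw, n * w - q * bw)   # last block may be incomplete
            G[q] = ((c >> (q * bw)) * nprime) & ((1 << bits) - 1)
            c += (G[q] * H[0]) << (q * bw)
    c >>= n * w                              # C = C / R with R = 2^(n*w)
    return c - modulus if c >= modulus else c
```

Limiting the last G block to the remaining words keeps M at exactly n words, so the result is scaled by $R = 2^{nW}$ whether or not d divides n.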
Compared with the prior art, the technical advantages of the invention are as follows:
The hybrid scanning idea is applied to the Montgomery modular multiplication in a coarsely integrated manner. By choosing d dynamically and reusing operands sensibly, the number of memory accesses in an embedded system is reduced and the implementation efficiency of the Montgomery modular multiplication algorithm is improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the existing Montgomery modular multiplication computation structure;
Fig. 2 is a flow chart of the existing computation of the Montgomery product;
Fig. 3 is a schematic diagram of the modular multiplication calculation method of the invention;
Fig. 4 is a structure diagram of CIPOHS-a (n = 8, d = 4), the coarsely integrated product and operand hybrid scanning Montgomery modular multiplication method of the invention;
Fig. 5 is a structure diagram of CIPOHS-b (n = 8, d = 3), the coarsely integrated product and operand hybrid scanning Montgomery modular multiplication method of the invention;
Fig. 6 is a schematic diagram of block product scanning in the method of the invention.
Detailed description
The method of the invention is described in further detail below with reference to the drawings and specific embodiments.
A Montgomery modular multiplication calculation method suitable for embedded systems comprises:
Step 1) Let the large number N be an m-bit prime and let the word length of the processor be W bits; the size of N in words is then $n = \lceil m/W \rceil$. A and B are two N-residues, i.e. 0 < A, B < N. The Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^{W}$, where $n_0$ is the least significant word of N. Choose d, the size in words of the inner loop (which uses operand scanning); the size of the outer loop (which uses product scanning) is then $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
Step 2) Compute C, the modular multiplication result of A and B, as follows:
1) C = A*B;
2) M = C*n0' mod R;
3) C = (C + M*N)/R.
As shown in Fig. 3, A and B denote two m-bit multi-precision integers: A = (A[n-1], ..., A[2], A[1], A[0]), B = (B[n-1], ..., B[2], B[1], B[0]). The product C = A*B can then be written as C = (C[2n-1], ..., C[2], C[1], C[0]).
Every d words of the operands A, B, M, N and C are treated as one unit. In this embodiment n = 8 and d = 4, which gives:
E = (E[1], E[0]) = ({A[7], A[6], A[5], A[4]}, {A[3], A[2], A[1], A[0]})
F = (F[1], F[0]) = ({B[7], B[6], B[5], B[4]}, {B[3], B[2], B[1], B[0]})
G = (G[1], G[0]) = ({M[7], M[6], M[5], M[4]}, {M[3], M[2], M[1], M[0]})
H = (H[1], H[0]) = ({N[7], N[6], N[5], N[4]}, {N[3], N[2], N[1], N[0]})
Q = (Q[3], Q[2], Q[1], Q[0]) = ({C[15], C[14], C[13], C[12]}, {C[11], C[10], C[9], C[8]},
{C[7], C[6], C[5], C[4]}, {C[3], C[2], C[1], C[0]})
Computing C = A*B then becomes computing (Q[3], Q[2], Q[1], Q[0]) = (E[1], E[0]) * (F[1], F[0]).
Computing M = C*n0' mod R becomes computing (G[1], G[0]) = (Q[1], Q[0]) * n0'.
Computing C = C + M*N becomes computing
(Q[3], Q[2], Q[1], Q[0]) = (Q[3], Q[2], Q[1], Q[0]) + (G[1], G[0]) * (H[1], H[0]).
Below, C = A*B and C = C + M*N (with M = C*n0' mod R) are computed in an interleaved fashion using product scanning. All partial products of column q, 0 ≤ q ≤ 2r-1, are computed,
E[k]*F[l] + G[k]*H[l] = (Q[q+1], Q[q]) (where k + l = q), and then the next column is computed, until all columns have been completed.
The algorithm structure is shown in Fig. 4. Each shaded block in the diamond structure (blocks 1, 2 and so on) and each large box in the multiplication structure represents a product E[k]*F[l] or G[k]*H[l]. Taking the shaded block as the basic unit, the scale of the whole diamond structure is r = n/d = 2. Coarse-grained product scanning is used between blocks: first all E[k]*F[l] of column 0 are computed (block 1), then all G[k]*H[l] (block 2); next all E[k]*F[l] of column 1 (blocks 3 and 4), then all G[k]*H[l] (blocks 5 and 6); finally all E[k]*F[l] of column 2 (block 7), then all G[k]*H[l] (block 8).
Each shaded block in the figure, i.e. each E[k]*F[l] (or G[k]*H[l]), is computed by operand scanning with scale d = 4. The computation proceeds row by row: within a row one operand word B[i] is held fixed and multiplied by every word A[j] (0 ≤ j < d) of the other operand; once all products of the row have been computed, the next row is processed.
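The row-wise operand scanning inside one block can be sketched in Python as follows. The function name `block_mul_operand_scan` is an assumption made for this example, and a Python list stands in for the registers that would hold the block result on a real processor. Words are stored least significant first.

```python
def block_mul_operand_scan(a_words, b_words, w):
    """Multiply two word vectors row by row and return the product words."""
    mask = (1 << w) - 1
    acc = [0] * (len(a_words) + len(b_words))
    for i, b in enumerate(b_words):           # one row: hold word B[i] fixed
        carry = 0
        for j, a in enumerate(a_words):       # multiply it by every word A[j]
            t = acc[i + j] + a * b + carry    # fits in 2*w bits
            acc[i + j] = t & mask
            carry = t >> w
        acc[i + len(a_words)] = carry         # top word of this row
    return acc
```

Because the words of A stay resident for the whole block and each word of B is fetched once per row, a d*d block touches each operand word only once, which is the sharing effect the text relies on.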
In another embodiment, with n = 8 and d = 3, the whole multiplication is divided into many blocks (1, 2 and so on), as shown in Fig. 5. Coarsely integrated product scanning is still used between these blocks, i.e. they are executed in the numerical order marked in the figure, and the scale is $r = \lceil n/d \rceil = 3$. Because n is not divisible by d, not all blocks are complete: in Fig. 5, block 1 is a complete d*d block, while block 7 is an incomplete block. According to the completeness of the blocks, the columns of the product scan fall into three parts. Part I is columns 1 to r-1, in which all blocks are complete with size d*d. Part II is columns r to 2r-2, in which the topmost and bottommost blocks are incomplete with size [d-(rd-n)]*d and all remaining blocks are complete with size d*d. Part III is column 2r-1, which contains two incomplete blocks of size [d-(rd-n)]*[d-(rd-n)]. Inside each block operand scanning is still used; for an incomplete block the operand scan runs along the long side.
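The block sizes that result when d does not divide n can be enumerated with a short Python sketch. The function name `block_shapes` is hypothetical. Columns are indexed from 0 here, so column q in the code corresponds to column q+1 in the three-part description above, and only the E[k]*F[l] blocks are listed (the G[k]*H[l] blocks of a column have the same shapes).

```python
def block_shapes(n, d):
    """For each block column, list the (rows, cols) size of every E[k]*F[l]."""
    r = -(-n // d)                                   # r = ceil(n / d)
    size = [min(d, n - k * d) for k in range(r)]     # words in each unit
    return [[(size[k], size[q - k]) for k in range(r) if 0 <= q - k < r]
            for q in range(2 * r - 1)]
```

For n = 8 and d = 3 the unit sizes are 3, 3 and 2, so the incomplete side has length d-(rd-n) = 2, matching the sizes [d-(rd-n)]*d and [d-(rd-n)]*[d-(rd-n)] given in the text.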
Step 2) specifically comprises:
Step 2-1) Set q = 0;
Step 2-2) Denote by A the set of all index pairs (k, l) satisfying k + l = q: A = {(k, l) | k + l = q} (this index set is distinct from the operand A);
Step 2-3) Compute $(Q[q+1], Q[q]) = \sum_{(k,l) \in A} E[k] \cdot F[l]$.
Operand scanning is used: the computation proceeds row by row, within a row one operand word B[i] is held fixed and multiplied by every word A[j] (0 ≤ j < d) of the other operand, and once all products of the row have been computed the next row is processed. Each E[k]*F[l] and G[k]*H[l] is computed
as shown in Fig. 6, where
E[k]*F[l] = (A[kd+3], A[kd+2], A[kd+1], A[kd]) * (B[ld+3], B[ld+2], B[ld+1], B[ld]);
Step 2-4) Test whether q < r holds; if so, compute G[q] = Q[q]*n0'; otherwise, go to step 2-5);
Step 2-5) Compute $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{(k,l) \in A} G[k] \cdot H[l]$;
Step 2-6) Set q = q + 1; if q is less than or equal to 2r-2, set k = k + 1 and return to step 2-2); otherwise, go to step 2-7);
Step 2-7) Compute C = C/R; since $R = 2^{nW}$ and n = 8,
C = (C[15], C[14], C[13], C[12], C[11], C[10], C[9], C[8]);
Step 3) Test whether C ≥ N holds; if so, set C = C - N. In either case, proceed to step 4);
Step 4) Output C, the Montgomery product of A and B.
Depending on whether n/d is an integer, the method of the invention is divided into two cases. In the first case n/d is an integer, i.e. $r = \lceil n/d \rceil = n/d$, and the method is called CIPOHS-a. In the second case n/d is not an integer, i.e. $r = \lceil n/d \rceil > n/d$, and the method is called CIPOHS-b. The total number of memory accesses of the two variants is analyzed below.
1. CIPOHS-a method
As shown in Fig. 4, first consider the memory accesses inside each block. Each block uses d+1 registers for operands: d registers hold the words of operand A, and the remaining register holds, in turn, each word of the multi-precision operand B. Each operand word is therefore loaded only once per block, so the number of loads per block is 2d. The result of each block is kept directly in 2d+1 registers, so the number of stores of intermediate results per block is 0. Now consider the outer loop. Its size is r = n/d, so there are $2r^{2} = 2(n/d)^{2}$ blocks in total, processed by coarsely integrated product scanning. In this example there are $2 \cdot 2^{2} = 8$ blocks, executed in the order 1, 2, ..., 8 marked in the figure. Since each block needs 2d loads and there are $2(n/d)^{2}$ blocks, the total number of loads is $2d \cdot 2(n/d)^{2} = 4n^{2}/d$. The only values that must be stored during the whole method are the n words of M ($M \leftarrow C \cdot n_0' \bmod 2^{W}$) and the n+1 words of the final result C, so the total number of stores is n + (n+1) = 2n+1. The total number of memory accesses (loads and stores) is therefore $4n^{2}/d + 2n + 1$.
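The counting argument for CIPOHS-a can be reproduced with a few lines of Python. The function name `cipohs_a_accesses` is an assumption made for this example; it only evaluates the formulas stated above and does not measure a real implementation.

```python
def cipohs_a_accesses(n, d):
    """Return (loads, stores, total) for CIPOHS-a; requires d to divide n."""
    if n % d != 0:
        raise ValueError("CIPOHS-a assumes n/d is an integer")
    r = n // d
    blocks = 2 * r * r            # r^2 blocks E[k]*F[l] plus r^2 blocks G[k]*H[l]
    loads = 2 * d * blocks        # 2d loads per block, equals 4*n*n/d
    stores = n + (n + 1)          # n words of M and n+1 words of C
    return loads, stores, loads + stores
```

For the embodiment with n = 8 and d = 4 this gives 8 blocks, 64 loads and 17 stores, that is 81 memory accesses in total.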
2. CIPOHS-b method
As shown in Fig. 5, the analysis is split into three parts. Part I comprises the first r-1 columns, in which all blocks are complete (for example blocks 1, 3 and 4). Each block is processed by operand scanning, so each needs 2d loads; there are r*(r-1) blocks in total, so the load total is 2d*r*(r-1). Part II comprises columns r to 2r-2. The topmost and bottommost blocks of each column are incomplete, with size [d-(rd-n)]*d, and are operand-scanned along the direction of length d. Take block 7 in Fig. 5: A[0], A[1] and A[2] are first loaded into registers, then the products of B[6] with A[0], A[1] and A[2] are computed, followed by the products of B[7] with A[0], A[1] and A[2]. Each incomplete block therefore needs 2d-(rd-n) loads, and there are 4(r-1) such blocks. The remaining blocks of Part II are complete and use normal operand scanning with 2d loads each; there are (r-2)*(r-1) of them. The load total of this part is therefore 4(r-1)*[2d-(rd-n)] + 2d*(r-2)*(r-1). Part III comprises only column 2r-1, which contains just 2 incomplete blocks of size [d-(rd-n)]*[d-(rd-n)]. These blocks are operand-scanned along a side of length d-(rd-n), so the number of loads is 4[d-(rd-n)]. Table 1 summarizes the loads of the three parts. The load total is 4rn, and the number of stores is 2n+1, the same as for CIPOHS-a, so the total number of memory accesses is 4rn + 2n + 1.
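That the three partial load counts add up to 4rn can be checked numerically with a small Python sketch. The function name `cipohs_b_loads` is hypothetical; the three expressions are copied from the analysis above.

```python
def cipohs_b_loads(n, d):
    """Return the load counts (part1, part2, part3) of the CIPOHS-b analysis."""
    r = -(-n // d)                        # r = ceil(n / d)
    short = d - (r * d - n)               # side length of an incomplete block
    part1 = 2 * d * r * (r - 1)
    part2 = 4 * (r - 1) * (2 * d - (r * d - n)) + 2 * d * (r - 2) * (r - 1)
    part3 = 4 * short
    return part1, part2, part3
```

For n = 8 and d = 3 the parts are 36, 52 and 8 loads, which sum to 96 = 4rn with r = 3. Writing e = d-(rd-n), the sum simplifies to 4dr(r-1) + 4re = 4r[(r-1)d + e] = 4rn, since (r-1)d + e = n.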
Table 1
Combining CIPOHS-a and CIPOHS-b, the total number of memory accesses can be written uniformly as $4n\lceil n/d \rceil + 2n + 1$. Below, the algorithms proposed by Koc and the CIPOHS algorithm proposed here are compared in terms of the number of registers used and the number of memory accesses, as shown in Table 2.
Table 2
Table 2 shows that, among the existing algorithms compared, CIOS needs the fewest memory accesses, $2n^{2} + 3n + 1$, but it needs the most registers, n+4. When n is large, the number of available registers is smaller than n+4 and this algorithm can no longer be used. CIOS-5reg and FIPS use few registers, only 5, but need considerably more memory accesses. The CIPOHS algorithm proposed by the invention solves this problem: it chooses d dynamically according to the number of available registers, and by using those registers sensibly it reduces the number of memory accesses. CIPOHS needs $4n\lceil n/d \rceil + 2n + 1$ memory accesses, while CIOS needs $2n^{2} + 3n + 1$. When d is an integer greater than 1, CIPOHS needs fewer memory accesses than CIOS, and the larger d is, the fewer memory accesses CIPOHS needs. Since the numbers of multiply and add instructions used by these algorithms are essentially the same and CIPOHS needs the fewest memory accesses, its operating efficiency is the highest.
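The two access counts quoted above can be evaluated side by side with a short Python sketch. The function names `cipohs_accesses` and `cios_accesses` are assumptions made for this example; they simply evaluate the two formulas from the text for given n and d.

```python
def cipohs_accesses(n, d):
    """Memory accesses of CIPOHS: 4*n*ceil(n/d) + 2*n + 1."""
    return 4 * n * (-(-n // d)) + 2 * n + 1


def cios_accesses(n):
    """Memory accesses of CIOS as quoted in the text: 2*n^2 + 3*n + 1."""
    return 2 * n * n + 3 * n + 1


if __name__ == "__main__":
    n = 8
    print("CIOS:", cios_accesses(n))
    for d in (2, 3, 4, 8):
        print("CIPOHS, d =", d, ":", cipohs_accesses(n, d))
```

For n = 8 the CIOS count is 153, while CIPOHS needs 145, 113, 81 and 49 accesses for d = 2, 3, 4 and 8 respectively, which illustrates how the count falls as d grows.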
In summary, the Montgomery modular multiplication calculation method suitable for embedded systems of the invention computes the multi-precision multiplication and the Montgomery reduction in a coarsely integrated, interleaved manner, and both parts use hybrid product and operand scanning. By choosing d according to the number of available registers, the registers are fully used to reduce the number of memory accesses of the algorithm, which further improves its operating efficiency.
The above are merely embodiments of the invention and are not intended to limit its scope of protection. Those skilled in the art will understand that modifications or equivalent substitutions of the technical solution of the invention that do not depart from the principle of the invention, and hence from the spirit and scope of its technical solution, shall all fall within the scope of protection of the invention.

Claims (3)

1. A Montgomery modular multiplication calculation method suitable for embedded systems, the method comprising multi-precision multiplication and Montgomery reduction, wherein both the multi-precision multiplication and the Montgomery reduction are computed by hybrid scanning, the inner loop using operand scanning and the outer loop using product scanning; and wherein the multi-precision multiplication and the Montgomery reduction are integrated in a coarse-grained manner, that is, the two parts are computed in an interleaved fashion.
2. The Montgomery modular multiplication calculation method suitable for embedded systems according to claim 1, characterized in that the method specifically comprises:
Step 1) Let the large number N be an m-bit prime and let the word length of the processor be W bits; the size of N in words is then $n = \lceil m/W \rceil$. A and B are two N-residues, i.e. 0 < A, B < N. The Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^{W}$, where $n_0$ is the least significant word of N. Choose d, the size in words of the inner loop; the size of the outer loop is then $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
Step 2) The modular multiplication of A and B is computed as C = A*B, M = C*n0' mod R, C = (C + M*N)/R. Every d words of the operands A, B, M, N and C are treated as one unit:
E = (E[r-1], ..., E[0]) = ({A[n-1], A[n-2], ..., A[n-d]}, ..., {A[d-1], ..., A[1], A[0]})
F = (F[r-1], ..., F[0]) = ({B[n-1], B[n-2], ..., B[n-d]}, ..., {B[d-1], ..., B[1], B[0]})
G = (G[r-1], ..., G[0]) = ({M[n-1], M[n-2], ..., M[n-d]}, ..., {M[d-1], ..., M[1], M[0]})
H = (H[r-1], ..., H[0]) = ({N[n-1], N[n-2], ..., N[n-d]}, ..., {N[d-1], ..., N[1], N[0]})
Q = (Q[2r-1], Q[2r-2], ..., Q[1], Q[0]) = ({C[2n-1], C[2n-2], ..., C[2n-d]}, ...,
{C[d-1], ..., C[1], C[0]})
All partial products of column q, 0 ≤ q ≤ 2r-1, are computed in turn:
E[k]*F[l] + G[k]*H[l] = (Q[q+1], Q[q]),
where k + l = q, until all columns have been computed, which yields C;
Step 3) Test whether C ≥ N holds; if so, set C = C - N. In either case, proceed to step 4);
Step 4) Output C, the Montgomery product of A and B.
3. The Montgomery modular multiplication calculation method suitable for embedded systems according to claim 2, characterized in that step 2) specifically comprises:
Step 2-1) Set q = 0;
Step 2-2) Denote by A the set of all index pairs (k, l) satisfying k + l = q: A = {(k, l) | k + l = q};
Step 2-3) Compute $(Q[q+1], Q[q]) = \sum_{(k,l) \in A} E[k] \cdot F[l]$,
where
E[k]*F[l] = (A[kd+3], A[kd+2], A[kd+1], A[kd]) * (B[ld+3], B[ld+2], B[ld+1], B[ld]);
Step 2-4) Test whether q < r holds; if so, compute G[q] = Q[q]*n0'; otherwise, go to step 2-5);
Step 2-5) Compute $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{(k,l) \in A} G[k] \cdot H[l]$;
Step 2-6) Set q = q + 1; if q is less than or equal to 2r-2, set k = k + 1 and return to step 2-2); otherwise, go to step 2-7);
Step 2-7) Compute C = C/R; since $R = 2^{nW}$,
C = (C[2n-1], C[2n-2], ..., C[n+1], C[n]).
CN201610609265.0A 2016-07-28 2016-07-28 Montgomery modular multiplication calculation method suitable for embedded system Active CN107665109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610609265.0A CN107665109B (en) 2016-07-28 2016-07-28 Montgomery modular multiplication calculation method suitable for embedded system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610609265.0A CN107665109B (en) 2016-07-28 2016-07-28 Montgomery modular multiplication calculation method suitable for embedded system

Publications (2)

Publication Number Publication Date
CN107665109A true CN107665109A (en) 2018-02-06
CN107665109B CN107665109B (en) 2020-04-14

Family

ID=61115623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610609265.0A Active CN107665109B (en) 2016-07-28 2016-07-28 Montgomery modular multiplication calculation method suitable for embedded system

Country Status (1)

Country Link
CN (1) CN107665109B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1152746A (en) * 1996-09-20 1997-06-25 张胤微 High speed modular multiplication method and device
CN101834723A (en) * 2009-03-10 2010-09-15 上海爱信诺航芯电子科技有限公司 RSA (Rivest-Shamirh-Adleman) algorithm and IP core
CN102207847A (en) * 2011-05-06 2011-10-05 广州杰赛科技股份有限公司 Data encryption and decryption processing method and device based on Montgomery modular multiplication operation
CN102707924A (en) * 2012-05-02 2012-10-03 广州中大微电子有限公司 RSA coprocessor for RFID (radio frequency identification device) intelligent card chip
US8417756B2 (en) * 2007-11-29 2013-04-09 Samsung Electronics Co., Ltd. Method and apparatus for efficient modulo multiplication
CN103914277A (en) * 2014-04-14 2014-07-09 复旦大学 Extensible modular multiplier circuit based on improved Montgomery modular multiplication algorithm

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1152746A (en) * 1996-09-20 1997-06-25 张胤微 High speed modular multiplication method and device
CN1085862C (en) * 1996-09-20 2002-05-29 张胤微 High speed modular multiplication method and device
US8417756B2 (en) * 2007-11-29 2013-04-09 Samsung Electronics Co., Ltd. Method and apparatus for efficient modulo multiplication
CN101834723A (en) * 2009-03-10 2010-09-15 上海爱信诺航芯电子科技有限公司 RSA (Rivest-Shamirh-Adleman) algorithm and IP core
CN102207847A (en) * 2011-05-06 2011-10-05 广州杰赛科技股份有限公司 Data encryption and decryption processing method and device based on Montgomery modular multiplication operation
CN102707924A (en) * 2012-05-02 2012-10-03 广州中大微电子有限公司 RSA coprocessor for RFID (radio frequency identification device) intelligent card chip
CN103914277A (en) * 2014-04-14 2014-07-09 复旦大学 Extensible modular multiplier circuit based on improved Montgomery modular multiplication algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘哲: "8比特AVR微控制器上高效及抗侧信道攻击的RSA算法的实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
CN107665109B (en) 2020-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant