CN107665109A - Montgomery modular multiplication calculation method suitable for embedded systems - Google Patents
Montgomery modular multiplication calculation method suitable for embedded systems
- Publication number
- CN107665109A (application CN201610609265.0A)
- Authority
- CN
- China
- Prior art keywords
- montgomery
- modular multiplication
- calculate
- embedded system
- mode
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/722—Modular multiplication
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a Montgomery modular multiplication calculation method suitable for embedded systems. The method comprises two parts: multi-precision multiplication and Montgomery reduction. Both parts are computed by hybrid scanning, in which the inner loop uses operand scanning and the outer loop uses product scanning. The two parts are coarsely integrated, that is, their computation is interleaved. The method reduces the number of memory accesses in an embedded system and improves the implementation efficiency of the Montgomery modular multiplication algorithm.
Description
Technical field
The present invention relates to the field of public-key cryptosystems, and in particular to a Montgomery modular multiplication calculation method suitable for embedded systems.
Background art
In 1985, P. L. Montgomery proposed the Montgomery modular multiplication algorithm, which is now one of the most widely used modular multiplication algorithms. Its basic idea is to replace time-consuming inversion and division operations with simple, fast addition and shift operations. To compute the modular product $P = A \cdot B \bmod N$ (where A, B and N are n-bit binary big integers and $0 < A, B < N$), the algorithm first chooses an integer R coprime to N (typically $R = 2^n$) and converts multiplication modulo N into multiplication modulo R. The Montgomery product is defined as $\mathrm{MonPro}(A, B) = A \cdot B \cdot R^{-1} \bmod N$.
The computation structure of the Montgomery modular multiplication algorithm is shown in Fig. 1 and consists of the following three steps:
(1) Compute the N-residues of A and B: $A' = A \cdot R \bmod N = A \cdot R^2 \cdot R^{-1} \bmod N = \mathrm{MonPro}(A, R^2)$, $B' = B \cdot R \bmod N = B \cdot R^2 \cdot R^{-1} \bmod N = \mathrm{MonPro}(B, R^2)$;
(2) Compute the Montgomery product of A' and B': $P' = A' \cdot B' \cdot R^{-1} \bmod N = \mathrm{MonPro}(A', B')$;
(3) Convert P' into the modular product P: $P = A \cdot B \bmod N = A' \cdot R^{-1} \cdot B' \cdot R^{-1} \bmod N = P' \cdot R^{-1} \bmod N = \mathrm{MonPro}(P', 1)$.
Therefore, the core of the Montgomery modular multiplication algorithm is the computation of the Montgomery product MonPro(A', B'), whose algorithm flow is shown in Fig. 2. As Fig. 2 shows, computing the Montgomery product can be divided into two key steps: the multi-precision multiplication $T \leftarrow A \cdot B$ and the Montgomery reduction $P \leftarrow (T + M \cdot N)/R$, where $M = T \cdot N' \bmod R$ and $N' = -N^{-1} \bmod R$.
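The three-step structure and the two key steps of the Montgomery product can be sketched with Python big integers. This is a minimal illustration rather than the patent's implementation: the function names `mon_pro` and `mod_mul` and the choice R = 2^k with k the bit length of N are assumptions made for the example, and N is assumed odd.

```python
def mon_pro(a, b, n, k):
    """Montgomery product a*b*R^-1 mod n with R = 2**k (n odd, 0 <= a, b < n < R)."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # N' = -N^-1 mod R (Python 3.8+)
    t = a * b                        # multi-precision multiplication T <- A*B
    m = (t * n_prime) % r            # M <- T*N' mod R
    p = (t + m * n) >> k             # Montgomery reduction P <- (T + M*N)/R
    return p - n if p >= n else p    # single conditional subtraction


def mod_mul(a, b, n):
    """Compute a*b mod n through the three Montgomery steps."""
    k = n.bit_length()               # R = 2**k > n
    r2 = pow(2, 2 * k, n)            # R^2 mod N, precomputed once per modulus
    a1 = mon_pro(a, r2, n, k)        # step (1): A' = A*R mod N
    b1 = mon_pro(b, r2, n, k)        #           B' = B*R mod N
    p1 = mon_pro(a1, b1, n, k)       # step (2): P' = A'*B'*R^-1 mod N
    return mon_pro(p1, 1, n, k)      # step (3): P = P'*R^-1 mod N
```

The division by R is a plain right shift because M is chosen so that T + M*N is a multiple of R. That is why the method needs no trial division by N.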
In 1990, Dussé proposed a Montgomery modular multiplication algorithm that works with radix-r numbers and replaces N' with $n_0' = -n_0^{-1} \bmod r$, improving the algorithm accordingly. In 1996, Koç analyzed and summarized the various implementation methods of Montgomery modular multiplication and identified five main improved Montgomery algorithms: SOS, CIOS, FIOS, FIPS and CIHS. In these names, the last two letters (OS/PS/HS) denote the scanning method used for the multiplication: OS stands for operand scanning, PS for product scanning and HS for hybrid scanning. The leading letters (S/CI/FI) denote how the multi-precision multiplication and the Montgomery reduction are integrated: S means separated, where one part is computed completely before the other; CI means coarsely integrated, where the two parts are interleaved at coarse granularity; FI means finely integrated, where the two parts are interleaved at fine granularity. All of these algorithms can be implemented with a sequence of operations such as multiplication (mul), addition (add), load and store, so high-performance implementations concentrate on optimizing these operations. In embedded systems, the number of available registers is limited, which makes memory operations such as load and store particularly important. In the methods currently in use, the number of registers is fixed and generally falls into two classes: 5 registers, or n+4 registers (where n is the number of words in the modulus N). The method with fewer registers (5) needs more memory access operations. The method with more registers (n+4) needs fewer memory access operations, but often cannot be used because the required number of registers exceeds the number available on the processor. The problem to be solved is therefore how to use registers dynamically, according to the number of registers available on the processor and the size of n, so that full use of the registers reduces the number of memory access operations.
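To make the naming scheme concrete, the following sketch follows the commonly described structure of the coarsely integrated operand scanning (CIOS) variant: for each word of B, one operand-scanning multiplication pass is followed immediately by one reduction pass. The function name `cios`, the default word size `w = 32` and the list-based word arrays are assumptions for illustration; register allocation, which is the concern of the text above, is not modeled.

```python
def cios(a, b, n, s, w=32):
    """CIOS Montgomery product of s-word operands: a*b*2^(-s*w) mod n (n odd)."""
    mask = (1 << w) - 1
    aw = [(a >> (w * i)) & mask for i in range(s)]
    bw = [(b >> (w * i)) & mask for i in range(s)]
    nw = [(n >> (w * i)) & mask for i in range(s)]
    n0p = (-pow(nw[0], -1, 1 << w)) & mask      # n0' = -n0^-1 mod 2^w
    t = [0] * (s + 2)
    for i in range(s):
        # multiplication part: t += a * b[i]
        c = 0
        for j in range(s):
            cs = t[j] + aw[j] * bw[i] + c
            t[j], c = cs & mask, cs >> w
        cs = t[s] + c
        t[s], t[s + 1] = cs & mask, cs >> w
        # reduction part: t = (t + m*n) / 2^w
        m = (t[0] * n0p) & mask
        c = (t[0] + m * nw[0]) >> w
        for j in range(1, s):
            cs = t[j] + m * nw[j] + c
            t[j - 1], c = cs & mask, cs >> w
        cs = t[s] + c
        t[s - 1], c = cs & mask, cs >> w
        t[s] = t[s + 1] + c
    p = sum(t[j] << (w * j) for j in range(s + 1))
    return p - n if p >= n else p               # final conditional subtraction
```

Every inner-loop iteration reads and writes a word of `t`, which on a real processor is a load and a store unless `t` is held in registers. This is the cost the register-count discussion above is about.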
Summary of the invention
The object of the invention is to overcome the defect that current Montgomery modular multiplication algorithms are not suited to resource-constrained embedded systems. To improve the computational efficiency of Montgomery modular multiplication, a method is proposed that computes the Montgomery modular multiplication by coarsely integrated hybrid scanning. The method makes full use of the registers available on the processor by choosing d dynamically. Within a block, operand scanning shares operands and thereby reduces the number of operand loads. Between blocks, product scanning keeps all products of a column in 2d+1 registers once the column has been computed, which reduces accesses to intermediate results. Overall, the number of memory accesses is reduced and the implementation efficiency of the algorithm is improved.
To achieve the above object, the invention provides a Montgomery modular multiplication calculation method suitable for embedded systems. The method comprises multi-precision multiplication and Montgomery reduction. Both parts are computed by hybrid scanning, in which the inner loop uses operand scanning and the outer loop uses product scanning; the two parts are coarsely integrated, that is, their computation is interleaved.
In the above technical solution, the method specifically comprises:
Step 1) Let the big number N be an m-bit prime and let the word length of the processor be W bits; the size of N in words is then $n = \lceil m/W \rceil$. A and B are two N-residues, i.e. $0 < A, B < N$. The Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^W$, where $n_0$ is the least significant word of N. Select d, the size in words of the inner loop; the size of the outer loop is then $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
Step 2) The modular multiplication of A and B is computed as C = A*B, M = C*n0' mod R, C = (C + M*N)/R. Every d words of the operands A, B, M, N and C are treated as one unit:
E = (E[r-1], ..., E[0]) = ({A[n-1], A[n-2], ..., A[n-d]}, ..., {A[d-1], ..., A[2], A[1], A[0]})
F = (F[r-1], ..., F[0]) = ({B[n-1], B[n-2], ..., B[n-d]}, ..., {B[d-1], ..., B[2], B[1], B[0]})
G = (G[r-1], ..., G[0]) = ({M[n-1], M[n-2], ..., M[n-d]}, ..., {M[d-1], ..., M[2], M[1], M[0]})
H = (H[r-1], ..., H[0]) = ({N[n-1], N[n-2], ..., N[n-d]}, ..., {N[d-1], ..., N[2], N[1], N[0]})
Q = (Q[2r-1], Q[2r-2], ..., Q[1], Q[0]) = ({C[2n-1], C[2n-2], ..., C[2n-d]}, ..., {C[d-1], ..., C[2], C[1], C[0]})
All partial products of column q, 0 ≤ q ≤ 2r-1, are computed in turn:
E[k]*F[l] + G[k]*H[l] = (Q[q+1], Q[q]),
where k + l = q, until all columns have been computed and C is obtained;
Step 3) Determine whether C ≥ N holds; if it does, set C = C - N. Then go to step 4);
Step 4) Output the Montgomery product C of A and B.
In the above technical solution, step 2) specifically comprises:
Step 2-1) Set q = 0;
Step 2-2) Denote by $\mathcal{A}$ the set of all pairs (k, l) satisfying k + l = q: $\mathcal{A} = \{(k, l) \mid k + l = q\}$;
Step 2-3) Compute $(Q[q+1], Q[q]) = \sum_{\mathcal{A}} E[k] \cdot F[l]$,
where
E[k]*F[l] = (A[kd+3], A[kd+2], A[kd+1], A[kd]) * (B[ld+3], B[ld+2], B[ld+1], B[ld]);
Step 2-4) Determine whether q < r holds; if so, compute G[q] = Q[q]*n0'; otherwise go to step 2-5);
Step 2-5) Compute $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{\mathcal{A}} G[k] \cdot H[l]$;
Step 2-6) Set q = q + 1; if q is less than or equal to 2r-2, set k = k + 1 and return to step 2-2); otherwise go to step 2-7);
Step 2-7) Compute C = C/R; since $R = 2^{nW}$:
C = (C[2n-1], C[2n-2], ..., C[n+1], C[n]).
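The column-by-column interleaving of steps 2-1) to 2-7) can be modeled at block level as below. This is a simplified sketch under several assumptions, not the patented procedure: each d-word block is one Python integer, so the inner operand scanning is not modeled; a block-level constant `h0_prime` stands in for the word-level n0'; G[q] is derived after the already known G[k]*H[l] terms of the column have been added; and operands are zero-padded to r full blocks, so the result is A*B*2^(-r*d*W) mod N, which equals A*B*R^-1 mod N only when d divides n.

```python
def blocked_mon_pro(A, B, N, W, d):
    """Column-wise (product-scanning) Montgomery product over d-word blocks."""
    n = -(-N.bit_length() // W)          # words in N
    r = -(-n // d)                       # blocks per operand
    shift = W * d
    D = 1 << shift                       # block radix 2^(d*W)

    def blocks(x):
        return [(x >> (shift * i)) & (D - 1) for i in range(r)]

    E, F, H = blocks(A), blocks(B), blocks(N)
    h0_prime = (-pow(H[0], -1, D)) % D   # block-level stand-in for n0' (assumption)
    G = [0] * r                          # blocks of M
    Q = [0] * r                          # blocks of the result
    acc = 0                              # column accumulator
    for q in range(2 * r):
        pairs = [(k, q - k) for k in range(r) if 0 <= q - k < r]
        acc += sum(E[k] * F[l] for k, l in pairs)            # multiplication part
        acc += sum(G[k] * H[l] for k, l in pairs if k < q)   # known reduction terms
        if q < r:
            G[q] = (acc * h0_prime) % D  # next block of M
            acc += G[q] * H[0]           # makes the low block of acc zero
        else:
            Q[q - r] = acc % D           # emit one result block
        acc >>= shift                    # carry into the next column
    C = sum(Q[i] << (shift * i) for i in range(r)) + (acc << (shift * r))
    return C - N if C >= N else C        # step 3): conditional subtraction
```

Only one accumulator is live per column, which is the property that lets an implementation keep the column sum in registers instead of writing intermediate words back to memory.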
Compared with the prior art, the technical advantage of the invention is as follows:
Hybrid scanning is applied to the computation of the Montgomery modular multiplication in a coarsely integrated manner. By choosing d dynamically and making good use of the operands, the number of memory accesses in an embedded system is reduced and the implementation efficiency of the Montgomery modular multiplication algorithm is improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure of the existing Montgomery modular multiplication computation;
Fig. 2 is a flow chart of the existing computation of the Montgomery product;
Fig. 3 is a schematic diagram of the modular multiplication calculation method of the invention;
Fig. 4 is a structure diagram of the coarsely integrated product and operand hybrid scanning Montgomery modular multiplication method CIPOHS-a (n = 8, d = 4) of the invention;
Fig. 5 is a structure diagram of the coarsely integrated product and operand hybrid scanning Montgomery modular multiplication method CIPOHS-b (n = 8, d = 3) of the invention;
Fig. 6 is a schematic diagram of block-wise product scanning in the method of the invention.
Detailed description of the embodiments
The method of the invention is described in further detail below with reference to the drawings and specific embodiments.
A Montgomery modular multiplication calculation method suitable for embedded systems comprises:
Step 1) Let the big number N be an m-bit prime and let the word length of the processor be W bits; the size of N in words is then $n = \lceil m/W \rceil$. A and B are two N-residues, i.e. $0 < A, B < N$. The Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^W$, where $n_0$ is the least significant word of N. Select d, the size in words of the inner loop (which uses operand scanning); the size of the outer loop (which uses product scanning) is then $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
Step 2) Compute the modular multiplication result C of A and B as follows:
1) C = A*B;
2) M = C*n0' mod R;
3) C = (C + M*N)/R.
As shown in Fig. 3, A and B are two m-bit multi-precision integers, written A = (A[n-1], ..., A[2], A[1], A[0]) and B = (B[n-1], ..., B[2], B[1], B[0]). The product C = A*B can then be written C = (C[2n-1], ..., C[2], C[1], C[0]).
Every d words of the operands A, B, M, N and C are treated as one unit. In this embodiment n = 8 and d = 4, which gives:
E = (E[1], E[0]) = ({A[7], A[6], A[5], A[4]}, {A[3], A[2], A[1], A[0]})
F = (F[1], F[0]) = ({B[7], B[6], B[5], B[4]}, {B[3], B[2], B[1], B[0]})
G = (G[1], G[0]) = ({M[7], M[6], M[5], M[4]}, {M[3], M[2], M[1], M[0]})
H = (H[1], H[0]) = ({N[7], N[6], N[5], N[4]}, {N[3], N[2], N[1], N[0]})
Q = (Q[3], Q[2], Q[1], Q[0]) = ({C[15], C[14], C[13], C[12]}, {C[11], C[10], C[9], C[8]}, {C[7], C[6], C[5], C[4]}, {C[3], C[2], C[1], C[0]})
Computing C = A*B then becomes computing (Q[3], Q[2], Q[1], Q[0]) = (E[1], E[0]) * (F[1], F[0]).
Computing M = C*n0' mod R becomes computing (G[1], G[0]) = (Q[1], Q[0]) * n0'.
Computing C = C + M*N becomes computing
(Q[3], Q[2], Q[1], Q[0]) = (Q[3], Q[2], Q[1], Q[0]) + (G[1], G[0]) * (H[1], H[0]).
Product scanning is then used to interleave the computation of C = A*B and C = C + M*N (with M = C*n0' mod R). All partial products of column q, 0 ≤ q ≤ 2r-1,
E[k]*F[l] + G[k]*H[l] = (Q[q+1], Q[q]) (where k + l = q),
are computed before the next column is processed, until all columns have been computed.
The algorithm structure is shown in Fig. 4. Each shaded block ①, ②, etc. in the diamond structure, and each large box in the multiplication structure, represents a product E[k]*F[l] or G[k]*H[l]. Taking the shaded block as the basic unit, the whole diamond structure has size r = n/d = 2. Between blocks, coarsely integrated product scanning is used: first all E[k]*F[l] of column 0 are computed (block ①), followed by all G[k]*H[l] (block ②); then all E[k]*F[l] of column 1 (blocks ③ and ④), followed by all G[k]*H[l] (blocks ⑤ and ⑥); finally all E[k]*F[l] of column 2 (block ⑦), followed by all G[k]*H[l] (block ⑧).
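The block execution order described for Fig. 4 can be generated mechanically. The sketch below lists, column by column, first the E[k]*F[l] blocks and then the G[k]*H[l] blocks; the function name `block_schedule` and the order of blocks within one column are assumptions, since the text fixes only the column order and the multiplication-before-reduction order.

```python
def block_schedule(r):
    """Return the block order as (kind, k, l) tuples for an outer loop of size r."""
    order = []
    for q in range(2 * r - 1):                       # columns that contain products
        pairs = [(k, q - k) for k in range(r) if 0 <= q - k < r]
        order += [("E*F", k, l) for k, l in pairs]   # multiplication part of column q
        order += [("G*H", k, l) for k, l in pairs]   # reduction part of column q
    return order


for number, (kind, k, l) in enumerate(block_schedule(2), start=1):
    print(number, kind, k, l)
```

For r = 2 the list has 2r² = 8 entries, alternating in the pattern E*F, G*H, E*F, E*F, G*H, G*H, E*F, G*H, which corresponds to blocks ① to ⑧.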
Each shaded block ①, ②, etc. in the figure, that is, each E[k]*F[l] (or G[k]*H[l]), is computed by operand scanning with size d = 4. The computation proceeds row by row: within each row one operand word B[i] is held fixed and multiplied by every word A[j] (0 ≤ j < d) of the other operand; once all products of the row have been computed, the next row is processed.
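The row-by-row operand scanning inside one block can be sketched as follows, with a counter that models operand loads. The function name `block_operand_scan`, the little-endian word lists and the load counter are illustrative assumptions; the accumulation of several blocks of a column into shared registers is not modeled here.

```python
def block_operand_scan(a_words, b_words, W=32):
    """Multiply two word blocks by operand scanning; return (product words, loads)."""
    mask = (1 << W) - 1
    d = len(a_words)
    loads = d                          # the d words of A are loaded once and kept
    a_regs = list(a_words)
    acc = [0] * (d + len(b_words))     # product words, least significant first
    for i in range(len(b_words)):
        b_reg = b_words[i]             # one register holds the current word B[i]
        loads += 1
        carry = 0
        for j in range(d):             # B[i] times every A[j] of the row
            cs = acc[i + j] + a_regs[j] * b_reg + carry
            acc[i + j], carry = cs & mask, cs >> W
        acc[i + d] = carry             # top word of this row
    return acc, loads
```

For a full d×d block the counter ends at 2d, one load per operand word, because the words of A stay in registers across all rows.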
In another embodiment, with n = 8 and d = 3, the whole multiplication is divided into many blocks ①, ②, etc., as shown in Fig. 5. Coarsely integrated product scanning is still used between these blocks, which are executed in the numerical order marked in the figure; the size is $r = \lceil n/d \rceil = 3$. Because n is not divisible by d, not all blocks are complete: in Fig. 5, for example, block ① is a complete d×d block, while block ⑦ is an incomplete block. According to the completeness of the blocks, all columns of the product scan can be divided into three parts. Part 1 consists of columns 1 to r-1, in which all blocks are complete, of size d×d. Part 2 consists of columns r to 2r-2, in which the topmost and bottommost blocks are incomplete, of size [d-(rd-n)]×d, and the remaining blocks are complete, of size d×d. Part 3 is column 2r-1, which contains two incomplete blocks of size [d-(rd-n)]×[d-(rd-n)]. Operand scanning is still used inside each block; for an incomplete diamond block, operand scanning proceeds along the long side.
Step 2) specifically comprises:
Step 2-1) Set q = 0;
Step 2-2) Denote by $\mathcal{A}$ the set of all pairs (k, l) satisfying k + l = q: $\mathcal{A} = \{(k, l) \mid k + l = q\}$;
Step 2-3) Compute $(Q[q+1], Q[q]) = \sum_{\mathcal{A}} E[k] \cdot F[l]$;
Operand scanning is used: the computation proceeds row by row, and within each row one operand word B[i] is held fixed and multiplied by every word A[j] (0 ≤ j < d) of the other operand; once all products of the row have been computed, the next row is processed. Each E[k]*F[l] and G[k]*H[l] is computed as shown in Fig. 6:
E[k]*F[l] = (A[kd+3], A[kd+2], A[kd+1], A[kd]) * (B[ld+3], B[ld+2], B[ld+1], B[ld])
Step 2-4) Determine whether q < r holds; if so, compute G[q] = Q[q]*n0'; otherwise go to step 2-5);
Step 2-5) Compute $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{\mathcal{A}} G[k] \cdot H[l]$;
Step 2-6) Set q = q + 1; if q is less than or equal to 2r-1, set k = k + 1 and return to step 2-2); otherwise go to step 2-7);
Step 2-7) Compute C = C/R; since $R = 2^{nW}$:
C = (C[15], C[14], C[13], C[12], C[11], C[10], C[9], C[8]);
Step 3) Determine whether C ≥ N holds; if it does, set C = C - N. Then go to step 4);
Step 4) Output the Montgomery product C of A and B.
Depending on whether n/d is an integer, the method of the invention is divided into two cases. In the first case n/d is an integer, i.e. $\lceil n/d \rceil = n/d$; this is called CIPOHS-a. In the second case n/d is not an integer, i.e. $\lceil n/d \rceil \neq n/d$; this is called CIPOHS-b. The total number of memory accesses of the two methods is analyzed below.
1. CIPOHS-a method
As shown in Fig. 4, first consider the number of memory accesses inside each block. Each block uses d+1 registers to hold operands: d registers hold the words of operand A, and the remaining register holds, in turn, each word of the multi-precision representation of operand B. Each operand word is therefore loaded only once within a block, so the number of loads per block is 2d. The result of each block is kept directly in 2d+1 registers, so the number of stores of intermediate results per block is 0. Next consider the outer loop. Since the outer loop has size r = n/d, there are $2r^2 = 2(n/d)^2$ blocks in total, processed by coarsely integrated product scanning. In this example there are $2 \cdot 2^2 = 8$ blocks, executed in the order ①②③④⑤⑥⑦⑧ marked in the figure. With 2d loads per block and $2(n/d)^2$ blocks, the total number of loads is $2d \cdot 2(n/d)^2 = 4n^2/d$. The only values that must be stored during the whole computation are the n words of M ($M \leftarrow C \cdot n_0' \bmod 2^W$) and the n+1 words of the final result C, so the total number of stores is n + (n+1) = 2n+1. The total number of memory accesses (loads and stores) is therefore $4n^2/d + 2n + 1$.
2. CIPOHS-b method
As shown in Fig. 5, Part 1 comprises the first r-1 columns, in which all blocks are complete, such as blocks ①, ③ and ④. Operand scanning is used within each block, so each block needs 2d loads; there are r(r-1) blocks in total, so the number of loads is 2d·r·(r-1). Part 2 comprises columns r to 2r-2. The topmost and bottommost blocks of each column are incomplete, of size [d-(rd-n)]×d, and are operand-scanned along the direction of length d. Block ⑦ in Fig. 5 is an example: A[0], A[1] and A[2] are first loaded into registers, then the products of B[6] with A[0], A[1], A[2] are computed, followed by the products of B[7] with A[0], A[1], A[2]. Each incomplete block therefore needs 2d-(rd-n) loads, and there are 4(r-1) such blocks. The remaining blocks of Part 2 are complete and use normal operand scanning with 2d loads each; there are (r-2)(r-1) of them. The number of loads for this part is therefore 4(r-1)[2d-(rd-n)] + 2d(r-2)(r-1). Part 3 comprises only column 2r-1, which contains just the 2 incomplete blocks of size [d-(rd-n)]×[d-(rd-n)]. These blocks are operand-scanned with length d-(rd-n), so the number of loads is 4[d-(rd-n)]. Table 1 summarizes the loads of the three parts; the total number of loads is 4rn. The number of stores is the same as for CIPOHS-a, namely 2n+1, so the total number of memory accesses is 4rn + 2n + 1.
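The total of 4rn loads can be cross-checked by enumerating every block and charging it one load per operand word, as the analysis above does. In this sketch the function names and the per-block cost model (rows plus columns) are assumptions that mirror the counting in the text.

```python
def block_sizes(n, d):
    """Sizes in words of the r operand blocks; the last block may be shorter."""
    return [min(d, n - i) for i in range(0, n, d)]


def total_loads(n, d):
    """Sum the loads of all E*F and G*H blocks."""
    sizes = block_sizes(n, d)
    loads = 0
    for sk in sizes:                 # block of E (or G)
        for sl in sizes:             # block of F (or H)
            loads += 2 * (sk + sl)   # one E*F block plus one G*H block
    return loads


for n, d in [(8, 4), (8, 3)]:
    r = -(-n // d)
    print(n, d, total_loads(n, d), 4 * r * n)
```

Since every block size appears r times as a row size and r times as a column size, the sum is 2·(rn + rn) = 4rn whether or not d divides n, which agrees with the three-part count for CIPOHS-b and with 4n²/d for CIPOHS-a.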
Table 1
Combining CIPOHS-a and CIPOHS-b, the total number of memory accesses can be written uniformly as $4\lceil n/d \rceil n + 2n + 1$. Below, the number of registers used and the number of memory accesses are compared for the algorithms proposed by Koç and the CIPOHS algorithm proposed here, as shown in Table 2.
Table 2
As Table 2 shows, among the existing algorithms compared, CIOS needs the fewest memory accesses, $2n^2 + 3n + 1$, but it needs the most registers, n+4. When n is large, fewer than n+4 registers are available and the algorithm can no longer be used. The CIOS-5reg and FIPS algorithms use few registers, only 5, but their memory access counts are large. The CIPOHS algorithm proposed by the invention solves this problem: it chooses d dynamically according to the number of available registers and uses those registers well to reduce the number of memory accesses. CIPOHS needs $4\lceil n/d \rceil n + 2n + 1$ memory accesses, while CIOS needs $2n^2 + 3n + 1$. When d is an integer greater than 1, CIPOHS needs fewer memory accesses than CIOS, and the larger d is, the fewer memory accesses CIPOHS needs. Since the numbers of multiply and add instructions used by these algorithms are essentially the same, and CIPOHS needs the fewest memory accesses, it has the highest efficiency.
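The two access counts quoted above can be tabulated for a few sizes. The sketch uses the totals 4rn + 2n + 1 with r = ⌈n/d⌉ for CIPOHS and 2n² + 3n + 1 for CIOS as stated in the text; the function names and the sample values of n and d are chosen only for illustration.

```python
def cipohs_accesses(n, d):
    """Memory accesses of CIPOHS: 4*ceil(n/d)*n loads plus 2n+1 stores."""
    r = -(-n // d)
    return 4 * r * n + 2 * n + 1


def cios_accesses(n):
    """Memory accesses of CIOS (with n+4 registers) as quoted in the text."""
    return 2 * n * n + 3 * n + 1


for n in (8, 16, 32):
    row = [cipohs_accesses(n, d) for d in (2, 4, 8)]
    print(n, cios_accesses(n), row)
```

For n = 8 this gives 153 accesses for CIOS against 145, 81 and 49 for CIPOHS with d = 2, 4 and 8, which illustrates how the count falls as more registers allow a larger d.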
In summary, the Montgomery modular multiplication calculation method of the invention interleaves the multi-precision multiplication and the Montgomery reduction in a coarsely integrated manner, and uses hybrid product and operand scanning within both parts. By selecting d according to the number of available registers, the registers are fully used to reduce the number of memory accesses of the algorithm, which further improves its efficiency.
The above is only an embodiment of the invention and is not intended to limit its scope. Those skilled in the art should understand that modifications or equivalent substitutions of the technical solution that do not depart from the principle of the invention, and hence not from the spirit and scope of the technical solution, shall all fall within the scope of protection of the invention.
Claims (3)
1. A Montgomery modular multiplication calculation method suitable for embedded systems, the method comprising multi-precision multiplication and Montgomery reduction, wherein both parts are computed by hybrid scanning, the inner loop using operand scanning and the outer loop using product scanning, and wherein the multi-precision multiplication and the Montgomery reduction are coarsely integrated, that is, the two parts are computed in an interleaved manner.
2. The Montgomery modular multiplication calculation method suitable for embedded systems according to claim 1, characterized in that the method specifically comprises:
Step 1) Let the big number N be an m-bit prime and let the word length of the processor be W bits; the size of N in words is then $n = \lceil m/W \rceil$. A and B are two N-residues, i.e. $0 < A, B < N$. The Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^W$, where $n_0$ is the least significant word of N. Select d, the size in words of the inner loop; the size of the outer loop is then $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
Step 2) The modular multiplication of A and B is computed as C = A*B, M = C*n0' mod R, C = (C + M*N)/R. Every d words of the operands A, B, M, N and C are treated as one unit:
E = (E[r-1], ..., E[0]) = ({A[n-1], A[n-2], ..., A[n-d]}, ..., {A[d-1], ..., A[2], A[1], A[0]})
F = (F[r-1], ..., F[0]) = ({B[n-1], B[n-2], ..., B[n-d]}, ..., {B[d-1], ..., B[2], B[1], B[0]})
G = (G[r-1], ..., G[0]) = ({M[n-1], M[n-2], ..., M[n-d]}, ..., {M[d-1], ..., M[2], M[1], M[0]})
H = (H[r-1], ..., H[0]) = ({N[n-1], N[n-2], ..., N[n-d]}, ..., {N[d-1], ..., N[2], N[1], N[0]})
Q = (Q[2r-1], Q[2r-2], ..., Q[1], Q[0]) = ({C[2n-1], C[2n-2], ..., C[2n-d]}, ..., {C[d-1], ..., C[2], C[1], C[0]})
All partial products of column q, 0 ≤ q ≤ 2r-1, are computed in turn:
E[k]*F[l] + G[k]*H[l] = (Q[q+1], Q[q]),
where k + l = q, until all columns have been computed and C is obtained;
Step 3) Determine whether C ≥ N holds; if it does, set C = C - N. Then go to step 4);
Step 4) Output the Montgomery product C of A and B.
3. The Montgomery modular multiplication calculation method suitable for embedded systems according to claim 2, characterized in that step 2) specifically comprises:
Step 2-1) Set q = 0;
Step 2-2) Denote by $\mathcal{A}$ the set of all pairs (k, l) satisfying k + l = q: $\mathcal{A} = \{(k, l) \mid k + l = q\}$;
Step 2-3) Compute $(Q[q+1], Q[q]) = \sum_{\mathcal{A}} E[k] \cdot F[l]$,
where
E[k]*F[l] = (A[kd+3], A[kd+2], A[kd+1], A[kd]) * (B[ld+3], B[ld+2], B[ld+1], B[ld]);
Step 2-4) Determine whether q < r holds; if so, compute G[q] = Q[q]*n0'; otherwise go to step 2-5);
Step 2-5) Compute $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{\mathcal{A}} G[k] \cdot H[l]$;
Step 2-6) Set q = q + 1; if q is less than or equal to 2r-2, set k = k + 1 and return to step 2-2); otherwise go to step 2-7);
Step 2-7) Compute C = C/R; since $R = 2^{nW}$:
C = (C[2n-1], C[2n-2], ..., C[n+1], C[n]).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610609265.0A CN107665109B (en) | 2016-07-28 | 2016-07-28 | Montgomery modular multiplication calculation method suitable for embedded system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610609265.0A CN107665109B (en) | 2016-07-28 | 2016-07-28 | Montgomery modular multiplication calculation method suitable for embedded system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107665109A true CN107665109A (en) | 2018-02-06 |
CN107665109B CN107665109B (en) | 2020-04-14 |
Family
ID=61115623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610609265.0A Active CN107665109B (en) | 2016-07-28 | 2016-07-28 | Montgomery modular multiplication calculation method suitable for embedded system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107665109B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1152746A (en) * | 1996-09-20 | 1997-06-25 | 张胤微 | High speed modular multiplication method and device |
CN1085862C (en) * | 1996-09-20 | 2002-05-29 | 张胤微 | High speed modular multiplication method and device |
US8417756B2 (en) * | 2007-11-29 | 2013-04-09 | Samsung Electronics Co., Ltd. | Method and apparatus for efficient modulo multiplication |
CN101834723A (en) * | 2009-03-10 | 2010-09-15 | 上海爱信诺航芯电子科技有限公司 | RSA (Rivest-Shamirh-Adleman) algorithm and IP core |
CN102207847A (en) * | 2011-05-06 | 2011-10-05 | 广州杰赛科技股份有限公司 | Data encryption and decryption processing method and device based on Montgomery modular multiplication operation |
CN102707924A (en) * | 2012-05-02 | 2012-10-03 | 广州中大微电子有限公司 | RSA coprocessor for RFID (radio frequency identification device) intelligent card chip |
CN103914277A (en) * | 2014-04-14 | 2014-07-09 | 复旦大学 | Extensible modular multiplier circuit based on improved Montgomery modular multiplication algorithm |
Non-Patent Citations (1)
Title |
---|
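Liu Zhe: "Implementation of an efficient and side-channel-attack-resistant RSA algorithm on 8-bit AVR microcontrollers", China Master's Theses Full-text Database, Information Science and Technology * |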
Also Published As
Publication number | Publication date |
---|---|
CN107665109B (en) | 2020-04-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||