CN107665109B - Montgomery modular multiplication calculation method suitable for embedded system - Google Patents
- Publication number
- CN107665109B (application CN201610609265.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/722—Modular multiplication
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a Montgomery modular multiplication calculation method suitable for an embedded system. The method comprises multi-precision multiplication and Montgomery reduction. Both parts are calculated in a hybrid scanning mode: operand scanning is used in the inner loop and product scanning in the outer loop. The multi-precision multiplication and the Montgomery reduction are combined in a coarse-grained integration mode, i.e. the two parts are calculated alternately. The method reduces the number of memory accesses in an embedded system and improves the efficiency of Montgomery modular multiplication implementations.
Description
Technical Field
The invention relates to the field of public key cryptosystems, in particular to a Montgomery modular multiplication computing method suitable for an embedded system.
Background
In 1985, Montgomery proposed the Montgomery modular multiplication algorithm, which is currently the most widely used modular multiplication algorithm. The basic idea is to replace time-consuming inversion and division operations with simple, fast addition and shift operations. To calculate the modular product $P = A \times B \bmod N$ (A, B, N are n-bit binary large integers and $0 < A, B < N$), the algorithm first selects an integer R that is relatively prime to N (typically $R = 2^n$) and converts the reduction modulo N into operations modulo R. The Montgomery product is defined as $\mathrm{MonPro}(A, B) = A \times B \times R^{-1} \bmod N$.
The calculation structure of the Montgomery modular multiplication algorithm is shown in FIG. 1, and the method specifically comprises the following three steps:
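(1) calculating the N-residues of A and B: $A' = A \cdot R \bmod N = A \cdot R^2 \cdot R^{-1} \bmod N = \mathrm{MonPro}(A, R^2)$,
$B' = B \cdot R \bmod N = B \cdot R^2 \cdot R^{-1} \bmod N = \mathrm{MonPro}(B, R^2)$;
(2) calculating the Montgomery product of A' and B': $P' = A' \cdot B' \cdot R^{-1} \bmod N = \mathrm{MonPro}(A', B')$;
(3) converting P' into the modular product P: $P = A \cdot B \bmod N = A' \cdot R^{-1} \cdot B' \cdot R^{-1} \bmod N = P' \cdot R^{-1} \bmod N = \mathrm{MonPro}(P', 1)$.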
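The three steps above can be sketched in Python. This is a minimal illustration, not an efficient implementation: `monpro` is written directly from the definition of the Montgomery product using Python's built-in modular inverse (`pow(r, -1, n)`, available from Python 3.8), and the function names and the small example values of N and R are assumptions chosen for the example.

```python
def monpro(x, y, n, r):
    """Montgomery product by definition: x * y * r^{-1} mod n."""
    return (x * y * pow(r, -1, n)) % n


def modmul_via_montgomery(a, b, n, r):
    """Compute a * b mod n through the three steps of the text."""
    r2 = (r * r) % n
    a_res = monpro(a, r2, n, r)          # step (1): A' = A * R mod N
    b_res = monpro(b, r2, n, r)          #           B' = B * R mod N
    p_res = monpro(a_res, b_res, n, r)   # step (2): P' = A' * B' * R^{-1} mod N
    return monpro(p_res, 1, n, r)        # step (3): P  = P' * R^{-1} mod N


if __name__ == "__main__":
    n, r = 239, 256                      # odd modulus, R = 2^8 > N (example values)
    print(modmul_via_montgomery(100, 57, n, r) == (100 * 57) % n)
```

The conversions in steps (1) and (3) cost extra Montgomery products, so the approach pays off when many multiplications are chained, as in modular exponentiation.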
Therefore, the core of the Montgomery modular multiplication algorithm is the calculation of the Montgomery product MonPro(A', B'), whose algorithm flow is shown in FIG. 2. As FIG. 2 shows, the Montgomery product calculation can be divided into two key steps: computing the multi-precision multiplication $T \leftarrow A \times B$ and computing the Montgomery reduction $P \leftarrow (T + M \times N)/R$.
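The two key steps can be illustrated with a short Python sketch that avoids any division by N. FIG. 2 is not reproduced in the text, so this follows the standard form of Montgomery reduction as an assumption: $R = 2^k$, $N' = -N^{-1} \bmod R$ and $M = T \times N' \bmod R$. The function name `monpro_redc` and the parameter `k` are illustrative.

```python
def monpro_redc(a, b, n, k):
    """Montgomery product a * b * R^{-1} mod n with R = 2**k.

    Assumes n is odd, n < 2**k and 0 <= a, b < n.
    """
    r_mask = (1 << k) - 1
    n_prime = (-pow(n, -1, 1 << k)) & r_mask   # N' = -N^{-1} mod R
    t = a * b                                   # multi-precision multiplication
    m = ((t & r_mask) * n_prime) & r_mask       # M = T * N' mod R (mask, no division)
    p = (t + m * n) >> k                        # (T + M*N) is divisible by R: a shift
    return p - n if p >= n else p               # final conditional subtraction


if __name__ == "__main__":
    n, k = 239, 8
    expected = (100 * 57 * pow(1 << k, -1, n)) % n
    print(monpro_redc(100, 57, n, k) == expected)
```

Only multiplications, a mask and a shift are used, which is why the reduction maps well onto processors without a fast divider.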
Dusse proposed a Montgomery modular multiplication algorithm using r-ary numbers in 1990, replacing N' with $n_0' = -n_0^{-1} \bmod r$ and improving the algorithm accordingly. In 1996, Koc analyzed and summarized various implementation methods of the Montgomery modular multiplication algorithm and identified five major improved Montgomery algorithms: SOS, CIOS, FIOS, FIPS, and CIHS. The last two letters (OS/PS/HS) denote the scanning mode used for the multiplication: OS is operand scanning, PS is product scanning, and HS is hybrid scanning. The leading letters (S/CI/FI) denote how multi-precision multiplication and Montgomery reduction are integrated: S is the separated mode, in which one part is computed completely before the other; CI is the coarse-grained integration mode, in which the two parts alternate at coarse granularity; and FI is the fine-grained integration mode, in which the two parts alternate at fine granularity. These algorithms are implemented with a series of operations: multiplication (mul), addition (add), load, store, and so on, and high-performance implementations focus primarily on optimizing these operations. In an embedded system the number of available registers is limited, so memory operations such as load and store are very important. Current common methods use a fixed number of registers, generally either 5 or n + 4 (n being the number of words in the modulus N). Methods using fewer registers (5) require more memory access operations, while methods using more registers (n + 4) require fewer memory access operations but are often unusable because the number of registers needed exceeds the number available on the processor.
Therefore, the problem to be solved is how to use registers dynamically, according to the number of available registers of the processor and the size of n, so that the registers are fully utilized and the number of memory access operations is reduced.
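To make the two scanning modes concrete, the following Python sketch multiplies two multi-precision integers stored as little-endian word lists, once by operand scanning (row by row) and once by product scanning (column by column). The word size `W = 8` and all function names are assumptions made for illustration; neither routine is taken from the cited works.

```python
W = 8                 # word size in bits (illustrative assumption)
BASE = 1 << W


def to_words(x, n):
    """Little-endian list of n words of W bits."""
    return [(x >> (W * i)) & (BASE - 1) for i in range(n)]


def from_words(words):
    return sum(w << (W * i) for i, w in enumerate(words))


def mul_operand_scanning(a, b):
    """Row by row: keep one word b[i] fixed and sweep all words a[j]."""
    n = len(a)
    c = [0] * (2 * n)
    for i in range(n):
        carry = 0
        for j in range(n):
            t = c[i + j] + a[j] * b[i] + carry
            c[i + j] = t % BASE
            carry = t // BASE
        c[i + n] = carry
    return c


def mul_product_scanning(a, b):
    """Column by column: accumulate every a[j] * b[q - j] of column q."""
    n = len(a)
    c = [0] * (2 * n)
    acc = 0
    for q in range(2 * n - 1):
        for j in range(max(0, q - n + 1), min(q, n - 1) + 1):
            acc += a[j] * b[q - j]
        c[q] = acc % BASE     # each result word is written exactly once
        acc //= BASE
    c[2 * n - 1] = acc
    return c
```

Operand scanning rewrites the partial result words in every row, whereas product scanning finishes each result word before moving on, which is the property the hybrid methods exploit.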
Disclosure of Invention
The invention aims to overcome the defect that current Montgomery modular multiplication algorithms are not well suited to embedded systems with limited resources, and provides a method for calculating Montgomery modular multiplication in a coarse-grained integrated hybrid scanning mode in order to improve the calculation efficiency of the Montgomery modular multiplication algorithm.
In order to achieve the above object, the present invention provides a Montgomery modular multiplication calculation method suitable for an embedded system, the method comprising multi-precision multiplication and Montgomery reduction. Both parts are calculated in a hybrid scanning mode: operand scanning is used in the inner loop and product scanning in the outer loop. The multi-precision multiplication and the Montgomery reduction use a coarse-grained integration mode, i.e. the two parts are calculated alternately.
In the above technical solution, the method specifically includes:
step 1) let the large number N be an m-bit prime, let the word length of the processor be W bits, and let the number of words of N be $n = \lceil m/W \rceil$; A and B are two N-residues, i.e. $0 < A, B < N$; the Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^W$, where $n_0$ is the least significant word of N; select d, the size in words of the inner loop; the size of the outer loop is $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
step 2) the modular multiplication of A and B is calculated as: $C = A \times B$, $M = C \times n_0' \bmod R$, $C = (C + M \times N)/R$; every d words of the operands A, B, M, N, C are taken as a whole:
E=(E[r-1],…,E[0])=({A[n-1],A[n-2],…,A[n-d]}…{A[d-1],…,A[1],A[0]})
F=(F[r-1],…,F[0])=({B[n-1],B[n-2],…,B[n-d]}…{B[d-1],…,B[1],B[0]})
G=(G[r-1],…,G[0])=({M[n-1],M[n-2],…,M[n-d]}…{M[d-1],…,M[1],M[0]})
H=(H[r-1],…,H[0])=({N[n-1],N[n-2],…,N[n-d]}…{N[d-1],…,N[1],N[0]})
Q=(Q[2r-1],Q[2r-2],…,Q[1],Q[0])=({C[2n-1],C[2n-2],…,C[2n-d]},…,
{C[d-1],…,C[1],C[0]})
sequentially calculating all partial products of column q, for $0 \le q \le 2r-1$:
E[k]*F[l]+G[k]*H[l]=(Q[q+1],Q[q]),
where k + l = q; C is obtained once all columns have been calculated;
step 3) if $C \ge N$, set $C = C - N$; in either case, go to step 4);
step 4) output the Montgomery product result C of A and B.
In the above technical solution, the step 2) specifically includes:
step 2-1) making q equal to 0;
step 2-2) the set of all pairs (k, l) satisfying k + l = q is denoted A: A = {(k, l) | k + l = q};
step 2-3) calculate $(Q[q+1], Q[q]) = \sum_{(k,l) \in A} E[k] \times F[l]$,
where
E[k]*F[l]=(A[kd+3],A[kd+2],A[kd+1],A[kd])*(B[ld+3],B[ld+2],B[ld+1],B[ld]);
step 2-4) if q < r, calculate $G[q] = Q[q] \times n_0'$; otherwise, go to step 2-5);
step 2-5) calculate $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{(k,l) \in A} G[k] \times H[l]$;
step 2-6) set q = q + 1; if q ≤ 2r-2, set k = k + 1 and return to step 2-2); otherwise, go to step 2-7);
step 2-7) calculate C = C/R; since $R = 2^{nW}$:
C=(C[2n-1],C[2n-2],…,C[n+1],C[n]).
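The column-by-column procedure of step 2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: it requires d to divide n, it uses Python integers for the accumulator, and it uses a block-radix inverse `inv = -H[0]^{-1} mod 2^(dW)` in place of the word-level $n_0'$ so that each block G[q] cancels a whole d-word block. It also adds the already known products G[k]·H[l] before computing G[q], which is the ordering required for the cancellation to hold. The names `split`, `block_mul` and `monpro_blocks` are hypothetical.

```python
W = 8  # word size in bits (illustrative assumption)


def split(x, count, bits):
    """Split x into `count` little-endian digits of `bits` bits each."""
    return [(x >> (bits * i)) & ((1 << bits) - 1) for i in range(count)]


def block_mul(x, y, d):
    """d-word by d-word product using operand scanning (inner loop)."""
    xw, yw = split(x, d, W), split(y, d, W)
    total = 0
    for i, y_word in enumerate(yw):        # one word of y is fixed per row
        for j, x_word in enumerate(xw):    # sweep all words of x
            total += (x_word * y_word) << (W * (i + j))
    return total


def monpro_blocks(a, b, n_mod, n, d):
    """Montgomery product a*b*R^{-1} mod n_mod, R = 2**(n*W); needs d | n."""
    assert n % d == 0 and n_mod % 2 == 1
    r, bits = n // d, d * W
    mask = (1 << bits) - 1
    E, F, H = split(a, r, bits), split(b, r, bits), split(n_mod, r, bits)
    inv = (-pow(H[0], -1, 1 << bits)) & mask   # block-radix analogue of n0'
    G, out, acc = [0] * r, [], 0
    for q in range(2 * r - 1):                 # outer loop: product scanning
        ks = range(max(0, q - r + 1), min(q, r - 1) + 1)
        for k in ks:                           # all E[k]*F[l] with k + l = q
            acc += block_mul(E[k], F[q - k], d)
        for k in ks:                           # all G[k]*H[l] with G[k] known
            if k < q:
                acc += block_mul(G[k], H[q - k], d)
        if q < r:
            G[q] = (acc * inv) & mask          # chosen so the low block cancels
            acc += block_mul(G[q], H[0], d)
        else:
            out.append(acc & mask)             # a finished block of the result
        acc >>= bits
    c = sum(v << (bits * i) for i, v in enumerate(out)) + (acc << (bits * (r - 1)))
    return c - n_mod if c >= n_mod else c      # step 3): conditional subtraction
```

With n = 8 and d = 4 this processes the blocks column by column, alternating between the multiplication part and the reduction part within each column.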
Compared with the prior art, the invention has the following technical advantages:
the hybrid scanning idea is applied to Montgomery modular multiplication using a coarse-grained integration mode, and d is selected dynamically so that operands are used efficiently, which reduces the number of memory accesses in an embedded system and improves the efficiency of Montgomery modular multiplication implementations.
Drawings
FIG. 1 is a schematic diagram of a prior Montgomery modular multiplication calculation structure;
FIG. 2 is a flow chart of a prior art Montgomery modular multiplication to compute a Montgomery product;
FIG. 3 is a schematic diagram of the modular multiplication computation method of the present invention;
FIG. 4 is a block diagram of the Montgomery modular multiplication method CIPOHS-a (n = 8, d = 4) for coarse-grained integrated product and operand hybrid scanning according to the present invention;
FIG. 5 is a block diagram of the Montgomery modular multiplication method CIPOHS-b (n = 8, d = 3) for coarse-grained integrated product and operand hybrid scanning according to the present invention;
FIG. 6 is a schematic diagram of block product scanning in the method of the present invention.
Detailed Description
The method of the present invention is described in further detail below with reference to the figures and specific examples.
A Montgomery modular multiplication computation method suitable for an embedded system, the method comprising:
step 1) let the large number N be an m-bit prime, let the word length of the processor be W bits, and let the number of words of N be $n = \lceil m/W \rceil$; A and B are two N-residues, i.e. $0 < A, B < N$; the Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^W$, where $n_0$ is the least significant word of N; select d, the size in words of the inner loop (which uses operand scanning); the size of the outer loop (which uses product scanning) is $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
step 2) calculating a modular multiplication result C of the A and the B, wherein the calculation process comprises the following steps:
1) C = A*B;
2) M = C*n0' mod R;
3) C = (C + M*N)/R.
As shown in FIG. 3, let A and B denote two m-bit multi-precision integers: A = (A[n-1], …, A[2], A[1], A[0]), B = (B[n-1], …, B[2], B[1], B[0]). The product C = A·B can be expressed as C = (C[2n-1], …, C[2], C[1], C[0]).
Every d words of the operands A, B, M, N, C are taken as a whole; in this embodiment n = 8 and d = 4, and the blocks are represented as follows:
E=(E[1],E[0])=({A[7],A[6],A[5],A[4]}{A[3],A[2],A[1],A[0]})
F=(F[1],F[0])=({B[7],B[6],B[5],B[4]}{B[3],B[2],B[1],B[0]})
G=(G[1],G[0])=({M[7],M[6],M[5],M[4]}{M[3],M[2],M[1],M[0]})
H=(H[1],H[0])=({N[7],N[6],N[5],N[4]}{N[3],N[2],N[1],N[0]})
Q=(Q[3],Q[2],Q[1],Q[0])=({C[15],C[14],C[13],C[12]}{C[11],C[10],C[9],C[8]}
{C[7],C[6],C[5],C[4]}{C[3],C[2],C[1],C[0]})
Then the calculation of $C = A \times B$ can be converted into the calculation of (Q[3],Q[2],Q[1],Q[0]) = (E[1],E[0]) * (F[1],F[0]).
The calculation of $M = C \times n_0' \bmod R$ can be converted into the calculation of (G[1],G[0]) = (Q[1],Q[0]) * n0'.
The calculation of $C + M \times N$ can be converted into the calculation of
(Q[3],Q[2],Q[1],Q[0]) = (Q[3],Q[2],Q[1],Q[0]) + (G[1],G[0]) * (H[1],H[0]).
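The regrouping of words into blocks can be checked numerically. The Python sketch below groups the n = 8 words of two operands into r = 2 blocks of d = 4 words and confirms that summing the block products E[k]·F[l] column by column (k + l = q) reproduces A·B. The word size `W = 8`, the helper names and the sample operands are assumptions for illustration.

```python
W, n, d = 8, 8, 4      # word size in bits (assumed), words per operand, words per block


def to_words(x, count):
    """Little-endian list of `count` words of W bits."""
    return [(x >> (W * i)) & ((1 << W) - 1) for i in range(count)]


def group(words, d):
    """Group little-endian words into blocks of d words, each block as an integer."""
    return [sum(w << (W * j) for j, w in enumerate(words[k:k + d]))
            for k in range(0, len(words), d)]


def block_columns(E, F):
    """Column q collects every block product E[k] * F[l] with k + l = q."""
    cols = [0] * (2 * len(E) - 1)
    for k, e in enumerate(E):
        for l, f in enumerate(F):
            cols[k + l] += e * f
    return cols


def product_from_columns(cols, d):
    """Recombine the columns, each weighted by 2**(d*W*q)."""
    return sum(c << (d * W * q) for q, c in enumerate(cols))


if __name__ == "__main__":
    a, b = 0x0123456789ABCDEF, 0x0FEDCBA987654321
    E, F = group(to_words(a, n), d), group(to_words(b, n), d)
    print(product_from_columns(block_columns(E, F), d) == a * b)
```

For r = 2 there are three non-empty columns: E[0]·F[0], then E[0]·F[1] + E[1]·F[0], then E[1]·F[1].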
Next, $C = A \times B$ and $C = C + M \times N$ (with $M = C \times n_0' \bmod R$) are calculated alternately by the product scanning method. That is, for each column q, $0 \le q \le 2r-1$, all partial products of the column are calculated:
E[k]*F[l] + G[k]*H[l] = (Q[q+1], Q[q]) (where k + l = q), and then the next column is calculated, until all columns have been completed.
The algorithm structure is shown in FIG. 4. Each shaded block (①, etc.) in the diamond structure and each large box in the multiplication structure in the figure represents a product E[k]*F[l] or G[k]*H[l]; the outer loop size is r = n/d = 2. The whole diamond structure is calculated using the shaded blocks as basic units: all E[k]*F[l] in column 0 (i.e., block ①), all G[k]*H[l] in column 0 (i.e., block ②), all E[k]*F[l] in column 1 (i.e., block ③), all G[k]*H[l] in column 1 (i.e., block ⑤), all E[k]*F[l] in column 2 (i.e., block ⑦), and all G[k]*H[l] in column 2 (i.e., block ⑧).
Each shaded block (①, etc.) in the figure, i.e. each E[k]*F[l] (or G[k]*H[l]), is calculated using operand scanning with size d = 4. The calculation is performed row by row: in each row one operand word B[i] is kept unchanged and multiplied with all words A[j] (0 ≤ j < d) of the other operand, and the next row is calculated after all the products in the row have been calculated.
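The row-by-row scan inside one block can be sketched in Python with an explicit load counter. The sketch assumes a simple model in which reading a word from the lists `mem_a` or `mem_b` counts as one load and the accumulator lives entirely in registers; the function name, the word size `W = 8` and the sample words are illustrative assumptions.

```python
W = 8  # word size in bits (illustrative assumption)


def block_operand_scan(mem_a, mem_b, d):
    """Multiply two d-word blocks row by row and count memory loads.

    Returns (product as an integer, number of loads).
    """
    loads = 0
    regs_a = []
    for j in range(d):              # the d words of A are loaded once and kept
        regs_a.append(mem_a[j])
        loads += 1
    acc = 0                         # models the register accumulator
    for i in range(d):
        reg_b = mem_b[i]            # one word of B is loaded per row
        loads += 1
        for j in range(d):          # B[i] is fixed, all A[j] are swept
            acc += (regs_a[j] * reg_b) << (W * (i + j))
    return acc, loads


if __name__ == "__main__":
    a_words = [0xEF, 0xCD, 0xAB, 0x89]     # little-endian words of 0x89ABCDEF
    b_words = [0x21, 0x43, 0x65, 0x87]     # little-endian words of 0x87654321
    product, loads = block_operand_scan(a_words, b_words, 4)
    print(product == 0x89ABCDEF * 0x87654321, loads)
```

Under this model every operand word is read exactly once per block, giving 2d loads for d × d word products.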
In another embodiment, with n = 8 and d = 3, as shown in FIG. 5, the whole multiplication is divided into many blocks (①, etc.), and the coarse-grained integrated product scanning method is still used between these blocks, i.e. the blocks are executed in the numbered order starting from ①, with outer loop size $r = \lceil n/d \rceil = 3$. According to the completeness of the blocks, all columns of the product scan can be divided into three parts. The first part is columns 1 to r-1, in which all blocks are complete blocks of size d × d. The second part is columns r to 2r-2, in which the uppermost and lowermost blocks are incomplete blocks of size [d-(rd-n)] × d and the remaining blocks are complete blocks of size d × d. The third part is column 2r-1, which contains two incomplete blocks of size [d-(rd-n)] × [d-(rd-n)]. The inside of each block is still calculated by operand scanning, and incomplete blocks are operand-scanned along their long side.
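The partition into complete and incomplete blocks can be enumerated with a few lines of Python. The sketch assumes that the last block of each operand is the short one, with d-(rd-n) words, and lists the shapes of the E[k]·F[l] blocks in each column (the G[k]·H[l] blocks have the same shapes). Columns are numbered from 0 here, whereas the text numbers them from 1; the function names are hypothetical.

```python
def block_sizes(n, d):
    """Word counts of the r = ceil(n/d) blocks of an n-word operand."""
    r = -(-n // d)                       # ceiling division
    return [d] * (r - 1) + [d - (r * d - n)]


def column_shapes(n, d):
    """For each column q, the shapes (words of E[k], words of F[l]) with k + l = q."""
    sizes = block_sizes(n, d)
    r = len(sizes)
    return [[(sizes[k], sizes[q - k])
             for k in range(max(0, q - r + 1), min(q, r - 1) + 1)]
            for q in range(2 * r - 1)]


if __name__ == "__main__":
    for q, shapes in enumerate(column_shapes(8, 3)):
        print(q, shapes)
```

For n = 8 and d = 3 the block sizes are [3, 3, 2]: the first two columns contain only 3 × 3 blocks, the next two have a 3 × 2 and a 2 × 3 block at their ends, and the last column holds a single 2 × 2 block.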
The step 2) specifically comprises the following steps:
step 2-1) making q equal to 0;
step 2-2) the set of all pairs (k, l) satisfying k + l = q is denoted A: A = {(k, l) | k + l = q};
step 2-3) calculate $(Q[q+1], Q[q]) = \sum_{(k,l) \in A} E[k] \times F[l]$;
An operand scanning mode is adopted: the calculation is performed row by row, keeping one operand word B[i] unchanged in each row and multiplying it with all words A[j] (0 ≤ j < d) of the other operand; after all the products of the row have been calculated, the next row is calculated. Each E[k]*F[l] and G[k]*H[l] is calculated
as shown in FIG. 6:
E[k]*F[l]=(A[kd+3],A[kd+2],A[kd+1],A[kd])*(B[ld+3],B[ld+2],B[ld+1],B[ld])
step 2-4) if q < r, calculate $G[q] = Q[q] \times n_0'$; otherwise, go to step 2-5);
step 2-5) calculate $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{(k,l) \in A} G[k] \times H[l]$;
step 2-6) set q = q + 1; if q ≤ 2r-1, set k = k + 1 and return to step 2-2); otherwise, go to step 2-7);
step 2-7) calculate C = C/R; since $R = 2^{nW}$ with n = 8:
C=(C[15],C[14],C[13],C[12],C[11],C[10],C[9],C[8]);
step 3) if $C \ge N$, set $C = C - N$; in either case, go to step 4);
and 4) outputting a Montgomery product result C of A and B.
The method of the present invention is divided into two cases according to whether n/d is an integer. In the first case n/d is an integer, i.e. $r = n/d$; we call this CIPOHS-a. In the second case n/d is not an integer, i.e. $r = \lceil n/d \rceil$; we call this CIPOHS-b. The total number of memory accesses of both methods is analyzed below.
1. CIPOHS-a method
As shown in FIG. 4, the number of memory accesses inside each block is analyzed first. In each block, d+1 registers are used for storing operands: d registers hold the words of operand A, and the remaining register holds each word of the multi-precision operand B in turn. Each operand word is therefore loaded only once per block, so the number of loads in each block is 2d. The calculation result of each block is kept directly in 2d+1 registers, so the number of stores of intermediate results per block is 0. The outer loop is analyzed next. Since the outer loop size is r = n/d, there are $2r^2 = 2(n/d)^2$ blocks in total, processed with the coarse-grained integrated product scanning method. In this embodiment there are $2 \times 2^2 = 8$ blocks, executed in the order ①②③④⑤⑥⑦⑧ marked in the figure. Since the number of loads per block is 2d and there are $2(n/d)^2$ blocks, the total number of loads is $2d \times 2(n/d)^2 = 4n^2/d$. What must be stored in the whole algorithm is M, of n words ($M \leftarrow C \times n_0' \bmod R$), and the result, of n+1 words, so the total number of stores is n + (n+1) = 2n+1. The total number of memory accesses (loads and stores) is therefore $4n^2/d + 2n + 1$.
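The counting argument for CIPOHS-a can be reproduced in a few lines of Python. The sketch simply multiplies out the quantities given in the text (2r² blocks, 2d loads per block, 2n+1 stores); the function name is an assumption.

```python
def cipohs_a_accesses(n, d):
    """Loads, stores and total memory accesses when d divides n."""
    assert n % d == 0
    r = n // d
    blocks = 2 * r * r            # r*r blocks for A*B plus r*r blocks for M*N
    loads = blocks * 2 * d        # each block loads 2d operand words
    stores = n + (n + 1)          # M (n words) and the result (n + 1 words)
    return loads, stores, loads + stores


if __name__ == "__main__":
    print(cipohs_a_accesses(8, 4))
```

For n = 8 and d = 4 this gives 8 blocks, 64 loads and 17 stores, i.e. 81 memory accesses, matching $4n^2/d + 2n + 1$.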
2. CIPOHS-b method
As shown in FIG. 5, the analysis follows the three parts of the column partition. The first part contains the first r-1 columns, in which all blocks are complete blocks (such as blocks ①, ③, ④), and operand scanning is used inside each block, so the number of loads in each block is 2d; there are r(r-1) blocks in total, so the total number of loads is 2d·r(r-1). The second part contains columns r to 2r-2. The uppermost and lowermost blocks in each column are incomplete blocks of size [d-(rd-n)] × d, which are operand-scanned along the side of length d. For example, for block ⑦ in FIG. 5, A[0], A[1], A[2] are first loaded into registers; then the products of B[6] with A[0], A[1], A[2] are calculated, followed by the products of B[7] with A[0], A[1], A[2]. The number of loads per incomplete block is 2d-(rd-n), and there are 4(r-1) such blocks. The remaining blocks of the second part are all complete blocks using normal operand scanning, with 2d loads per block, and there are (r-2)(r-1) of them. The total number of loads in the second part is therefore 4(r-1)[2d-(rd-n)] + 2d(r-2)(r-1). The third part contains only column 2r-1, which has only 2 incomplete blocks of size [d-(rd-n)] × [d-(rd-n)]; these blocks are scanned along the length [d-(rd-n)], so the number of loads is 4[d-(rd-n)]. Table 1 summarizes the number of loads in these three parts; the total number of loads is 4rn, and the number of stores is 2n+1, as in CIPOHS-a, so the total number of memory accesses is 4rn+2n+1.
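The three partial load counts can be added up in Python to check that they sum to 4rn. The sketch only evaluates the expressions stated in the text, with `s = d - (r*d - n)` as shorthand for the short side of an incomplete block; the function name is an assumption.

```python
def cipohs_b_loads(n, d):
    """Load counts of the three column groups, with r = ceil(n/d)."""
    r = -(-n // d)                                   # ceiling division
    s = d - (r * d - n)                              # short side of an incomplete block
    part1 = 2 * d * r * (r - 1)                      # r(r-1) complete blocks, 2d loads each
    part2 = 4 * (r - 1) * (d + s) + 2 * d * (r - 2) * (r - 1)
    part3 = 4 * s                                    # two s-by-s blocks, 2s loads each
    return part1, part2, part3


if __name__ == "__main__":
    n, d = 8, 3
    parts = cipohs_b_loads(n, d)
    r = -(-n // d)
    print(parts, sum(parts) == 4 * r * n)
```

For n = 8 and d = 3 the parts are 36, 52 and 8 loads, which sum to 96 = 4rn with r = 3. The same identity holds when d divides n, in which case s = d.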
TABLE 1
Combining CIPOHS-a and CIPOHS-b, the total number of memory accesses can be written uniformly as $4n\lceil n/d \rceil + 2n + 1$. We next analyze the number of registers used and the number of memory accesses for several algorithms proposed by Koc and for the proposed CIPOHS algorithm, as shown in Table 2.
TABLE 2
As can be seen from Table 2, among the existing algorithms compared in the table, the CIOS algorithm requires the fewest memory accesses, $2n^2 + 3n + 1$, but it requires a larger number of registers, n + 4. When n is large, the number of available registers is less than n + 4, and the algorithm can no longer be used. The CIOS-5reg and FIPS algorithms use only 5 registers, but their number of memory accesses is large. The CIPOHS algorithm proposed by the present invention solves this problem by selecting d dynamically according to the number of available registers, making good use of those registers to reduce the number of memory accesses. The number of memory accesses of CIPOHS is $4n\lceil n/d \rceil + 2n + 1$, while that of CIOS is $2n^2 + 3n + 1$; when d is an integer greater than 1, the number of memory accesses of CIPOHS is less than that of CIOS, and the larger the value of d, the fewer memory accesses CIPOHS requires. The numbers of multiplication instructions, addition instructions and so on used by these algorithms are essentially the same, and the CIPOHS algorithm has the fewest memory accesses, so its computational efficiency is the highest.
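The two access-count formulas quoted above can be compared numerically. Table 2 is not reproduced in the text, so the Python sketch below only evaluates the two expressions that do appear, $2n^2 + 3n + 1$ for CIOS and $4n\lceil n/d \rceil + 2n + 1$ for CIPOHS; the function names are assumptions.

```python
def cios_accesses(n):
    """Memory accesses of CIOS as quoted in the text."""
    return 2 * n * n + 3 * n + 1


def cipohs_accesses(n, d):
    """Memory accesses of CIPOHS as quoted in the text, with r = ceil(n/d)."""
    r = -(-n // d)
    return 4 * n * r + 2 * n + 1


if __name__ == "__main__":
    n = 8
    print("CIOS:", cios_accesses(n))
    for d in (2, 3, 4, 8):
        print("CIPOHS, d =", d, ":", cipohs_accesses(n, d))
```

For n = 8 the CIOS count is 153, while CIPOHS needs 145, 113, 81 and 49 accesses for d = 2, 3, 4 and 8 respectively, so the saving grows with d.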
In summary, the Montgomery modular multiplication method suitable for an embedded system according to the present invention alternately calculates the two parts, multi-precision multiplication and Montgomery reduction, using the coarse-grained integration mode, and uses hybrid product and operand scanning in both parts. d is selected according to the number of available registers; by fully utilizing the registers, the number of memory accesses of the algorithm is reduced and its computational efficiency is further improved.
The above description only illustrates embodiments of the present invention and should not be taken as limiting its scope. Those skilled in the art should understand that modifications and equivalents made without departing from the spirit and scope of the present invention are also covered by the scope of the present invention.
Claims (1)
1. A Montgomery modular multiplication calculation method suitable for an embedded system, the method comprising: multi-precision multiplication and Montgomery reduction; the multi-precision multiplication and Montgomery reduction parts are calculated in a hybrid scanning mode, in which an operand scanning mode is used in the inner loop and a product scanning mode is used in the outer loop; the multi-precision multiplication and Montgomery reduction use a coarse-grained integration mode, namely the two parts are calculated alternately;
the method specifically comprises the following steps:
step 1) let the large number N be an m-bit prime, let the word length of the processor be W bits, and let the number of words of N be $n = \lceil m/W \rceil$; A and B are two N-residues, i.e. $0 < A, B < N$; the Montgomery coefficient is $R = 2^{nW}$, and $n_0' = -n_0^{-1} \bmod 2^W$, where $n_0$ is the least significant word of N; select d, the size in words of the inner loop; the size of the outer loop is $r = \lceil n/d \rceil$, where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument;
step 2) the modular multiplication of A and B is calculated as: $C = A \times B$, $M = C \times n_0' \bmod R$, $C = (C + M \times N)/R$; every d words of the operands A, B, M, N, C are taken as a whole:
E=(E[r-1],…,E[0])=({A[n-1],A[n-2],…,A[n-d]}…{A[d-1],…,A[1],A[0]})
F=(F[r-1],…,F[0])=({B[n-1],B[n-2],…,B[n-d]}…{B[d-1],…,B[1],B[0]})
G=(G[r-1],…,G[0])=({M[n-1],M[n-2],…,M[n-d]}…{M[d-1],…,M[1],M[0]})
H=(H[r-1],…,H[0])=({N[n-1],N[n-2],…,N[n-d]}…{N[d-1],…,N[1],N[0]})
Q=(Q[2r-1],Q[2r-2],…,Q[1],Q[0])=({C[2n-1],C[2n-2],…,C[2n-d]},…,{C[d-1],…,C[1],C[0]})
E, F, G, H and Q are all block matrices;
sequentially calculating all partial products of column q, for $0 \le q \le 2r-1$:
E[k]*F[l]+G[k]*H[l]=(Q[q+1],Q[q]),
where k + l = q, and k and l are integers; C is obtained once all columns have been calculated;
step 3) if $C \ge N$, set $C = C - N$; in either case, go to step 4);
step 4), outputting a Montgomery product result C of A and B;
the step 2) specifically comprises the following steps:
step 2-1) making q equal to 0;
step 2-2) the set of all pairs (k, l) satisfying k + l = q is denoted A: A = {(k, l) | k + l = q};
step 2-3) calculate $(Q[q+1], Q[q]) = \sum_{(k,l) \in A} E[k] \times F[l]$,
where
E[k]*F[l]=(A[kd+3],A[kd+2],A[kd+1],A[kd])*(B[ld+3],B[ld+2],B[ld+1],B[ld]);
step 2-4) if q < r, calculate $G[q] = Q[q] \times n_0'$; otherwise, go to step 2-5);
step 2-5) calculate $(Q[q+1], Q[q]) = (Q[q+1], Q[q]) + \sum_{(k,l) \in A} G[k] \times H[l]$;
step 2-6) set q = q + 1; if q ≤ 2r-2, set k = k + 1 and return to step 2-2); otherwise, go to step 2-7);
step 2-7) calculate C = C/R; since $R = 2^{nW}$:
C=(C[2n-1],C[2n-2],…,C[n+1],C[n]).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610609265.0A CN107665109B (en) | 2016-07-28 | 2016-07-28 | Montgomery modular multiplication calculation method suitable for embedded system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610609265.0A CN107665109B (en) | 2016-07-28 | 2016-07-28 | Montgomery modular multiplication calculation method suitable for embedded system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107665109A CN107665109A (en) | 2018-02-06 |
CN107665109B (en) | 2020-04-14 |
Family
ID=61115623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610609265.0A Active CN107665109B (en) | 2016-07-28 | 2016-07-28 | Montgomery modular multiplication calculation method suitable for embedded system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107665109B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1152746A (en) * | 1996-09-20 | 1997-06-25 | 张胤微 | High speed modular multiplication method and device |
CN101834723A (en) * | 2009-03-10 | 2010-09-15 | 上海爱信诺航芯电子科技有限公司 | RSA (Rivest-Shamirh-Adleman) algorithm and IP core |
CN102207847A (en) * | 2011-05-06 | 2011-10-05 | 广州杰赛科技股份有限公司 | Data encryption and decryption processing method and device based on Montgomery modular multiplication operation |
CN102707924A (en) * | 2012-05-02 | 2012-10-03 | 广州中大微电子有限公司 | RSA coprocessor for RFID (radio frequency identification device) intelligent card chip |
US8417756B2 (en) * | 2007-11-29 | 2013-04-09 | Samsung Electronics Co., Ltd. | Method and apparatus for efficient modulo multiplication |
CN103914277A (en) * | 2014-04-14 | 2014-07-09 | 复旦大学 | Extensible modular multiplier circuit based on improved Montgomery modular multiplication algorithm |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1152746A (en) * | 1996-09-20 | 1997-06-25 | 张胤微 | High speed modular multiplication method and device |
CN1085862C (en) * | 1996-09-20 | 2002-05-29 | 张胤微 | High speed modular multiplication method and device |
US8417756B2 (en) * | 2007-11-29 | 2013-04-09 | Samsung Electronics Co., Ltd. | Method and apparatus for efficient modulo multiplication |
CN101834723A (en) * | 2009-03-10 | 2010-09-15 | 上海爱信诺航芯电子科技有限公司 | RSA (Rivest-Shamirh-Adleman) algorithm and IP core |
CN102207847A (en) * | 2011-05-06 | 2011-10-05 | 广州杰赛科技股份有限公司 | Data encryption and decryption processing method and device based on Montgomery modular multiplication operation |
CN102707924A (en) * | 2012-05-02 | 2012-10-03 | 广州中大微电子有限公司 | RSA coprocessor for RFID (radio frequency identification device) intelligent card chip |
CN103914277A (en) * | 2014-04-14 | 2014-07-09 | 复旦大学 | Extensible modular multiplier circuit based on improved Montgomery modular multiplication algorithm |
Non-Patent Citations (1)
Title |
---|
8比特AVR微控制器上高效及抗侧信道攻击的RSA算法的实现;刘哲;《中国优秀硕士学位论文全文数据库 信息科技辑》;20120415(第4期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN107665109A (en) | 2018-02-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||