CN112486457B - Hardware system for realizing improved FIOS modular multiplication algorithm - Google Patents

Info

Publication number
CN112486457B
Authority
CN
China
Prior art keywords
module
modular multiplication
multiplier
algorithm
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011319638.3A
Other languages
Chinese (zh)
Other versions
CN112486457A (en)
Inventor
王敏杰
孙浩
孙玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202011319638.3A
Publication of CN112486457A
Application granted
Publication of CN112486457B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/722 Modular multiplication

Abstract

The invention discloses a hardware system for implementing an improved FIOS modular multiplication algorithm. The modular multiplication circuit is implemented in hardware, and register multiplexing reduces the consumption of logic resources. The pipeline and the overall timing of the algorithm are rearranged, and the addition on the critical path is split into a multi-stage pipelined adder tree, so that the clock frequency can reach up to 600 MHz. Independent operations are processed in parallel, which increases the number of operations performed in a single clock cycle. A base-128 multiplier serves as the basic computation unit, so a 4096-bit modular multiplication completes in only 3463 cycles, or about 5.75 µs; this markedly reduces the number of loop iterations and clock cycles required and raises the computation throughput per unit time. The invention also improves the partial product generation circuit, further reducing the number of logic gates. Through these improvements, the invention shortens the code length of the Montgomery modular multiplication algorithm and improves the efficiency of the modular multiplication process.
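The cycle count and clock frequency quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check under the assumption that the circuit runs at the maximum quoted frequency and that operations are issued back to back; the function names are illustrative and do not come from the patent.

```python
# Figures quoted in the abstract.
CYCLES_PER_MODMUL = 3463   # clock cycles for one 4096-bit modular multiplication
F_MAX_HZ = 600e6           # maximum quoted clock frequency (600 MHz)


def modmul_latency_us(cycles: int, clock_hz: float) -> float:
    """Latency of one modular multiplication in microseconds."""
    return cycles / clock_hz * 1e6


def modmul_throughput(cycles: int, clock_hz: float) -> float:
    """Modular multiplications per second, assuming back-to-back operation."""
    return clock_hz / cycles


latency_us = modmul_latency_us(CYCLES_PER_MODMUL, F_MAX_HZ)
throughput = modmul_throughput(CYCLES_PER_MODMUL, F_MAX_HZ)
print(f"latency    = {latency_us:.2f} us")
print(f"throughput = {throughput:,.0f} modular multiplications per second")
```

At exactly 600 MHz the latency works out to roughly 5.77 µs, which is close to the "about 5.75 µs" stated in the abstract.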

Description

Hardware system for realizing improved FIOS modular multiplication algorithm
Technical Field
The invention relates to the field of data encryption and decryption, in particular to a hardware system for realizing an improved FIOS modular multiplication algorithm.
Background
With the rapid development of Internet technology, information networks and the Internet of Things have penetrated every aspect of social life, but the development of information security technology has lagged behind. As a result, the big-data era brings people convenience but also many security problems, and how to guarantee network information security has become a research hotspot.
One of the key technologies for guaranteeing network information security is encryption and decryption. Two forms of encryption are in wide use at present: traditional encryption and public-key encryption. In 1978, R. Rivest, A. Shamir and L. Adleman proposed the RSA algorithm, which remains the most theoretically mature and most widely used public-key encryption and decryption scheme. The security of RSA rests on the difficulty of factoring large integers. RSA encryption and decryption are modular exponentiations, which are carried out as sequences of modular multiplications, so modular multiplication is the core operation of the RSA algorithm.
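Why modular multiplication dominates the cost of RSA can be seen from a minimal square-and-multiply sketch. This is the textbook left-to-right binary method, not something described in the patent, and the toy key (p = 61, q = 53, e = 17) is far too small for real use; it is chosen only so the numbers are easy to follow.

```python
def modexp(base: int, exponent: int, modulus: int) -> tuple[int, int]:
    """Left-to-right binary exponentiation.

    Returns (base**exponent mod modulus, number of modular multiplications used).
    """
    result = 1
    mults = 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # one squaring per exponent bit
        mults += 1
        if bit == "1":
            result = (result * base) % modulus    # one extra multiply per set bit
            mults += 1
    return result, mults


# Toy RSA parameters, for illustration only.
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

message = 65
cipher, enc_mults = modexp(message, e, n)
plain, dec_mults = modexp(cipher, d, n)
print(cipher, plain, enc_mults, dec_mults)
```

Every step of the loop is a modular multiplication, so with a 4096-bit exponent a single RSA operation needs several thousand of them, which is why speeding up one modular multiplication pays off directly.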
The RSA encryption and decryption algorithm can be implemented in software or in hardware. Compared with traditional software encryption, a hardware implementation offers more stable encryption, higher speed, better compatibility and higher security. The key to improving RSA performance is to speed up modular multiplication, for which the Montgomery algorithm is the most commonly used method. In the Montgomery algorithm, however, the operands and intermediate results have large bit widths, which consumes substantial hardware resources and lowers computational efficiency.
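For readers unfamiliar with the Montgomery algorithm, the following is a minimal big-integer sketch of the standard Montgomery product. It is the textbook formulation with R = 2^r_bits, not the patent's word-serial circuit, and the function names and the small modulus are illustrative assumptions.

```python
def montgomery_setup(modulus: int, r_bits: int) -> int:
    """Return N' = -N^{-1} mod R, where R = 2**r_bits and the modulus is odd."""
    r = 1 << r_bits
    return (-pow(modulus, -1, r)) % r


def mont_mul(a: int, b: int, modulus: int, r_bits: int, n_prime: int) -> int:
    """Montgomery product a * b * R^{-1} mod N, for a, b < N < R.

    Division and reduction by R are a shift and a mask, which is what makes
    the method attractive in hardware.
    """
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask     # m = t * N' mod R
    u = (t + m * modulus) >> r_bits       # t + m*N is an exact multiple of R
    return u - modulus if u >= modulus else u


N = 3233
R_BITS = 12                               # R = 4096 > N
R = 1 << R_BITS
n_prime = montgomery_setup(N, R_BITS)

a, b = 1234, 2345
a_m, b_m = (a * R) % N, (b * R) % N       # convert operands to Montgomery form
prod_m = mont_mul(a_m, b_m, N, R_BITS, n_prime)
product = mont_mul(prod_m, 1, N, R_BITS, n_prime)   # convert back
print(product, (a * b) % N)
```

The intermediate value t is twice as wide as the operands, which illustrates the remark above about large intermediate results: for 4096-bit operands a direct implementation would need an 8192-bit datapath, so hardware designs process the operands word by word instead.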
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a hardware system for implementing an improved FIOS modular multiplication algorithm. By multiplexing registers, rearranging the pipeline and similar measures, the system reduces hardware resource consumption, effectively reduces the number of loop iterations and clock cycles required for the computation, and thereby raises the computation throughput per unit time, that is, the speed of the RSA encryption algorithm.
A hardware system for implementing an improved FIOS modular multiplication algorithm comprises a storage module, a modular multiplication algorithm module, a Booth multiplier module and a modular multiplication parameter n'0 selection module.
The storage module comprises two memories, RAM1 and RAM2. RAM1 stores the input multiplicand A, multiplier B and modulus N, and RAM2 stores the output result.
Preferably, the size of RAM1 is 96 × 128 bits and the size of RAM2 is 40 × 128 bits.
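The RAM1 size is consistent with three 4096-bit operands stored as 128-bit words: 4096 / 128 = 32 words per operand, and 3 × 32 = 96 words. The sketch below illustrates that arithmetic. The address layout (A first, then B, then N, each little-endian) is a hypothetical choice for illustration, since the text does not specify one.

```python
WORD_BITS = 128
OPERAND_BITS = 4096
WORDS_PER_OPERAND = OPERAND_BITS // WORD_BITS   # 32


def to_words(value: int, count: int = WORDS_PER_OPERAND,
             width: int = WORD_BITS) -> list[int]:
    """Split an integer into `count` little-endian words of `width` bits."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(count)]


def from_words(words: list[int], width: int = WORD_BITS) -> int:
    """Reassemble an integer from little-endian words."""
    return sum(word << (i * width) for i, word in enumerate(words))


def load_ram1(a: int, b: int, n: int) -> list[int]:
    """Hypothetical RAM1 image: A at words 0..31, B at 32..63, N at 64..95."""
    return to_words(a) + to_words(b) + to_words(n)


ram1 = load_ram1(a=(1 << 4095) + 7, b=3**2000, n=(1 << 4096) - 17)
print(len(ram1))                             # number of 128-bit entries
print(from_words(ram1[32:64]) == 3**2000)    # B read back from its region
```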
The modular multiplication algorithm module adds a group of registers to the existing Montgomery modular multiplier, so that the pre-calculation step and the carry calculation step are computed in parallel, followed by the iteration step. The module divides the 4096-bit multiplicand A, multiplier B and modulus N into several fields of equal length, completes the modular multiplication A × B mod N, and keeps the output in the range 0 to 2N.
The Booth multiplier module comprises a partial product generation module, a partial product compression module and a vector addition module. The circuit that follows partial product generation in an existing multiplier is replaced by a multiplexer. The partial product compression module compresses the input vectors to two rows using CSA (carry-save adder) compressors in ten stages, and a register stage is inserted before the last stage to optimize timing. The vector addition module adds the two rows of vectors output by the partial product compression module with a carry-lookahead adder, which completes the multiplication required in the modular multiplication process.
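The role of the multiplexer in partial product generation can be sketched in software. The example assumes radix-4 Booth recoding, which the text does not state explicitly; under that assumption each recoded digit lies in {-2, -1, 0, 1, 2} and simply selects one of five multiples of the multiplicand. Function names are illustrative.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode an unsigned `bits`-wide multiplier (bits even) into radix-4 Booth digits.

    Digit k is y[2k-1] + y[2k] - 2*y[2k+1], with y[-1] = 0 and zero bits above
    the operand, so a 128-bit operand yields 65 digits.
    """
    padded = multiplier << 1                # append the implicit y[-1] = 0
    digits = []
    for i in range(0, bits + 1, 2):
        triplet = (padded >> i) & 0b111
        low, mid, high = triplet & 1, (triplet >> 1) & 1, (triplet >> 2) & 1
        digits.append(low + mid - 2 * high)
    return digits


def booth_multiply(x: int, y: int, bits: int = 128) -> int:
    """Multiply x by y by summing Booth-selected partial products."""
    # The "multiplexer": each digit picks one precomputed multiple of x.
    multiples = {0: 0, 1: x, 2: x << 1, -1: -x, -2: -(x << 1)}
    partial_products = [multiples[d] << (2 * k)
                        for k, d in enumerate(booth_radix4_digits(y, bits))]
    return sum(partial_products)


x = (1 << 128) - 3
y = 0xDEADBEEF_0123_4567_89AB_CDEF_0F1E_2D3C
print(len(booth_radix4_digits(y, 128)))
print(booth_multiply(x, y) == x * y)
```

In hardware the final `sum` is not a chain of adders: the partial products go through the compression tree and a single carry-lookahead addition, as described above.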
The modular multiplication parameter n'0 selection module comprises an input port, an output port, a cyclic shift module, two D flip-flops, an adder and a 2-to-1 multiplexer. The input of the cyclic shift module is connected to the input port of the n'0 selection module, and its output is connected to the D input of the first D flip-flop. One input of the adder is connected to the Q output of the first D flip-flop, the other input receives the multiplier B, and the adder output is connected to the D input of the second D flip-flop. The Q output of the second D flip-flop is connected through the 2-to-1 multiplexer to the output port of the n'0 selection module and outputs the multiplicand A.
The modular multiplication process based on this hardware system is as follows. The system reads the 4096-bit multiplicand A, multiplier B and modulus N from the input and stores them in RAM1. The n'0 selection module reads the modulus N stored in RAM1 and pre-computes the parameter n'0 from the low 128 bits of N. The modular multiplication algorithm module reads the multiplicand A and multiplier B stored in RAM1 together with the n'0 computed by the n'0 selection module, performs the multiplications by calling the Booth multiplier, and stores the final result in RAM2, which completes the modular multiplication.
The invention has the following beneficial effects:
1. By adding a group of registers to the original Montgomery modular multiplier, multiplexing registers, and rearranging the pipeline and the algorithm timing, the pre-calculation step and the carry calculation step are computed in parallel. This shortens the code length, reduces hardware resource consumption, and eliminates s^2 - s word additions together with roughly 2s^2 reads and 2s^2 writes; the modular multiplier can run at up to 600 MHz.
2. The Booth multiplier is improved: a register stage is inserted in the critical path and fewer logic gates are used. This increases the bit width and speed of the multiplier, reduces hardware area, and lowers the number of loop iterations in the modular multiplication, thereby shortening the computation time.
3. The modular multiplication algorithm module performs its multiplications by calling the Booth multiplier, which raises the amount of data processed per unit time: a 4096-bit modular multiplication takes only 3463 cycles, or about 5.75 µs.
Drawings
FIG. 1 is a functional block diagram of the present invention;
FIG. 2 is a block diagram of the improved modular multiplication process of the present invention;
FIG. 3 is a data transfer diagram of the modular multiplication process of the present invention;
FIG. 4 is the circuit logic of the partial product encoding and decoding scheme of the present invention;
FIG. 5 is the optimized circuit logic of the partial product encoding and decoding scheme of the present invention;
FIG. 6 is the partial product compression process of the present invention;
FIG. 7 is a circuit diagram of the modular multiplication parameter n'0 selection module of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings.
As shown in FIG. 1, a hardware system for implementing an improved FIOS modular multiplication algorithm comprises a storage module, a modular multiplication algorithm module, a Booth multiplier module and a modular multiplication parameter n'0 selection module.
The storage module comprises a 96 × 128-bit RAM1 and a 40 × 128-bit RAM2. RAM1 stores the input multiplicand A, multiplier B and modulus N, and RAM2 stores the output result.
The Montgomery modular multiplication algorithm in the prior art is as follows:
[Figure BDA0002792460240000031: listing of the prior-art Montgomery modular multiplication algorithm; image not reproduced]
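The prior-art listing is an image that does not survive in this text. As a stand-in, the sketch below follows the textbook FIOS (Finely Integrated Operand Scanning) Montgomery multiplication as commonly presented in the literature; it is not the patent's listing, and it omits the final conditional subtraction so that the result stays in the range 0 to 2N, matching the output range stated in the description. All names are illustrative.

```python
def to_words(value, s, w):
    """Split an integer into s little-endian words of w bits."""
    mask = (1 << w) - 1
    return [(value >> (i * w)) & mask for i in range(s)]


def from_words(words, w):
    return sum(word << (i * w) for i, word in enumerate(words))


def fios_montgomery(a, b, n, n0_prime, s, w):
    """Textbook FIOS: returns the words of a*b*2^(-s*w) mod N, unreduced (< 2N).

    a, b, n are little-endian word lists; n0_prime = -n[0]^(-1) mod 2^w.
    """
    mask = (1 << w) - 1
    t = [0] * (s + 2)

    def add_carry(index, carry):
        # Propagate a carry word upward through t.
        while carry:
            total = t[index] + carry
            t[index] = total & mask
            carry = total >> w
            index += 1

    for i in range(s):
        total = t[0] + a[0] * b[i]
        low, carry = total & mask, total >> w
        add_carry(1, carry)
        m = (low * n0_prime) & mask          # makes the low word cancel
        carry = (low + m * n[0]) >> w        # low word of this sum is zero
        for j in range(1, s):
            total = t[j] + a[j] * b[i] + carry
            low, carry = total & mask, total >> w
            add_carry(j + 1, carry)
            total = low + m * n[j]
            low, carry = total & mask, total >> w
            t[j - 1] = low                   # store shifted down by one word
        total = t[s] + carry
        t[s - 1] = total & mask
        t[s] = t[s + 1] + (total >> w)
        t[s + 1] = 0
    return t[: s + 1]


# Small worked example: 8-bit operands held in two 4-bit words, R = 2**8.
W_BITS, S = 4, 2
N_MOD, A, B = 181, 100, 150
N0_PRIME = (-pow(N_MOD, -1, 1 << W_BITS)) % (1 << W_BITS)
t_words = fios_montgomery(to_words(A, S, W_BITS), to_words(B, S, W_BITS),
                          to_words(N_MOD, S, W_BITS), N0_PRIME, S, W_BITS)
result = from_words(t_words, W_BITS)
print(result, result % N_MOD)   # 200 and 19: below 2N = 362, not yet reduced
```

The inner loop shows the two multiply-accumulate steps and the carry propagation that are interleaved in every iteration; these are the steps a hardware pipeline has to schedule.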
The modular multiplication algorithm module adds a group of registers and compresses the number of pipeline stages by processing pre-calculation step 1 and carry calculation step 3 in parallel. At the point on the critical path of the original algorithm where three numbers are added within one cycle, the addition is split into two steps. The data transfer of the modular multiplication algorithm module is shown in FIG. 3, and the algorithm is as follows:
[Figure BDA0002792460240000041: listing of the improved modular multiplication algorithm; image not reproduced]
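The idea of splitting a three-operand addition across two clock cycles can be illustrated with a small behavioral model. This is only a sketch of the general pipelining technique, assuming a register between the two additions; the actual register placement in the patent is defined by its figures, which are not reproduced here, and the class name is hypothetical.

```python
class TwoStageAdder:
    """Behavioral model of x + y + z computed over two pipeline stages.

    Stage 1 registers the partial sum x + y alongside z.
    Stage 2 adds the registered z and registers the final sum.
    Each stage contains a single two-operand addition, so the longest
    combinational path is shorter than a one-cycle three-operand add.
    """

    def __init__(self):
        self.stage1 = None   # (x + y, z) captured on the previous clock edge
        self.stage2 = None   # final sum captured on the previous clock edge

    def clock(self, operands=None):
        """Advance one clock edge and return the value now held in stage 2."""
        # Stage 2 consumes what stage 1 held before this edge.
        self.stage2 = None if self.stage1 is None else self.stage1[0] + self.stage1[1]
        # Stage 1 captures the new inputs, if any.
        self.stage1 = None if operands is None else (operands[0] + operands[1], operands[2])
        return self.stage2


adder = TwoStageAdder()
inputs = [(1, 2, 3), (4, 5, 6), (7, 8, 9), None]   # None flushes the pipeline
outputs = [adder.clock(item) for item in inputs]
print(outputs)   # [None, 6, 15, 24]
```

Each result appears one clock edge after its operands are presented, but a new set of operands can be accepted on every edge, so throughput is unchanged while the clock period can be shortened.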
The Booth multiplier comprises a partial product generation module, a partial product compression module and a vector addition module. The circuit that follows partial product generation in an existing multiplier is replaced by a multiplexer. From the encoding logic and the partial product generation expressions, the circuit logic of the encoding and decoding scheme shown in FIG. 4 is obtained; the circuit logic of the partial product generation module is shown in FIG. 5. The partial product compression module feeds the vectors produced by the partial product generation module into a Wallace tree, where CSA compressors reduce them to two rows in ten stages. A register stage is inserted before the last stage of the partial product compression module, which safeguards the circuit's hold time and allows a higher clock frequency. The two rows obtained by compression are passed to the vector addition module, which adds them with a carry-lookahead adder to produce the final product.
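The ten-stage figure can be checked with a small model of a 3:2 carry-save compression tree. The sketch assumes 65 partial product rows, which is what radix-4 Booth recoding of an unsigned 128-bit operand would produce; the row count and the grouping strategy are assumptions, not details given in the text.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on non-negative integers.

    Returns (sum_row, carry_row) with x + y + z == sum_row + carry_row;
    no carry ripples between bit positions.
    """
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def wallace_reduce(rows: list[int]) -> tuple[list[int], int]:
    """Compress rows with 3:2 CSAs until two remain; returns (rows, stages)."""
    stages = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3        # rows that form complete triples
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])           # leftovers pass through unchanged
        rows = next_rows
        stages += 1
    return rows, stages


rows = list(range(1, 66))                       # 65 stand-in partial product rows
final_rows, stage_count = wallace_reduce(rows)
print(len(final_rows), stage_count)             # 2 rows after 10 stages
print(sum(final_rows) == sum(rows))             # value is preserved
```

Under these assumptions the row count shrinks 65, 44, 30, 20, 14, 10, 7, 5, 4, 3, 2, which is ten stages and agrees with the ten-stage compression described above. The two remaining rows are what the carry-lookahead adder sums.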
The modular multiplication parameter n'0 selection module comprises an input port, an output port, a cyclic shift module, two D flip-flops, an adder and a 2-to-1 multiplexer; its circuit structure is shown in FIG. 7. The low 128 bits of the input modulus N are denoted n, and the bit width of n is tied to the bit width of the Booth multiplier. The cyclic shift module rotates n to the left; after passing through the first D flip-flop, the result is added to the multiplier B by the adder, and the second D flip-flop latches the output to perform the B + n[0] calculation. The multiplicand A is finally output through the 2-to-1 multiplexer. A complete byte is obtained after eight rounds of output, which completes the calculation of n'0. The algorithm for n'0 is as follows:
[Figures BDA0002792460240000042 and BDA0002792460240000051: listing of the n'0 calculation algorithm; images not reproduced]
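The n'0 listing is an image that does not survive in this text. In the standard Montgomery setting, the parameter is the negated inverse of the lowest word of the modulus, n'0 = -n0^(-1) mod 2^w. The sketch below computes it one bit per iteration using only shifts, masks and additions on the result; it is a well-known textbook method offered as an assumption about what the parameter is, not a transcription of the patent's shift-and-add circuit.

```python
def neg_inverse_mod_2w(n0: int, w: int) -> int:
    """Return n0' such that n0 * n0' ≡ -1 (mod 2**w), for odd n0.

    Builds the inverse y of n0 bit by bit: if n0 * y ≡ 1 (mod 2**(i-1)),
    then modulo 2**i the product is either 1 or 1 + 2**(i-1); in the second
    case setting bit i-1 of y fixes it, because n0 is odd.
    """
    if n0 % 2 == 0:
        raise ValueError("the modulus must be odd")
    y = 1                                       # inverse of n0 modulo 2
    for i in range(2, w + 1):
        if (n0 * y) & ((1 << i) - 1) != 1:      # not yet 1 modulo 2**i
            y += 1 << (i - 1)                   # set bit i-1
    return (1 << w) - y                         # negate modulo 2**w


n0 = (1 << 128) - 159                           # an arbitrary odd 128-bit word
n0_prime = neg_inverse_mod_2w(n0, 128)
print((n0 * n0_prime + 1) % (1 << 128) == 0)
print(neg_inverse_mod_2w(5, 4))                 # 3, since 5 * 3 = 15 ≡ -1 (mod 16)
```

Because n'0 depends only on the modulus, it is computed once per key and reused for every modular multiplication, which is why the description treats it as a pre-calculation.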
The system reads the 4096-bit multiplicand A, multiplier B and modulus N from the input and stores them in RAM1. The n'0 selection module reads the modulus N stored in RAM1 and pre-computes the parameter n'0 from the low 128 bits of N. The modular multiplication algorithm module reads the multiplicand A and multiplier B stored in RAM1 together with the n'0 computed by the n'0 selection module, performs the multiplications by calling the Booth multiplier, and stores the final result in RAM2, which completes the modular multiplication.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and refinements without departing from the spirit of the present invention, and these modifications and refinements should also be regarded as falling within the scope of protection of the present invention.

Claims (2)

1. A hardware system for implementing an improved FIOS modular multiplication algorithm, characterized in that it comprises a storage module, a modular multiplication algorithm module, a Booth multiplier module and a modular multiplication parameter n'0 selection module;
the storage module comprises two memories, RAM1 and RAM2, wherein RAM1 is used for storing an input multiplicand A, multiplier B and modulus N, and RAM2 is used for storing an output result;
the modular multiplication algorithm module adds a group of registers, compresses the number of pipeline stages by processing pre-calculation step 1 and carry calculation step 3 in parallel, and splits into two steps the addition on the Montgomery modular multiplication critical path where three numbers are added within one cycle; the modular multiplication algorithm module is used for dividing the 4096-bit multiplicand A, multiplier B and modulus N into several fields of equal length to complete the modular multiplication A × B mod N;
the Booth multiplier module comprises a partial product generation module, a partial product compression module and a vector addition module; the circuit that follows partial product generation in an existing multiplier is replaced by a multiplexer; the partial product compression module uses CSA compressors to perform ten-stage compression, with a register stage inserted before the last stage, so that the input vectors are compressed to two rows; the vector addition module adds the two rows of vectors output by the partial product compression module with a carry-lookahead adder, completing the multiplication in the modular multiplication process;
the modular multiplication parameter n'0 selection module comprises an input port, an output port, a cyclic shift module, two D flip-flops, an adder and a 2-to-1 multiplexer; the input of the cyclic shift module is connected to the input port of the modular multiplication parameter n'0 selection module, and the output of the cyclic shift module is connected to the D input of the first D flip-flop; one input of the adder is connected to the Q output of the first D flip-flop, the other input of the adder receives the multiplier B, and the output of the adder is connected to the D input of the second D flip-flop; the Q output of the second D flip-flop is connected through the 2-to-1 multiplexer to the output port of the modular multiplication parameter n'0 selection module and outputs the multiplicand A;
the system reads the 4096-bit multiplicand A, multiplier B and modulus N from an input and stores them in RAM1; the modular multiplication parameter n'0 selection module reads the modulus N stored in RAM1 and pre-computes the parameter n'0 from the low 128 bits of N; the modular multiplication algorithm module reads the multiplicand A and multiplier B stored in RAM1 together with the n'0 computed by the modular multiplication parameter n'0 selection module, performs the multiplications by calling the Booth multiplier, and stores the final result in RAM2, completing the modular multiplication.
2. The hardware system for implementing an improved FIOS modular multiplication algorithm as claimed in claim 1, wherein the size of RAM1 is 96 × 128 bits and the size of RAM2 is 40 × 128 bits.
CN202011319638.3A 2020-11-23 2020-11-23 Hardware system for realizing improved FIOS modular multiplication algorithm Active CN112486457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011319638.3A CN112486457B (en) 2020-11-23 2020-11-23 Hardware system for realizing improved FIOS modular multiplication algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011319638.3A CN112486457B (en) 2020-11-23 2020-11-23 Hardware system for realizing improved FIOS modular multiplication algorithm

Publications (2)

Publication Number Publication Date
CN112486457A (en) 2021-03-12
CN112486457B (en) 2022-12-20

Family

ID=74932876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011319638.3A Active CN112486457B (en) 2020-11-23 2020-11-23 Hardware system for realizing improved FIOS modular multiplication algorithm

Country Status (1)

Country Link
CN (1) CN112486457B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361687B (en) * 2021-05-31 2023-03-24 天津大学 Configurable addition tree suitable for convolutional neural network training accelerator
CN115202616A (en) * 2022-06-24 2022-10-18 上海途擎微电子有限公司 Modular multiplier, security chip, electronic device and encryption method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020172355A1 (en) * 2001-04-04 2002-11-21 Chih-Chung Lu High-performance booth-encoded montgomery module
KR100458031B1 (en) * 2003-03-14 2004-11-26 삼성전자주식회사 Apparatus and method for performing a montgomery type modular multiplication
JP2004326112A (en) * 2003-04-25 2004-11-18 Samsung Electronics Co Ltd Multiple modulus selector, accumulator, montgomery multiplier, method of generating multiple modulus, method of producing partial product, accumulating method, method of performing montgomery multiplication, modulus selector, and booth recorder
JP4180024B2 (en) * 2004-07-09 2008-11-12 Necエレクトロニクス株式会社 Multiplication remainder calculator and information processing apparatus
CN100435090C (en) * 2005-08-18 2008-11-19 上海微科集成电路有限公司 Extensible high-radix Montgomery's modular multiplication algorithm and circuit structure thereof
CN101625634A (en) * 2008-07-09 2010-01-13 中国科学院半导体研究所 Reconfigurable multiplier
CN102999313B (en) * 2012-12-24 2016-01-20 飞天诚信科技股份有限公司 A kind of data processing method based on montgomery modulo multiplication
CN103226461B (en) * 2013-03-26 2016-07-06 中山大学 A kind of Montgomery modular multiplication method for circuit and circuit thereof
CN103761068B (en) * 2014-01-26 2017-02-01 上海交通大学 Optimized Montgomery modular multiplication hardware
KR102132261B1 (en) * 2014-03-31 2020-08-06 삼성전자주식회사 Method and apparatus for computing montgomery multiplication performing final reduction wihhout comparator

Also Published As

Publication number Publication date
CN112486457A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
KR100834178B1 (en) Multiply-accumulate mac unit for single-instruction/multiple-data simd instructions
CN112486457B (en) Hardware system for realizing improved FIOS modular multiplication algorithm
Kuang et al. Energy-efficient high-throughput Montgomery modular multipliers for RSA cryptosystems
CN115344237B (en) Data processing method combining Karatsuba and Montgomery modular multiplication
Cilardo Exploring the potential of threshold logic for cryptography-related operations
CN110058840A (en) A kind of low-consumption multiplier based on 4-Booth coding
CN103761068A (en) Optimized Montgomery modular multiplication method, optimized modular square method and optimized modular multiplication hardware
CN112540743A (en) Signed multiplication accumulator and method for reconfigurable processor
CN114895870B (en) Efficient reconfigurable SM2 dot multiplication method and system based on FPGA
CN113628094A (en) High-throughput SM2 digital signature computing system and method based on GPU
Lee et al. Subquadratic Space-Complexity Digit-Serial Multipliers Over GF(2^m) Using Generalized (a, b)-Way Karatsuba Algorithm
CN113794572A (en) Hardware implementation system and method for high-performance elliptic curve digital signature and signature verification
CN101304312B (en) Ciphering unit being suitable for compacting instruction set processor
CN109284085B (en) High-speed modular multiplication and modular exponentiation operation method and device based on FPGA
CN113872608B (en) Wallace tree compressor based on Xilinx FPGA primitive
Mekhallalati et al. Novel radix finite field multiplier for GF(2^m)
CN114089949A (en) Digital signal processor capable of directly supporting multi-operand addition operation
Gutub Merging GF (p) elliptic curve point adding and doubling on pipelined VLSI cryptographic ASIC architecture
Raghuram et al. A programmable processor for cryptography
Cui et al. A Hardware-Efficient Elliptic Curve Cryptographic Architecture over GF (p)
Miyamoto et al. Systematic design of high-radix Montgomery multipliers for RSA processors
Rao et al. Designing of ALU Block for RISC-V-Based Processor Core with Low Power
CN112068800B (en) Array compressor and large number multiplier with same
Jayasanthi Implementation of Power Efficient Multiply Accumulate Unit for DSP Applications
CN115658007A (en) High-bandwidth distributable pipeline-level parallel multiplier operation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant