WO2014101632A1 - Data processing method based on Montgomery modular multiplication - Google Patents

Data processing method based on Montgomery modular multiplication

Info

Publication number
WO2014101632A1
Authority
WO
WIPO (PCT)
Prior art keywords
random access
access memory
register
contents
cpu
Prior art date
Application number
PCT/CN2013/088305
Other languages
English (en)
French (fr)
Inventor
陆舟
于华章
Original Assignee
飞天诚信科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 飞天诚信科技股份有限公司
Priority to US14/434,275 (patent US9588696B2)
Publication of WO2014101632A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/728 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic using Montgomery reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device

Definitions

  • Among hardware implementations of large-integer modular multiplication, the Montgomery modular multiplication algorithm is considered the most efficient and the best suited to hardware.
  • The Montgomery algorithm and its variants are used in most modular multiplier designs for large-integer modular multiplication.
  • Existing modular multipliers store intermediate results and read them back when the next loop iteration needs them. This requires frequent reads and writes to the storage device, and each access costs clock cycles, which lowers the efficiency of the modular multiplier and reduces the data processing rate of Montgomery-based modular multiplication.
  • For example, the slow hardware execution of the existing Montgomery algorithm leads to low efficiency and low speed in encryption algorithms such as RSA and ECC.
  • the present invention proposes an efficient data processing method based on Montgomery modular multiplication.
  • The technical solution adopted by the present invention is a data processing method based on Montgomery modular multiplication, in which a first random access memory stores the multiplier, a second random access memory stores the multiplicand, and a third random access memory stores the modulus.
  • the method includes the following steps:
  • Step 1 The CPU initializes a fifth random access memory, and initializes a first offset and a second offset.
  • the first offset is used to indicate an offset address in the second random access memory relative to the base address.
  • the second offset is used to indicate an offset address in the fifth random access memory relative to the base address.
  • Step 2 The CPU reads a word from the second random access memory according to the first offset and writes it into the first operation register;
  • Step 3 The CPU calls the multiply-add module to multiply the content of the first operation register by the content of the first random access memory and add the content of the fifth random access memory, and writes the obtained operation result, from low-order to high-order, into the fifth random access memory according to the second offset;
  • Step 4 The CPU reads a word from the fifth random access memory according to the second offset and writes it into the second operation register, multiplies the content of the second operation register by the content of the constant register, and writes the low word of the product into the fourth register;
  • Step 5 The CPU reads the contents of the fourth register, the third random access memory and the fifth random access memory, calls the multiply-add module to multiply the content of the fourth register by the content of the third random access memory and add the content of the fifth random access memory, adds 1 to the second offset, and writes the obtained operation result, from low-order to high-order, into the fifth random access memory according to the second offset;
  • Step 6 The CPU determines whether the first offset is equal to the preset step size; if yes, step 8 is performed; otherwise, step 7 is performed;
  • Step 7 Add 1 to the first offset, and return to step 2;
  • Step 8 The CPU reads the contents of the fifth random access memory and the third random access memory, and determines whether the value of the content of the fifth random access memory is greater than or equal to the value of the content of the third random access memory; if yes, go to step 9; otherwise, go to step 10;
  • Step 9 The CPU subtracts the content of the third random access memory from the content of the read fifth random access memory, writes the subtraction result, from low-order to high-order, into the fifth random access memory according to the second offset, and performs step 10;
  • Step 10 The CPU outputs the content of the fifth random access memory.
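The flow of steps 1 to 10 can be sketched with Python integers standing in for the memories. This is a minimal illustration and not the patented implementation: the function name, the 32-bit word size, and the assumption that the constant register holds -n^-1 mod 2^w (the text only calls it a "preset constant") are all choices made for the sketch. Here the bit shift `w * i` plays the role of both offsets, and the result is read from the window above the final offset.

```python
def montgomery_multiply(a, b, n, num_words, w=32):
    """Return a * b * R^-1 mod n with R = 2**(w * num_words); n must be odd, a and b < n."""
    mask = (1 << w) - 1
    # Constant register: ASSUMED to hold -n^-1 mod 2**w (requires Python 3.8+ for pow).
    n_prime = (-pow(n, -1, 1 << w)) & mask
    t = 0                                # fifth memory, initialised to 0 (step 1)
    for i in range(num_words):           # i stands in for the first and second offsets
        shift = w * i
        b_i = (b >> shift) & mask        # step 2: one word of the multiplicand
        t += (b_i * a) << shift          # step 3: multiply-add with the multiplier
        t_i = (t >> shift) & mask        # step 4: word at the second offset
        q = (t_i * n_prime) & mask       #         low word of the product (fourth register)
        t += (q * n) << shift            # step 5: multiply-add with the modulus
    t >>= w * num_words                  # result sits in the words above the final offset
    if t >= n:                           # steps 8-9: one conditional subtraction
        t -= n
    return t                             # step 10
```

The returned value is the Montgomery product, that is, a·b·R⁻¹ mod n rather than a·b mod n; callers normally convert operands into and out of Montgomery form around a chain of such multiplications.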
  • Step 2 includes: reading the word located at the position shifted left by the first offset from the base address of the second random access memory and writing it into the first operation register.
  • the multiply-add module is used to implement multiplication and addition operations supported by the CPU.
  • In step 3, the operation in which the CPU calls the multiply-add module to multiply the content of the first operation register by the content of the first random access memory and add the content of the fifth random access memory includes the following steps:
  • Step 201 The CPU determines whether the first offset is 0; if yes, step 210 is performed; otherwise, step 202 is performed;
  • Step 202 Initialize a carry register, and initialize an index variable;
  • Step 203 The CPU obtains one word from each of the first random access memory and the fifth random access memory according to the index variable, and writes them into the third operation register and the fourth operation register respectively;
  • Step 204 The CPU multiplies the content of the first operation register and the content of the third operation register, and the multiplication result is added to the content of the fourth operation register to obtain a first calculation result;
  • Step 205 The CPU adds the first calculation result to the content of the carry register to obtain a second calculation result
  • Step 206 The CPU writes the first word of the second calculation result into the carry register, and the remaining words are written into the fifth random memory according to the index variable;
  • Step 207 The CPU determines whether the index variable is equal to the preset word length, if yes, step 209 is performed, otherwise step 208 is performed;
  • Step 208 The index variable is incremented by 1, and then returns to step 203;
  • Step 209 The CPU reads the contents of the carry register, and writes the contents of the carry register to the fifth random access memory according to the index variable;
  • Step 210 The CPU outputs the content of the fifth random memory as the operation result of step 3.
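A short bound explains why one pass of steps 204 to 206 never needs more than a one-word carry. The symbols are assumptions introduced for this note: $w$ is the word size in bits, $x$ and $y$ are the two words being multiplied, $z$ is the word taken from the fifth random access memory, and $c$ is the content of the carry register, each in the range $[0, 2^{w}-1]$.

```latex
\[
x \cdot y + z + c \;\le\; (2^{w}-1)^{2} + 2\,(2^{w}-1)
\;=\; 2^{2w} - 2^{w+1} + 1 + 2^{w+1} - 2
\;=\; 2^{2w} - 1 .
\]
```

The second calculation result therefore always fits in two words: the high word goes to the carry register and the low word goes back to the fifth random access memory.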
  • Step 203 includes: reading, from the first random access memory and the fifth random access memory, the words at the storage location corresponding to the current index variable, and writing them into the third operation register and the fourth operation register respectively.
  • In step 206, writing the remaining words into the fifth random access memory according to the index variable includes: the CPU writes the remaining words of the second calculation result, excluding the first word, in order from low-order to high-order, starting at the storage location in the fifth random access memory that corresponds to the current index variable.
  • In steps 3 and 5, writing the obtained operation result, from low-order to high-order, into the fifth random access memory according to the second offset includes: writing the operation result in order, from low-order to high-order, starting at the position shifted left by the second offset from the base address of the fifth random access memory.
  • In step 4, reading a word from the fifth random access memory according to the second offset and writing it into the second operation register includes: reading the word located at the position shifted left by the second offset from the base address of the fifth random access memory and writing it into the second operation register.
  • In step 5, the operation of calling the multiply-add module to multiply the content of the fourth register by the content of the third random access memory and add the content of the fifth random access memory includes the following steps:
  • Step 301 The CPU determines whether the first offset is 0; if yes, step 310 is performed; otherwise, step 302 is performed;
  • Step 302 Initialize the carry register, and initialize the index variable;
  • Step 303 The CPU obtains one word from each of the third random access memory and the fifth random access memory according to the index variable, and writes them into the third operation register and the fourth operation register respectively;
  • Step 304 The CPU multiplies the content of the third operation register by the content of the fourth register, and the multiplication result is added to the content of the fourth operation register to obtain a third calculation result;
  • Step 305 The CPU adds the third calculation result to the content of the carry register to obtain a fourth calculation result.
  • Step 306 The CPU writes the first word of the fourth calculation result into the carry register, and the remaining words are written into the fifth random memory according to the index variable;
  • Step 307 The CPU determines whether the index variable is equal to the preset word length, if yes, step 309 is performed, otherwise step 308 is performed;
  • Step 308 The index variable is incremented by 1, and then returns to step 303;
  • Step 309 The CPU reads the content of the carry register, and writes the content of the carry register into the fifth random access memory according to the index variable;
  • Step 310 The CPU outputs the content of the fifth random access memory as the operation result of step 5.
  • In step 9, writing the subtraction result, from low-order to high-order, into the fifth random access memory according to the second offset includes: writing the subtraction result in order, from low-order to high-order, starting at the position shifted left by the second offset from the base address of the fifth random access memory.
  • The invention provides an efficient data processing method based on Montgomery modular multiplication, which reduces the number of cycles the system spends on the operation and raises the data processing rate by improving modular multiplication efficiency, especially
  • FIG. 1 is a flow chart of a data processing method based on Montgomery modular multiplication according to the present invention.
  • FIG. 2 is a schematic diagram of a process of calling a multiply-add module according to step 103 in FIG. 1;
  • FIG. 3 is a schematic diagram of a process of calling the multiply-add module to perform the operation in step 106 of FIG. 1.
  • Detailed description
  • This embodiment provides a data processing method based on Montgomery modular multiplication, and takes a modular multiplication operation process performed by a CPU processor of a computer as an example.
  • the first random access memory is used to store the multiplier in the modular multiplication operation
  • the second random access memory is used to store the multiplicand in the modular multiplication operation
  • the third random access memory is used to store the modulus in the modular multiplication operation.
  • the fourth register is used to store the intermediate operand
  • the fifth random memory is used to store the operation result of the modular multiplication operation and the operation result of the multiplication and addition module in the modular multiplication operation
  • the constant register is used to store the preset constant.
  • The storage space of the first, second, and third random access memories is chosen according to the size of the large numbers involved in the operation: it is greater than or equal to n and may be chosen as n to reduce the space occupied. The storage space of the fifth random access memory is greater than or equal to 2n+1 and may be chosen as 2n+1 to reduce the space occupied. The storage space of the fourth register and of the constant register is greater than or equal to 1 word and may be chosen as 1 word to reduce the space occupied.
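The storage layout can be illustrated with plain word lists. This is a sketch under stated assumptions: n is taken to count words, words are 32 bits wide and stored least significant first, and the helper names `to_words`, `from_words`, and `allocate` are hypothetical.

```python
W = 32  # ASSUMED word size in bits


def to_words(value, count, w=W):
    """Split a non-negative integer into `count` words, least significant word first."""
    if value < 0 or value >> (w * count):
        raise ValueError("value does not fit in the requested number of words")
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(count)]


def from_words(words, w=W):
    """Inverse of to_words: rebuild the integer from its word list."""
    return sum(word << (w * i) for i, word in enumerate(words))


def allocate(multiplier, multiplicand, modulus, n):
    """Lay out the memories with the minimum sizes named in the text."""
    return {
        "ram1": to_words(multiplier, n),      # first memory: multiplier, n words
        "ram2": to_words(multiplicand, n),    # second memory: multiplicand, n words
        "ram3": to_words(modulus, n),         # third memory: modulus, n words
        "ram5": [0] * (2 * n + 1),            # fifth memory: 2n+1 words, zero-initialised
    }
```

The extra word in the fifth memory leaves room for the carry that the last multiply-add pass can produce on top of the 2n-word product.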
  • the data processing method based on Montgomery modular multiplication performed by the CPU processor includes steps 101-112.
  • Step 101 Initialize the fifth random access memory, and initialize the first offset and the second offset.
  • the content in the fifth random access memory is initialized to 0, and the first offset and the second offset are initialized to 0.
  • the first offset is used to represent an offset address in the second random access memory relative to the base address.
  • the second offset is used to represent an offset address in the fifth random access memory relative to the base address.
  • Step 102 Read a word from the second random access memory according to the first offset and write it into the first operation register.
  • Reading a word from the second random access memory according to the first offset and writing it into the first operation register comprises: reading the word located at the position shifted left by the first offset from the base address of the second random access memory and writing it into the first operation register.
  • The storage space of the first operation register may be 1 word; it stores the word read from the second random access memory according to the first offset during the calculation.
  • the content of the second random access memory is:
  • Step 103 Call the multiply-add module to operate on the contents of the first operation register, the first random access memory, and the fifth random access memory, and write the obtained operation result, from low-order to high-order, into the fifth random access memory according to the second offset.
  • The multiply-add module implements the multiplication and addition operations supported by the CPU. It multiplies the content of the first operation register by the content of the first random access memory and then adds the content of the fifth random access memory to obtain an operation result, which is then written into the fifth random access memory according to the second offset.
  • Writing the operation result into the fifth random access memory according to the second offset includes: writing the operation result in order, from low-order to high-order, starting at the position shifted left by the second offset from the base address of the fifth random access memory.
  • the content of the first operation register participating in the operation is: CA6F360C; the content of the fifth random access memory is 0.
  • the content of the first random access memory is:
  • the contents of the fifth random access memory are:
  • Step 104 Read a word from the fifth random access memory according to the second offset and write it into the second operation register, multiply the content of the second operation register by the content of the constant register, and write the low word of the product into the fourth register.
  • Reading a word from the fifth random access memory according to the second offset and writing it into the second operation register comprises: reading the word located at the position shifted left by the second offset from the base address of the fifth random access memory and writing it into the second operation register.
  • the size of the storage space of the second operation register may be 1 word, and is used to store a word read from the fifth random memory according to the second offset during the calculation.
  • the content of the second operation register that participates in the operation in this embodiment is: AD311CB0, and the content of the constant register is: 1A788E41.
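The low-word multiplication of step 104 can be reproduced with the two register values quoted for this embodiment. The 32-bit mask is an inference from the eight-hex-digit values in the text, and `low_word_product` is a hypothetical helper name; the sketch prints the value that would be written to the fourth register rather than asserting what the patent's own figure shows.

```python
WORD_MASK = 0xFFFFFFFF  # ASSUMED 32-bit words, inferred from the 8-hex-digit values


def low_word_product(x, y, mask=WORD_MASK):
    """Multiply two words and keep only the low word of the double-width product."""
    return (x * y) & mask


second_operation_register = 0xAD311CB0  # value quoted in the embodiment
constant_register = 0x1A788E41          # value quoted in the embodiment
fourth_register = low_word_product(second_operation_register, constant_register)
print(f"fourth register = {fourth_register:08X}")
```

Only the low word is kept because the subsequent multiply-add with the modulus needs the multiplier modulo the word base; the high word of the product has no effect on it.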
  • Step 105 Read the contents of the fourth register, the third random access memory, and the fifth random access memory.
  • the content of the read third random access memory is:
  • Step 106 Call the multiply-add module to operate on the contents of the fourth register, the third random access memory, and the fifth random access memory, increment the second offset by one, and write the obtained operation result, from low-order to high-order, into the fifth random access memory according to the second offset.
  • The multiply-add module implements the multiplication and addition operations supported by the CPU. It multiplies the content of the fourth register by the content of the third random access memory and adds the content of the fifth random access memory to obtain an operation result; the second offset is then incremented by one, and the operation result is written into the fifth random access memory according to the second offset.
  • Writing the obtained operation result, from low-order to high-order, into the fifth random access memory according to the second offset includes: writing the operation result in order, from low-order to high-order, starting at the position shifted left by the current second offset from the base address of the fifth random access memory.
  • the content of the fifth random access memory is:
  • Step 107 Determine whether the first offset is equal to the preset step size, if yes, go to step 109, otherwise go to step 108.
  • the preset step size is 15.
  • Step 108 The first offset is incremented by 1, and the process returns to step 102.
  • Step 109 Read the contents of the fifth random access memory and read the contents of the third random access memory.
  • Step 110 Determine whether the value of the content of the read fifth random access memory is greater than or equal to the value of the content of the third random access memory. If yes, go to step 111; otherwise, go to step 112.
  • Step 111 Subtract the content of the third random access memory from the content of the read fifth random access memory, write the subtraction result, from low-order to high-order, into the fifth random access memory according to the second offset, and perform step 112.
  • Writing the subtraction result, from low-order to high-order, into the fifth random access memory according to the second offset includes: writing the subtraction result in order, from low-order to high-order, starting at the position shifted left by the second offset from the base address of the fifth random access memory.
  • Step 112 Output the content of the fifth random access memory.
  • the content of the fifth random access memory outputted in this step in this embodiment is the result of the modular multiplication operation.
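Steps 109 to 112 amount to a multi-word comparison followed by a conditional subtraction with borrow. The sketch below works on word lists stored least significant word first; the function names and the 32-bit word size are assumptions, and the result list may be one word longer than the modulus because the value before reduction can exceed n words.

```python
def compare_words(x, y):
    """Return True if x >= y; x and y are equal-length word lists, low word first."""
    for xw, yw in zip(reversed(x), reversed(y)):   # scan from the most significant word
        if xw != yw:
            return xw > yw
    return True                                     # all words equal


def subtract_words(x, y, w=32):
    """Return x - y as a word list, propagating a borrow; assumes x >= y, equal lengths."""
    mask = (1 << w) - 1
    out, borrow = [], 0
    for xw, yw in zip(x, y):                        # low word to high word
        d = xw - yw - borrow
        borrow = 1 if d < 0 else 0
        out.append(d & mask)                        # wrap a negative difference into one word
    return out


def final_reduce(result, modulus, w=32):
    """Steps 110-111: subtract the modulus once if the result is not smaller than it."""
    padded = list(modulus) + [0] * (len(result) - len(modulus))
    if compare_words(result, padded):
        return subtract_words(result, padded, w)
    return list(result)
```

A single subtraction is enough for Montgomery multiplication with inputs below the modulus, since the value before reduction is then smaller than twice the modulus.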
  • the output of the fifth random access memory is:
  • The multiply-add module mentioned in this embodiment implements the large-number multiplication and addition operations supported by the CPU. The large-number multiply-add performed by the module in steps 103 and 106 of FIG. 1 is described in detail below as an example.
  • the carry register is used to store the carry in the multiplication and addition operations. The size of the storage space of the carry register is greater than or equal to 1 word, and 1 word can be selected to reduce the occupied space.
  • The process of calling the multiply-add module on the contents of the first operation register, the first random access memory, and the fifth random access memory in step 103 of FIG. 1 may be as shown in FIG. 2 and includes steps 201 to 210.
  • Step 201 The CPU determines whether the first offset is 0. If yes, step 210 is performed; otherwise, step 202 is performed.
  • the first offset is used to represent an offset address in the second random access memory relative to the base address.
  • Step 202 The CPU initializes the carry register and initializes the index variable.
  • Initializing the carry register may consist of setting its content to 0, and initializing the index variable may consist of setting it to 1.
  • Step 203 The CPU obtains one word from each of the first random access memory and the fifth random access memory according to the index variable, and writes the third arithmetic register and the fourth arithmetic register respectively.
  • Obtaining one word from each of the first random access memory and the fifth random access memory according to the index variable and writing them into the third operation register and the fourth operation register respectively includes: reading, from the first random access memory and the fifth random access memory, the words at the storage locations corresponding to the current index variable and writing them into the third operation register and the fourth operation register respectively. For example, if the current index variable is 1, the first word of the first random access memory is written into the third operation register, and the first word of the fifth random access memory is written into the fourth operation register.
  • the third operation register is configured to store a multiplier in the multiply-add operation
  • the fourth operation register is configured to store an addend in the multiply-add operation
  • the size of the storage space of the fourth arithmetic register can be selected as 1 word.
  • Step 204 The CPU multiplies the content of the first operation register by the content of the third operation register.
  • the multiplication result is added to the content of the fourth operation register to obtain a first calculation result.
  • The first calculation result may be stored in a temporary register whose storage space may be selected as 2 words; the first calculation result is read from the temporary register before step 205 is performed.
  • Step 205 The CPU adds the first calculation result to the content of the carry register to obtain a second calculation result.
  • Step 206 The CPU writes the first word of the second calculation result into the carry register, and the remaining words are written into the fifth random memory according to the index variable.
  • Writing the remaining words into the fifth random access memory according to the index variable includes: the CPU writes the remaining words of the second calculation result, excluding the highest word, in order from low-order to high-order, starting at the storage location in the fifth random access memory that corresponds to the current index variable. For example, if the current index variable is 1, the remaining words of the second calculation result, excluding the highest word, are written in order starting from the lowest word of the fifth random access memory.
  • Step 207 The CPU determines whether the index variable is equal to the preset word length; if yes, step 209 is performed; otherwise, step 208 is performed.
  • Specifically, the CPU determines whether the index variable is equal to n; if yes, step 209 is performed; otherwise, step 208 is performed.
  • Step 208 The index variable is incremented by 1, and then returns to step 203.
  • Step 209 The CPU reads the contents of the carry register and writes the contents of the carry register to the fifth random access memory according to the index variable.
  • the CPU writes the contents of the read carry register to the storage location corresponding to the n+1th word in the fifth random access memory.
  • Step 210 The CPU outputs the content of the fifth random access memory.
  • the content of the fifth random access memory outputted in this step is the operation result of the multiplication and addition operation.
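The inner loop of steps 202 to 209 can be sketched at word level as follows. Several points are assumptions rather than statements from the text: words are 32 bits, the index runs from 0 instead of 1, the `offset` parameter stands in for the second offset, and the final carry is added to the next word with propagation instead of simply being written, so that the accumulator stays arithmetically exact when that word is already non-zero. The early-exit test of step 201 is left out of the sketch.

```python
def multiply_add(word, operand, acc, offset=0, w=32):
    """Add word * operand into acc starting at acc[offset]; acc is modified and returned.

    word    -- single word (first operation register or fourth register)
    operand -- word list, low word first (first or third memory)
    acc     -- word list, low word first (fifth memory)
    """
    mask = (1 << w) - 1
    carry = 0                                        # step 202: clear the carry register
    for j, op_word in enumerate(operand):            # index variable (0-based here)
        total = word * op_word + acc[offset + j]     # steps 203-204: multiply, add memory word
        total += carry                               # step 205: add the carry register
        carry = total >> w                           # step 206: high word -> carry register
        acc[offset + j] = total & mask               #           low word -> fifth memory
    k = offset + len(operand)                        # step 209: word after the last one written
    while carry:                                     # ASSUMPTION: add and propagate the carry
        total = acc[k] + carry
        acc[k] = total & mask
        carry = total >> w
        k += 1
    return acc                                       # step 210
```

Calling it once with a word of the multiplicand and the multiplier, and once with the fourth register and the modulus, covers both uses of the module described for FIG. 1.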
  • the process of calling the multiply-add module to operate on the read number in step 106 of FIG. 1 may be as shown in FIG. 3, including steps 301-310.
  • Step 301 The CPU determines whether the first offset is 0. If yes, step 310 is performed, otherwise step 302 is performed.
  • the first offset is used to represent an offset address in the second random access memory relative to the base address.
  • Step 302 The CPU initializes the carry register and initializes the index variable.
  • Initializing the carry register may consist of setting its content to 0, and initializing the index variable may consist of setting it to 1.
  • Step 303 The CPU obtains one word from each of the third random access memory and the fifth random access memory according to the index variable, and writes the third arithmetic register and the fourth arithmetic register respectively.
  • Obtaining one word from each of the third random access memory and the fifth random access memory according to the index variable and writing them into the third operation register and the fourth operation register respectively includes: reading, from the third random access memory and the fifth random access memory, the words at the storage locations corresponding to the current index variable and writing them into the third operation register and the fourth operation register respectively. For example, if the current index variable is n, the nth word counted from the lowest position of the third random access memory is written into the third operation register, and the nth word counted from the lowest position of the fifth random access memory is written into the fourth operation register.
  • the third operation register is configured to store a multiplier in the multiply-add operation
  • the fourth operation register is configured to store an addend in the multiply-add operation
  • the size of the storage space of the third operation register and the fourth operation register can each be selected as 1 word.
  • Step 304 The CPU multiplies the content of the third operation register by the content of the fourth register, and the multiplication result is added to the content of the fourth operation register to obtain a third calculation result.
  • The third calculation result may be stored in a temporary register whose storage space may be selected as 2 machine words; the third calculation result is read from the temporary register before step 305 is performed.
  • Step 305 The CPU adds the third calculation result to the content of the carry register to obtain a fourth calculation result.
  • Step 306 The CPU writes the first word of the fourth calculation result into the carry register, and the remaining words are written into the fifth random access memory according to the index variable.
  • Writing the remaining words into the fifth random access memory according to the index variable includes: the CPU writes the remaining words of the fourth calculation result, excluding the highest word, in order from low-order to high-order, starting at the storage location in the fifth random access memory that corresponds to the current index variable. For example, if the current index variable is n, the remaining words of the fourth calculation result, excluding the highest word, are written in order starting from the nth word counted from the lowest position of the fifth random access memory.
  • Step 307 The CPU determines whether the index variable is equal to the preset word length. If yes, step 309 is performed, otherwise step 308 is performed.
  • Step 308 Add 1 to the index variable, and then return to step 303.
  • Step 309 The CPU reads the contents of the carry register and writes the contents of the carry register to the fifth random access memory according to the index variable.
  • The CPU writes the content read from the carry register to the storage location corresponding to the (n+1)th word in the fifth random access memory.
  • Step 310 The CPU outputs the content of the fifth random access memory.
  • the content of the fifth random access memory outputted in this step is the operation result of the multiplication and addition operation.
  • The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

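The present invention discloses a data processing method based on Montgomery modular multiplication. The method includes: a CPU initializes a fifth random access memory and performs the following operations on the content of a second random access memory, one word at a time: 1) calling a multiply-add module to multiply one word of the content of the second random access memory by the content of a first random access memory and add the product to the content of the fifth random access memory; 2) extracting one word from the result of 1), multiplying it by the content of a constant register, and writing the low word of the product into a fourth register; 3) calling the multiply-add module to multiply the content of the fourth register by the content of a third random access memory and add the product to the content of the fifth random access memory. Finally, the content of the fifth random access memory is output according to the length of the content of the third random access memory.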

Description

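Montgomery modular multiplication-based data processing method
This application claims priority to Chinese patent application No. 201210566979.X, filed with the Chinese Patent Office on December 24, 2012 and entitled "Data processing method based on Montgomery modular multiplication", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention belongs to the field of computer technology, and in particular relates to a data processing method based on Montgomery modular multiplication.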
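Background
Among current hardware implementations of large-integer modular multiplication, the Montgomery modular multiplication algorithm is considered the most efficient and the best suited to hardware. At present, most modular multipliers for large integers are designed around the Montgomery algorithm and its variants. Existing modular multiplier designs store intermediate results and read them back when the next loop iteration needs them, so the storage device is read and written frequently. Each access to the storage device costs clock cycles, which lowers the efficiency of the modular multiplier and the data processing rate of Montgomery modular multiplication. For example, the slow hardware operation of existing Montgomery implementations makes encryption algorithms such as RSA and ECC inefficient and slow.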
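Summary of the Invention
To solve the problems in the prior art, the present invention proposes an efficient data processing method based on Montgomery modular multiplication.
The technical solution adopted by the present invention is a data processing method based on Montgomery modular multiplication, in which a first random access memory stores the multiplier, a second random access memory stores the multiplicand, and a third random access memory stores the modulus. The method includes the following steps:
Step 1: The CPU initializes a fifth random access memory and initializes a first offset and a second offset, where the first offset indicates an offset address relative to the base address in the second random access memory and the second offset indicates an offset address relative to the base address in the fifth random access memory.
Step 2: The CPU reads one word from the second random access memory according to the first offset and writes it into a first operation register.
Step 3: The CPU calls the multiply-add module to multiply the content of the first operation register by the content of the first random access memory and add the product to the content of the fifth random access memory, and writes the operation result into the fifth random access memory from low to high according to the second offset.
Step 4: The CPU reads one word from the fifth random access memory according to the second offset and writes it into a second operation register, multiplies the content of the second operation register by the content of a constant register, and writes the low word of the product into a fourth register.
Step 5: The CPU reads the contents of the fourth register, the third random access memory and the fifth random access memory, calls the multiply-add module to multiply the content of the fourth register by the content of the third random access memory and add the product to the content of the fifth random access memory, increments the second offset by 1, and writes the operation result into the fifth random access memory from low to high according to the second offset.
Step 6: The CPU determines whether the first offset equals a preset step size; if so, step 8 is performed; otherwise, step 7 is performed.
Step 7: The first offset is incremented by 1, and the method returns to step 2.
Step 8: The CPU reads the contents of the fifth random access memory and the third random access memory and determines whether the value of the content of the fifth random access memory is greater than or equal to the value of the content of the third random access memory; if so, step 9 is performed; otherwise, step 10 is performed.
Step 9: The CPU subtracts the content of the third random access memory from the content read from the fifth random access memory, writes the difference into the fifth random access memory from low to high according to the second offset, and performs step 10.
Step 10: The CPU outputs the content of the fifth random access memory.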
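Steps 1 to 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `to_words`, `mul_add` and `montgomery_multiply` are hypothetical, the constant register is assumed to hold the negated inverse of the modulus modulo 2^32, the modulus is assumed to be odd, and the carry is added into the higher words instead of being written to a single location.

```python
WORD_BITS = 32                      # optional word size from the embodiment
MASK = (1 << WORD_BITS) - 1

def to_words(value, n):
    """Split an integer into n words, least significant word first."""
    return [(value >> (WORD_BITS * i)) & MASK for i in range(n)]

def mul_add(word, operand, acc, offset):
    """acc += word * operand * 2**(WORD_BITS * offset), one word at a time."""
    carry = 0
    for i, w in enumerate(operand):
        t = word * w + acc[offset + i] + carry   # always fits in two words
        acc[offset + i] = t & MASK
        carry = t >> WORD_BITS
    k = offset + len(operand)
    while carry:                                 # add the carry into higher words
        t = acc[k] + carry
        acc[k] = t & MASK
        carry = t >> WORD_BITS
        k += 1

def montgomery_multiply(a, b, modulus, n):
    """Return a * b * R^-1 mod modulus, where R = 2**(WORD_BITS * n)."""
    # Assumed content of the constant register: -modulus^-1 mod 2**WORD_BITS.
    constant = (-pow(modulus, -1, 1 << WORD_BITS)) & MASK
    a_words, b_words, m_words = (to_words(v, n) for v in (a, b, modulus))
    acc = [0] * (2 * n + 1)                      # step 1: fifth memory, all zero
    for offset in range(n):                      # steps 2, 6 and 7: loop over words
        mul_add(b_words[offset], a_words, acc, offset)   # step 3
        q = (acc[offset] * constant) & MASK              # step 4
        mul_add(q, m_words, acc, offset)                 # step 5
    result = sum(w << (WORD_BITS * i) for i, w in enumerate(acc[n:]))
    if result >= modulus:                        # steps 8 and 9
        result -= modulus
    return result                                # step 10
```

A single final subtraction is enough when a * b is smaller than R times the modulus, for example when one operand is already reduced modulo the modulus. The built-in three-argument `pow` with a negative exponent needs Python 3.8 or later.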
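Step 2 includes: reading the word located at the position shifted left by the first offset from the base address of the second random access memory and writing it into the first operation register.
The multiply-add module is used to implement the multiplication and addition operations supported by the CPU.
In step 3, the operation in which the CPU calls the multiply-add module to multiply the content of the first operation register by the content of the first random access memory and add the product to the content of the fifth random access memory includes the following steps:
Step 201: The CPU determines whether the first offset is 0; if so, step 210 is performed; otherwise, step 202 is performed.
Step 202: The carry register and the index variable are initialized.
Step 203: The CPU obtains one word each from the first random access memory and the fifth random access memory according to the index variable and writes them into a third operation register and a fourth operation register, respectively.
Step 204: The CPU multiplies the content of the first operation register by the content of the third operation register and adds the content of the fourth operation register to the product to obtain a first calculation result.
Step 205: The CPU adds the content of the carry register to the first calculation result to obtain a second calculation result.
Step 206: The CPU writes the most significant word of the second calculation result into the carry register and writes the remaining words into the fifth random access memory according to the index variable.
Step 207: The CPU determines whether the index variable equals a preset word length; if so, step 209 is performed; otherwise, step 208 is performed.
Step 208: The index variable is incremented by 1, and the method returns to step 203.
Step 209: The CPU reads the content of the carry register and writes it into the fifth random access memory according to the index variable.
Step 210: The CPU outputs the content of the fifth random access memory as the operation result of step 3.
Step 203 includes: obtaining, from the first random access memory and the fifth random access memory, the words at the storage locations corresponding to the current index variable and writing them into the third operation register and the fourth operation register, respectively.
In step 206, writing the remaining words into the fifth random access memory according to the index variable includes: the CPU writes the words of the second calculation result other than the most significant word, in order from low to high, starting at the storage location in the fifth random access memory that corresponds to the current index variable.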
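Steps 202 to 210 describe a one-word by many-word multiply-accumulate with a carry register. A minimal Python sketch follows; the name `multiply_add` is hypothetical, the index runs from 0 instead of 1, the check in step 201 is left out, and the accumulator is assumed to occupy only the n low words so that the final carry can simply be written to word n.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def multiply_add(word, operand_words, acc_words):
    """Return word * operand + acc as n + 1 words, least significant first."""
    n = len(operand_words)
    result = list(acc_words[:n]) + [0]
    carry = 0                                                # step 202
    for index in range(n):                                   # steps 203, 207, 208
        first = word * operand_words[index] + result[index]  # step 204
        second = first + carry                               # step 205
        carry = second >> WORD_BITS                          # step 206: high word
        result[index] = second & MASK                        # step 206: low word
    result[n] = carry                                        # step 209
    return result                                            # step 210
```

In the full method the accumulator can already hold data above word n, in which case the carry would have to be added to that data, not written over it.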
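In steps 3 and 5, writing the operation result into the fifth random access memory from low to high according to the second offset includes: writing the operation result in order from low to high, starting at the position shifted left by the second offset from the base address of the fifth random access memory.
In step 4, the CPU reading one word from the fifth random access memory according to the second offset and writing it into the second operation register includes: reading the word located at the position shifted left by the second offset from the base address of the fifth random access memory and writing it into the second operation register.
In step 5, the operation of calling the multiply-add module to multiply the content of the fourth register by the content of the third random access memory and add the product to the content of the fifth random access memory includes the following steps:
Step 301: The CPU determines whether the first offset is 0; if so, step 310 is performed; otherwise, step 302 is performed.
Step 302: The carry register and the index variable are initialized.
Step 303: The CPU obtains one word each from the third random access memory and the fifth random access memory according to the index variable and writes them into the third operation register and the fourth operation register, respectively.
Step 304: The CPU multiplies the content of the third operation register by the content of the fourth register and adds the content of the fourth operation register to the product to obtain a third calculation result.
Step 305: The CPU adds the content of the carry register to the third calculation result to obtain a fourth calculation result.
Step 306: The CPU writes the most significant word of the fourth calculation result into the carry register and writes the remaining words into the fifth random access memory according to the index variable.
Step 307: The CPU determines whether the index variable equals the preset word length; if so, step 309 is performed; otherwise, step 308 is performed.
Step 308: The index variable is incremented by 1, and the method returns to step 303.
Step 309: The CPU reads the content of the carry register and writes it into the fifth random access memory according to the index variable.
Step 310: The CPU outputs the content of the fifth random access memory as the operation result of step 5.
In step 9, writing the difference into the fifth random access memory from low to high according to the second offset includes: writing the difference in order from low to high, starting at the position shifted left by the second offset from the base address of the fifth random access memory.
The efficient data processing method based on Montgomery modular multiplication provided by the present invention reduces the number of cycles the system needs and raises the data processing rate by improving the efficiency of modular multiplication. In particular, when applied to data encryption algorithms, it improves the efficiency and speed of data encryption and decryption.
Brief Description of the Drawings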
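FIG. 1 is a flowchart of the data processing method based on Montgomery modular multiplication proposed by the present invention.
FIG. 2 is a schematic diagram of the operation performed by calling the multiply-add module in step 103 of FIG. 1.
FIG. 3 is a schematic diagram of the operation performed by calling the multiply-add module in step 106 of FIG. 1.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
This embodiment provides a data processing method based on Montgomery modular multiplication and uses the modular multiplication performed on data by the CPU of a computer as an example. In this embodiment, the first random access memory stores the multiplier of the modular multiplication, the second random access memory stores the multiplicand, the third random access memory stores the modulus, the fourth register stores an intermediate operand, the fifth random access memory stores the result of the modular multiplication and the results produced by the multiply-add module during the modular multiplication, and the constant register stores a preset constant.
In this embodiment, the lengths of the multiplier, the multiplicand and the modulus are expressed as n machine words, with n > 0. Optionally, a word is 32 bits long and n = 16, that is, 512 bits. The first, second and third random access memories are chosen according to the large numbers involved; each has a storage space of at least n words, and n words may be chosen to save space. The fifth random access memory has a storage space of at least 2n+1 words, and 2n+1 words may be chosen to save space. The fourth register and the constant register each have a storage space of at least 1 word, and 1 word may be chosen to save space.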
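The sizes above can be checked with a few lines of Python. This is only an illustration of the optional values n = 16 and 32-bit words; the helper names `to_words` and `from_words` are hypothetical.

```python
WORD_BITS = 32      # optional word size from the embodiment
N_WORDS = 16        # optional operand length n, in words

operand_bits = WORD_BITS * N_WORDS       # length of each operand in bits
accumulator_words = 2 * N_WORDS + 1      # smallest size of the fifth memory

def to_words(value, n=N_WORDS):
    """Split an integer into n words, least significant word first."""
    mask = (1 << WORD_BITS) - 1
    return [(value >> (WORD_BITS * i)) & mask for i in range(n)]

def from_words(words):
    """Rebuild the integer from its words (least significant first)."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))
```

With these values each operand is 512 bits long and the fifth random access memory needs at least 33 words.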
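As shown in FIG. 1, the data processing method based on Montgomery modular multiplication executed by the CPU includes steps 101 to 112.
Step 101: Initialize the fifth random access memory, the first offset and the second offset.
The content of the fifth random access memory is initialized to 0, and the first offset and the second offset are initialized to 0. The first offset indicates the offset address relative to the base address in the second random access memory. The second offset indicates the offset address relative to the base address in the fifth random access memory.
Step 102: Read one word from the second random access memory according to the first offset and write it into the first operation register.
Reading one word from the second random access memory according to the first offset and writing it into the first operation register includes: reading the word located at the position shifted left by the first offset from the base address of the second random access memory and writing it into the first operation register.
In this embodiment, the first operation register may be 1 word in size and holds the word read from the second random access memory according to the first offset during the calculation.
Optionally, in this embodiment, the content of the second random access memory is:
91D46B9B F7BF6BB6 37EF4369 9B20C28E
5C312C18 83F0AB86 CE7D029D 67400BCB
CB024F12 9EFEC843 C7BA6010 97275C41
84FA3D48 FF5CA205 761382C0 CA6F360C
When the first offset is 0, the word read from the second random access memory is CA6F360C.
Step 103: Call the multiply-add module on the contents of the first operation register, the first random access memory and the fifth random access memory, and write the operation result into the fifth random access memory from low to high according to the second offset.
In this embodiment, the multiply-add module implements the multiplication and addition operations supported by the CPU. It multiplies the content of the first operation register by the content of the first random access memory and adds the content of the fifth random access memory to obtain an operation result, which is then written into the fifth random access memory according to the second offset.
Writing the operation result into the fifth random access memory according to the second offset includes: writing the operation result in order from low to high, starting at the position shifted left by the second offset from the base address of the fifth random access memory.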
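Optionally, in this embodiment, when the first offset is 0, the content of the first operation register involved in the operation is CA6F360C and the content of the fifth random access memory is 0.
The content of the first random access memory is:
FA371FB2 CA0972D1 A51D20FC D9B12C38
830024AE 5F66E7C7 B13C5C14 17D0A993
5EF27616 D1D36B0E 9E3015E2 37CB5C8F
3F7979D9 CC2085D2 D0E2B6BD E4D00064
The content written into the fifth random access memory in this step is:
[Figure imgf000008_0001: the beginning of this listing appears only as an image in the source]
00000000 00000000 00000000 C5DC31BD
2D3641B1 ABD92E50 B3BB127C 5780E849
AAA110CE 3F267692 D1C1873E F15E853E
CDBC2679 62A6A22B 8BFB6695 AD40EDEC
49E6D2F6 CCFC3470 B00EF5A9 AD311CB0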
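The word selection of step 102 and the lowest word of the step 103 result can be reproduced from the listed values with Python integers. The variable names are hypothetical, and only the selected word and the lowest result word are checked here, not the whole listing.

```python
MASK = 0xFFFFFFFF

# Second random access memory (multiplicand), most significant word first.
multiplicand = int(
    "91D46B9B F7BF6BB6 37EF4369 9B20C28E 5C312C18 83F0AB86 CE7D029D 67400BCB "
    "CB024F12 9EFEC843 C7BA6010 97275C41 84FA3D48 FF5CA205 761382C0 CA6F360C"
    .replace(" ", ""), 16)

# First random access memory (multiplier), most significant word first.
multiplier = int(
    "FA371FB2 CA0972D1 A51D20FC D9B12C38 830024AE 5F66E7C7 B13C5C14 17D0A993 "
    "5EF27616 D1D36B0E 9E3015E2 37CB5C8F 3F7979D9 CC2085D2 D0E2B6BD E4D00064"
    .replace(" ", ""), 16)

first_offset = 0
word = (multiplicand >> (32 * first_offset)) & MASK   # step 102
product = word * multiplier                           # step 103, fifth memory is 0
low_word = product & MASK                             # lowest word written back

print(f"{word:08X} {low_word:08X}")                   # prints "CA6F360C AD311CB0"
```

The product of one word and a 16-word operand occupies at most 17 words, which matches the 17 non-zero words at the end of the listing.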
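Step 104: Read one word from the fifth random access memory according to the second offset and write it into the second operation register, multiply the content of the second operation register by the content of the constant register, and write the low word of the product into the fourth register.
Reading one word from the fifth random access memory according to the second offset and writing it into the second operation register includes: reading the word located at the position shifted left by the second offset from the base address of the fifth random access memory and writing it into the second operation register. The second operation register may be 1 word in size and holds the word read from the fifth random access memory according to the second offset during the calculation.
Optionally, in this embodiment, the content written into the second operation register is AD311CB0 and the content of the constant register is 1A788E41.
The content written into the fourth register in this step is 89E1E8B0.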
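The value written into the fourth register can be recomputed directly. The sketch below also checks one observation about the example data that the text does not state explicitly: the listed constant behaves as the negated inverse, modulo 2^32, of the lowest word of the modulus used in this embodiment (B4117E3F). Variable names are hypothetical.

```python
MASK = 0xFFFFFFFF

second_operation_register = 0xAD311CB0
constant_register = 0x1A788E41
modulus_low_word = 0xB4117E3F     # lowest word of the third random access memory

# Step 104: keep only the low word of the product.
fourth_register = (second_operation_register * constant_register) & MASK

# Observation on the example data: constant * modulus_low_word == -1 (mod 2**32).
is_negated_inverse = (constant_register * modulus_low_word) & MASK == MASK

print(f"{fourth_register:08X}", is_negated_inverse)   # prints "89E1E8B0 True"
```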
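Step 105: Read the contents of the fourth register, the third random access memory and the fifth random access memory.
Optionally, in this embodiment, the content read from the third random access memory is:
A9E55F8A A3D41743 634D40B3 646FA84E
7628CEAB 9B597420 4F226B6F 80E6AECF
76CE3C52 0632A7EF 8053CEC7 A30E4F9D
BFE8E6A4 E4A32F00 81564573 B4117E3F
Step 106: Call the multiply-add module on the contents of the fourth register, the third random access memory and the fifth random access memory, increment the second offset by 1, and write the operation result into the fifth random access memory from low to high according to the second offset.
In this embodiment, the multiply-add module implements the multiplication and addition operations supported by the CPU. It multiplies the content of the fourth register by the content of the third random access memory and adds the content of the fifth random access memory to obtain an operation result; the second offset is then incremented by 1 and the operation result is written into the fifth random access memory according to the second offset.
Writing the operation result into the fifth random access memory from low to high according to the second offset includes: writing the operation result in order from low to high, starting at the position shifted left by the current second offset from the base address of the fifth random access memory.
Optionally, the content written into the fifth random access memory in this step is:
[Figure imgf000010_0001: the beginning of this listing appears only as an image in the source]
00000000 00000000 00000001 215DDEE4
04041356 051CDD28 D9E5280B 7EFD69C7
5CF11456 78A28D3F 83C42F05 494E8116
4B9882E0 52D7C671 4CB773CA 2497EF6C
40531B1B 746DD1FD C05E3055 00000000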
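The trailing 00000000 in the listing is the expected effect of steps 104 to 106: adding the fourth register times the modulus clears the lowest word of the accumulator. A short check on the lowest words only, with hypothetical variable names:

```python
MASK = 0xFFFFFFFF

accumulator_low_word = 0xAD311CB0   # lowest word after step 103
fourth_register = 0x89E1E8B0        # value from step 104
modulus_low_word = 0xB4117E3F       # lowest word of the third random access memory

# Lowest word of (accumulator + fourth_register * modulus) after step 106.
new_low_word = (accumulator_low_word + fourth_register * modulus_low_word) & MASK

print(f"{new_low_word:08X}")        # prints "00000000"
```

Because the lowest word is now zero, the next iteration can continue one word higher, which is what incrementing the second offset achieves.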
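Step 107: Determine whether the first offset equals the preset step size; if so, perform step 109; otherwise, perform step 108.
In this embodiment, the preset step size is 15.
Step 108: Increment the first offset by 1 and return to step 102.
Step 109: Read the content of the fifth random access memory and the content of the third random access memory.
Step 110: Determine whether the value of the content read from the fifth random access memory is greater than or equal to the value of the content of the third random access memory; if so, perform step 111; otherwise, perform step 112.
Step 111: Subtract the content of the third random access memory from the content read from the fifth random access memory, write the difference into the fifth random access memory from low to high according to the second offset, and perform step 112.
In this embodiment, writing the difference into the fifth random access memory from low to high according to the second offset includes: writing the difference in order from low to high, starting at the position shifted left by the second offset from the base address of the fifth random access memory.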
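Steps 110 and 111 amount to a comparison followed by a subtraction with borrow. A minimal word-level sketch, assuming word lists stored least significant word first; the function names are hypothetical and the text does not prescribe how the comparison or the borrow is implemented.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def greater_or_equal(x_words, y_words):
    """Compare two equal-length word lists, starting from the high word."""
    for x, y in zip(reversed(x_words), reversed(y_words)):
        if x != y:
            return x > y
    return True

def subtract(x_words, y_words):
    """Return x - y word by word with a borrow, assuming x >= y."""
    borrow = 0
    out = []
    for x, y in zip(x_words, y_words):
        t = x - y - borrow
        out.append(t & MASK)
        borrow = 1 if t < 0 else 0
    return out

def final_reduce(result_words, modulus_words):
    """Steps 110 and 111: subtract the modulus once if the result is not smaller."""
    padded = list(modulus_words) + [0] * (len(result_words) - len(modulus_words))
    if greater_or_equal(result_words, padded):
        return subtract(result_words, padded)
    return list(result_words)
```

For example, `final_reduce([1, 1], [0xFFFFFFFF])` subtracts 2^32 - 1 from 2^32 + 1 and returns `[2, 0]`.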
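Step 112: Output the content of the fifth random access memory.
In this embodiment, the content of the fifth random access memory output in this step is the result of the modular multiplication. Optionally, the output content of the fifth random access memory is:
65F36D6C AD704FF4 06219952 FA62DCC6
0F9892D1 BBC23E74 1EFECDE3 4717BDA3
55545D9E 18A97A65 59EB8832 F31DD5BC
397DA4B5 773E8EB3 8F89123B 0A05453E
The multiply-add module mentioned in this embodiment implements the large-number multiplication and addition operations supported by the CPU. The large-number multiplication and addition implemented with the multiply-add module in steps 103 and 106 of FIG. 1 are described in detail below as examples. The carry register holds the carry produced in the multiplication and addition; its storage space is at least 1 word, and 1 word may be chosen to save space.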
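The operation in step 103 of FIG. 1, in which the multiply-add module is called on the contents of the first operation register, the first random access memory and the fifth random access memory, may be as shown in FIG. 2 and includes steps 201 to 210.
Step 201: The CPU determines whether the first offset is 0; if so, step 210 is performed; otherwise, step 202 is performed.
In this embodiment, the first offset indicates the offset address relative to the base address in the second random access memory.
Step 202: The CPU initializes the carry register and the index variable.
In this embodiment, initializing the carry register may mean setting its content to 0, and initializing the index variable may mean setting it to 1.
Step 203: The CPU obtains one word each from the first random access memory and the fifth random access memory according to the index variable and writes them into the third operation register and the fourth operation register, respectively.
In this embodiment, obtaining one word each from the first random access memory and the fifth random access memory according to the index variable and writing them into the third operation register and the fourth operation register includes: obtaining, from the first random access memory and the fifth random access memory, the words at the storage locations corresponding to the current index variable and writing them into the third operation register and the fourth operation register, respectively. For example, if the current index variable is 1, the first word counted from the low end of the first random access memory is written into the third operation register, and the first word counted from the low end of the fifth random access memory is written into the fourth operation register.
Optionally, in this embodiment, the third operation register stores the multiplier of the multiply-add operation and the fourth operation register stores the augend; each of the two registers may be 1 word in size.
Step 204: The CPU multiplies the content of the first operation register by the content of the third operation register and adds the content of the fourth operation register to the product to obtain a first calculation result.
In this embodiment, the first calculation result may be stored in a temporary register whose storage space may be 2 words; the first calculation result is read from this temporary register before step 205 is performed.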
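Step 205: The CPU adds the content of the carry register to the first calculation result to obtain a second calculation result.
Step 206: The CPU writes the most significant word of the second calculation result into the carry register and writes the remaining words into the fifth random access memory according to the index variable.
In this embodiment, writing the remaining words into the fifth random access memory according to the index variable includes: the CPU writes the words of the second calculation result other than the most significant word, in order from low to high, starting at the storage location in the fifth random access memory that corresponds to the current index variable. For example, if the current index variable is 1, the words of the second calculation result other than the most significant word are written in order starting from the first word counted from the low end of the fifth random access memory.
Step 207: The CPU determines whether the index variable equals the preset word length; if so, step 209 is performed; otherwise, step 208 is performed.
Specifically, in this embodiment, the CPU determines whether the index variable equals n; if so, step 209 is performed; otherwise, step 208 is performed.
Step 208: The index variable is incremented by 1, and the method returns to step 203.
Step 209: The CPU reads the content of the carry register and writes it into the fifth random access memory according to the index variable.
In this embodiment, the CPU writes the content read from the carry register to the storage location of the (n+1)-th word in the fifth random access memory.
Step 210: The CPU outputs the content of the fifth random access memory.
The content of the fifth random access memory output in this step is the result of the multiply-add operation.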
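The 2-word temporary register and the 1-word carry register used in steps 204 to 206 are large enough even in the worst case, as a short calculation with 32-bit words shows. This is a side check on the word sizes, not part of the described method, and the variable names are hypothetical.

```python
WORD_BITS = 32
MAX_WORD = (1 << WORD_BITS) - 1

# Step 204 worst case: largest word times largest word, plus largest word.
largest_first_result = MAX_WORD * MAX_WORD + MAX_WORD

# Step 205 worst case: add the largest possible carry.
largest_second_result = largest_first_result + MAX_WORD

fits_in_two_words = largest_second_result == (1 << (2 * WORD_BITS)) - 1
carry_fits_in_one_word = (largest_second_result >> WORD_BITS) <= MAX_WORD

print(fits_in_two_words, carry_fits_in_one_word)   # prints "True True"
```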
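The operation in step 106 of FIG. 1, in which the multiply-add module is called on the values read, may be as shown in FIG. 3 and includes steps 301 to 310.
Step 301: The CPU determines whether the first offset is 0; if so, step 310 is performed; otherwise, step 302 is performed.
In this embodiment, the first offset indicates the offset address relative to the base address in the second random access memory.
Step 302: The CPU initializes the carry register and the index variable.
In this embodiment, initializing the carry register may mean setting its content to 0, and initializing the index variable may mean setting it to 1.
Step 303: The CPU obtains one word each from the third random access memory and the fifth random access memory according to the index variable and writes them into the third operation register and the fourth operation register, respectively.
In this embodiment, obtaining one word each from the third random access memory and the fifth random access memory according to the index variable and writing them into the third operation register and the fourth operation register includes: obtaining, from the third random access memory and the fifth random access memory, the words at the storage locations corresponding to the current index variable and writing them into the third operation register and the fourth operation register, respectively. For example, if the current index variable is n, the n-th word counted from the low end of the third random access memory is written into the third operation register, and the n-th word counted from the low end of the fifth random access memory is written into the fourth operation register.
Optionally, in this embodiment, the third operation register stores the multiplier of the multiply-add operation and the fourth operation register stores the augend; each of the two registers may be 1 word in size.
Step 304: The CPU multiplies the content of the third operation register by the content of the fourth register and adds the content of the fourth operation register to the product to obtain a third calculation result.
In this embodiment, the third calculation result may be stored in a temporary register whose storage space may be 2 machine words; the third calculation result is read from this temporary register before step 305 is performed.
Step 305: The CPU adds the content of the carry register to the third calculation result to obtain a fourth calculation result.
Step 306: The CPU writes the most significant word of the fourth calculation result into the carry register and writes the remaining words into the fifth random access memory according to the index variable.
In this embodiment, writing the remaining words into the fifth random access memory according to the index variable includes: the CPU writes the words of the fourth calculation result other than the most significant word, in order from low to high, starting at the storage location in the fifth random access memory that corresponds to the current index variable. For example, if the current index variable is n, the words of the fourth calculation result other than the most significant word are written in order starting from the n-th word counted from the low end of the fifth random access memory.
Step 307: The CPU determines whether the index variable equals the preset word length; if so, step 309 is performed; otherwise, step 308 is performed.
In this embodiment, the CPU determines whether the index variable equals n; if so, step 309 is performed; otherwise, step 308 is performed. Optionally, n = 16.
Step 308: The index variable is incremented by 1, and the method returns to step 303.
Step 309: The CPU reads the content of the carry register and writes it into the fifth random access memory according to the index variable.
In this embodiment, the CPU writes the content read from the carry register to the storage location of the (n+1)-th word in the fifth random access memory.
Step 310: The CPU outputs the content of the fifth random access memory.
The content of the fifth random access memory output in this step is the result of the multiply-add operation.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.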

Claims

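1. A data processing method based on Montgomery modular multiplication, characterized in that a first random access memory stores a multiplier, a second random access memory stores a multiplicand, and a third random access memory stores a modulus, the method comprising:
Step 1: the CPU initializes a fifth random access memory and initializes a first offset and a second offset;
Step 2: the CPU reads one word from the second random access memory according to the first offset and writes it into a first operation register;
Step 3: the CPU calls a multiply-add module to multiply the content of the first operation register by the content of the first random access memory and add the product to the content of the fifth random access memory, and writes the operation result into the fifth random access memory from low to high according to the second offset;
Step 4: the CPU reads one word from the fifth random access memory according to the second offset and writes it into a second operation register, multiplies the content of the second operation register by the content of a constant register, and writes the low word of the product into a fourth register;
Step 5: the CPU reads the contents of the fourth register, the third random access memory and the fifth random access memory, calls the multiply-add module to multiply the content of the fourth register by the content of the third random access memory and add the product to the content of the fifth random access memory, increments the second offset by 1, and writes the operation result into the fifth random access memory from low to high according to the second offset;
Step 6: the CPU determines whether the first offset equals a preset step size; if so, step 8 is performed; otherwise, step 7 is performed;
Step 7: the first offset is incremented by 1, and the method returns to step 2;
Step 8: the CPU reads the contents of the fifth random access memory and the third random access memory and determines whether the value of the content of the fifth random access memory is greater than or equal to the value of the content of the third random access memory; if so, step 9 is performed; otherwise, step 10 is performed;
Step 9: the CPU subtracts the content of the third random access memory from the content read from the fifth random access memory, writes the difference into the fifth random access memory from low to high according to the second offset, and performs step 10;
Step 10: the CPU outputs the content of the fifth random access memory.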
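2. The method according to claim 1, characterized in that the first offset indicates an offset address relative to the base address in the second random access memory, and the second offset indicates an offset address relative to the base address in the fifth random access memory.
3. The method according to claim 1, characterized in that step 2 comprises: reading the word located at the position shifted left by the first offset from the base address of the second random access memory and writing it into the first operation register.
4. The method according to claim 1, characterized in that the multiply-add module is used to implement the multiplication and addition operations supported by the CPU.
5. The method according to claim 1, characterized in that, in step 3, the operation in which the CPU calls the multiply-add module to multiply the content of the first operation register by the content of the first random access memory and add the product to the content of the fifth random access memory comprises:
Step 201: the CPU determines whether the first offset is 0; if so, step 210 is performed; otherwise, step 202 is performed;
Step 202: the carry register and the index variable are initialized;
Step 203: the CPU obtains one word each from the first random access memory and the fifth random access memory according to the index variable and writes them into a third operation register and a fourth operation register, respectively;
Step 204: the CPU multiplies the content of the first operation register by the content of the third operation register and adds the content of the fourth operation register to the product to obtain a first calculation result;
Step 205: the CPU adds the content of the carry register to the first calculation result to obtain a second calculation result;
Step 206: the CPU writes the most significant word of the second calculation result into the carry register and writes the remaining words into the fifth random access memory according to the index variable;
Step 207: the CPU determines whether the index variable equals a preset word length; if so, step 209 is performed; otherwise, step 208 is performed;
Step 208: the index variable is incremented by 1, and the method returns to step 203;
Step 209: the CPU reads the content of the carry register and writes it into the fifth random access memory according to the index variable;
Step 210: the CPU outputs the content of the fifth random access memory as the operation result of step 3.
6. The method according to claim 5, characterized in that step 203 comprises: obtaining, from the first random access memory and the fifth random access memory, the words at the storage locations corresponding to the current index variable and writing them into the third operation register and the fourth operation register, respectively.
7. The method according to claim 5, characterized in that, in step 206, writing the remaining words into the fifth random access memory according to the index variable comprises: the CPU writes the words of the second calculation result other than the most significant word, in order from low to high, starting at the storage location in the fifth random access memory that corresponds to the current index variable.
8. The method according to claim 1, characterized in that, in steps 3 and 5, writing the operation result into the fifth random access memory from low to high according to the second offset comprises: writing the operation result in order from low to high, starting at the position shifted left by the second offset from the base address of the fifth random access memory.
9. The method according to claim 1, characterized in that, in step 4, the CPU reading one word from the fifth random access memory according to the second offset and writing it into the second operation register comprises: reading the word located at the position shifted left by the second offset from the base address of the fifth random access memory and writing it into the second operation register.
10. The method according to claim 1, characterized in that, in step 5, the operation of calling the multiply-add module to multiply the content of the fourth register by the content of the third random access memory and add the product to the content of the fifth random access memory comprises:
Step 301: the CPU determines whether the first offset is 0; if so, step 310 is performed; otherwise, step 302 is performed;
Step 302: the carry register and the index variable are initialized;
Step 303: the CPU obtains one word each from the third random access memory and the fifth random access memory according to the index variable and writes them into the third operation register and the fourth operation register, respectively;
Step 304: the CPU multiplies the content of the third operation register by the content of the fourth register and adds the content of the fourth operation register to the product to obtain a third calculation result;
Step 305: the CPU adds the content of the carry register to the third calculation result to obtain a fourth calculation result;
Step 306: the CPU writes the most significant word of the fourth calculation result into the carry register and writes the remaining words into the fifth random access memory according to the index variable;
Step 307: the CPU determines whether the index variable equals the preset word length; if so, step 309 is performed; otherwise, step 308 is performed;
Step 308: the index variable is incremented by 1, and the method returns to step 303;
Step 309: the CPU reads the content of the carry register and writes it into the fifth random access memory according to the index variable;
Step 310: the CPU outputs the content of the fifth random access memory as the operation result of step 5.
11. The method according to claim 10, characterized in that, in step 9, writing the difference into the fifth random access memory from low to high according to the second offset comprises: writing the difference in order from low to high, starting at the position shifted left by the second offset from the base address of the fifth random access memory.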
PCT/CN2013/088305 2012-12-24 2013-12-02 Montgomery modular multiplication-based data processing method WO2014101632A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/434,275 US9588696B2 (en) 2012-12-24 2013-12-02 Montgomery modular multiplication-based data processing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210566979.X 2012-12-24
CN201210566979.XA CN102999313B (zh) 2012-12-24 2012-12-24 Montgomery modular multiplication-based data processing method

Publications (1)

Publication Number Publication Date
WO2014101632A1 true WO2014101632A1 (zh) 2014-07-03

Family

ID=47927926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/088305 WO2014101632A1 (zh) 2012-12-24 2013-12-02 Montgomery modular multiplication-based data processing method

Country Status (3)

Country Link
US (1) US9588696B2 (zh)
CN (1) CN102999313B (zh)
WO (1) WO2014101632A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
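CN102999313B (zh) 2012-12-24 2016-01-20 飞天诚信科技股份有限公司 Montgomery modular multiplication-based data processing method
CN103207770B (zh) * 2013-04-16 2016-09-28 飞天诚信科技股份有限公司 Method for implementing large-number precomputation in an embedded system
CN104793919B (zh) * 2015-04-15 2017-11-07 深圳国微技术有限公司 Montgomery modular multiplication device and embedded security chip having the same
CN106681690B (zh) * 2015-11-07 2019-02-26 上海复旦微电子集团股份有限公司 Data processing method based on Montgomery modular multiplication, modular multiplication method and device
CN106681691B (zh) * 2015-11-07 2019-01-29 上海复旦微电子集团股份有限公司 Data processing method based on Montgomery modular multiplication, modular multiplication method and device
IL244842A0 (en) * 2016-03-30 2016-07-31 Winbond Electronics Corp Efficient non-modular multiplexing is protected against side-channel attacks
CN106873941B (zh) * 2017-01-19 2019-05-21 西安交通大学 Fast modular multiplication and modular squaring circuit and implementation method thereof
US10778407B2 (en) 2018-03-25 2020-09-15 Nuvoton Technology Corporation Multiplier protected against power analysis attacks
US11508263B2 (en) * 2020-06-24 2022-11-22 Western Digital Technologies, Inc. Low complexity conversion to Montgomery domain
CN112486457B (zh) * 2020-11-23 2022-12-20 杭州电子科技大学 Hardware system implementing an improved FIOS modular multiplication algorithm
CN112286496B (zh) * 2020-12-25 2021-03-30 九州华兴集成电路设计(北京)有限公司 Modular multiplier for the Montgomery algorithm and electronic device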

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1731345A (zh) * 2005-08-18 2006-02-08 上海微科集成电路有限公司 Scalable high-radix Montgomery modular multiplication algorithm and circuit structure thereof
CN1967469A (zh) * 2006-11-09 2007-05-23 北京华大信安科技有限公司 Efficient modular multiplication method and device
US20070233772A1 (en) * 2006-03-30 2007-10-04 Sanu Mathew Modular multiplication acceleration circuit and method for data encryption/decryption
CN102999313A (zh) * 2012-12-24 2013-03-27 飞天诚信科技股份有限公司 Montgomery modular multiplication-based data processing method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
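JP3709553B2 (ja) * 2000-12-19 2005-10-26 インターナショナル・ビジネス・マシーンズ・コーポレーション Arithmetic circuit and arithmetic method
CN1259617C (zh) * 2003-09-09 2006-06-14 大唐微电子技术有限公司 Method for accelerating RSA encryption/decryption, and modular multiplication and modular exponentiation circuits therefor
CN1696894B (zh) * 2004-05-10 2010-04-28 华为技术有限公司 Multiplier for large-number modular multiplication
FR2917198B1 (fr) * 2007-06-07 2010-01-29 Thales Sa Improved modular reduction operator
CN102231102B (zh) * 2011-06-16 2013-08-07 天津大学 RSA cryptographic processing method and coprocessor based on a residue number system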

Also Published As

Publication number Publication date
CN102999313A (zh) 2013-03-27
US9588696B2 (en) 2017-03-07
CN102999313B (zh) 2016-01-20
US20150293698A1 (en) 2015-10-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13869478

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14434275

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13869478

Country of ref document: EP

Kind code of ref document: A1