CN118192934A - Modular multiplication operation method, device, chip, board card and vehicle-mounted system - Google Patents
- Publication number
- CN118192934A (application CN202410395880.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- multiplication
- instruction
- register
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/722—Modular multiplication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/45—Structures or tools for the administration of authentication
- G06F21/46—Structures or tools for the administration of authentication by designing passwords or checking the strength of passwords
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Complex Calculations (AREA)
Abstract
The application provides a modular multiplication operation method, a modular multiplication operation device, a chip, a board card and a vehicle-mounted system, and relates to the technical field of computers. The modular multiplication device comprises: a processor and a data operator; the data operator comprises a multiplier, an accumulator, a first register, a second register, a third register, a first multi-path data selector, a second multi-path data selector, a third multi-path data selector and a fourth multi-path data selector; the processor is used for acquiring target data matched with the data operation instruction after receiving the data operation instruction; transmitting the target data to the data arithmetic unit; the data arithmetic unit is used for receiving the target data and carrying out arithmetic processing on the target data to obtain a data arithmetic result; the data operation result is used for indicating a modular multiplication operation result. The method of the application improves the operation speed of the modular multiplication operation and improves the operation performance.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a modular multiplication operation method, a modular multiplication operation device, a chip, a board card and a vehicle-mounted system.
Background
An elliptic curve cryptography (Elliptic Curve Cryptography, ECC) algorithm is a public key cryptography algorithm based on an elliptic curve, and the ECC algorithm has wide application in the fields of digital signature, information security, blockchain and the like due to higher security and smaller key length.
With the popularity of the ECC algorithm, the calculated data size corresponding to the ECC algorithm is gradually increasing. At this time, when performing the modular multiplication operation of the ECC algorithm, the memory may overflow due to the gradually increased data amount, or the calculation efficiency may be reduced, thereby affecting the performance of the ECC algorithm.
Disclosure of Invention
The application provides a modular multiplication operation method, a modular multiplication operation device, a chip, a board card and a vehicle-mounted system, which are used for solving the problems of low efficiency and poor performance of the traditional modular multiplication operation method.
In a first aspect, the present application provides a modular multiplication apparatus comprising: a processor and a data operator; the data operator comprises a multiplier, an accumulator, a first register, a second register, a third register, a first multi-path data selector, a second multi-path data selector, a third multi-path data selector and a fourth multi-path data selector; the processor is connected with the multiplier, the first multi-path data selector, the second multi-path data selector and the third multi-path data selector respectively; the first multipath data selector is connected with the first register; the first register is connected with the first multi-path data selector, the second multi-path data selector and the fourth multi-path data selector respectively; the second multipath data selector is connected with the first register, the second register and the processor respectively; the second register is respectively connected with the accumulator and the second multipath data selector; the accumulator is respectively connected with the third multipath data selector and the fourth multipath data selector; the third multipath data selector is connected with the third register; the third register is respectively connected with the accumulator and the fourth multipath data selector;
the processor is used for acquiring target data matched with the data operation instruction after receiving the data operation instruction; transmitting the target data to the data arithmetic unit;
the data arithmetic unit is used for receiving the target data and carrying out arithmetic processing on the target data to obtain a data arithmetic result; the data operation result is used for indicating a modular multiplication operation result.
In a second aspect, the present application provides a modular multiplication method, the method being applied to a processor; comprising the following steps:
In response to a modular multiplication operation instruction, acquiring data to be operated; the data to be operated comprises data to be calculated and pre-calculation data; the data to be calculated represents the data on which the modular multiplication operation is to be performed; the pre-calculation data represents intermediate data relied on in the process of performing the modular multiplication operation; the pre-calculation data is determined based on the data to be calculated;
Performing modular multiplication operation on the data to be operated based on a preset data operation instruction to obtain a modular multiplication operation result corresponding to the data to be operated; wherein the preset data operation instruction is implemented based on any one of the modular multiplication operation devices in the first aspect.
In a third aspect, the present application provides a chip comprising the modular multiplication device of any one of the first aspect.
In a fourth aspect, the present application provides a board comprising the chip of the third aspect.
In a fifth aspect, the present application provides an in-vehicle system comprising the chip of the third aspect or the board card of the fourth aspect.
The modular multiplication operation method, the modular multiplication operation device, the chip, the board card and the vehicle-mounted system can respond to the modular multiplication operation instruction to acquire the data to be operated, and then the modular multiplication operation device is adopted to carry out modular multiplication operation on the data to be operated according to the preset data operation instruction to acquire a modular multiplication operation result corresponding to the data to be operated. In the embodiment, the modular multiplication operation can be realized by customizing the data operation instruction and the hardware unit, so that the operation processing speed of the modular multiplication operation can be improved, and the operation efficiency of the modular multiplication operation can be improved. At this time, when the modular multiplication method is applied to a complex algorithm (for example, elliptic curve cryptography), the computing performance of the complex algorithm can be improved, so that the application of the complex algorithm is smoother and wider, and the robustness of the complex algorithm is improved. Further, the performance of the computer device (e.g., an in-vehicle system) applying the complex algorithm can be further improved, so that the use experience of a user using the computer device is further improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a general software architecture of an elliptic curve cryptography according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a modular multiplication device according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a preset operation instruction according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an arithmetic device corresponding to a first multiplication expansion instruction according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an arithmetic device corresponding to a second multiplication expansion instruction according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an arithmetic device corresponding to a third multiplication expansion instruction according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an arithmetic device corresponding to a fourth multiplication expansion instruction according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an arithmetic device corresponding to a first zero-setting extension instruction according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an arithmetic device corresponding to a second zero-setting extension instruction according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an arithmetic device corresponding to a data hold expansion instruction according to an embodiment of the present application;
FIG. 11 is a schematic flow chart of a modular multiplication method according to the present application;
FIG. 12 is a schematic flow chart of another modular multiplication method according to the present application;
FIG. 13 is a schematic diagram of a first multiplication instruction corresponding to n-word length data according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a second multiplication instruction corresponding to n-word length data according to an embodiment of the present application;
FIG. 15 is a schematic flow chart of a multiplication operation implemented by n-word length data according to an embodiment of the present application;
FIG. 16 is a schematic flow chart of a single word length data multiplication operation according to an embodiment of the present application;
FIG. 17 is a schematic diagram of an operation flow of Montgomery modular multiplication according to an embodiment of the present application;
FIG. 18 is a schematic diagram of a modular multiplication performance evaluation result according to an embodiment of the present application;
FIG. 19 is a schematic diagram of a chip according to an embodiment of the present application;
FIG. 20 is a schematic diagram of a board according to an embodiment of the present application;
FIG. 21 is a schematic diagram of a vehicle-mounted system according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
At present, with the popularization of the ECC algorithm, the calculated data size corresponding to the ECC algorithm is gradually increasing.
In an example, FIG. 1 is a schematic diagram of a general software architecture of elliptic curve cryptography according to an embodiment of the present application. As shown in FIG. 1, the elliptic curve operations mainly include "fixed point scalar multiplication", "free point scalar multiplication", "point addition", "point doubling", and the like. The elliptic curve operations are mainly implemented based on prime field operations, which include, for example, a modular addition operation, a modular subtraction operation, a modular multiplication operation, a modular inverse operation and the like. The elliptic curve cryptography algorithm can then be implemented by combining a random number generator and a hash function with the elliptic curve operations, and an elliptic curve cryptography algorithm interface may be provided externally to implement encryption/decryption based on elliptic curve cryptography.
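The four prime field operations named above can be illustrated with a minimal sketch. The prime P = 97 and the helper names are assumptions chosen only for the demonstration; they do not come from the application, and practical ECC uses primes of several hundred bits.

```python
# Illustrative prime-field helpers. P is a small demo prime (assumption);
# it is not a parameter taken from the application.
P = 97


def mod_add(a, b, p=P):
    """Modular addition: (a + b) mod p."""
    return (a + b) % p


def mod_sub(a, b, p=P):
    """Modular subtraction: (a - b) mod p, always in the range [0, p)."""
    return (a - b) % p


def mod_mul(a, b, p=P):
    """Modular multiplication: (a * b) mod p."""
    return (a * b) % p


def mod_inv(a, p=P):
    """Modular inverse via Fermat's little theorem (valid for prime p)."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no inverse modulo p")
    return pow(a, p - 2, p)
```

Of these, modular multiplication is the operation the remainder of the description accelerates in hardware.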
At present, when the modular multiplication operation over the prime field is performed for the elliptic curve cryptography algorithm, the gradually increasing data amount may cause memory overflow or reduced calculation efficiency, thereby affecting the performance of the ECC algorithm.
The modular multiplication operation method provided by the application implements the modular multiplication operation by designing a hardware operation unit dedicated to modular multiplication and combining it with RISC-V expansion instructions, so as to solve the above technical problems in the prior art.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
In one example, in the modular multiplication device provided by the embodiment of the present application, a single word length processed by the multiplier, the accumulator and the multi-way data selector is illustrated by taking 32 bits as an example. Because the first register and the second register are used for storing multiplication results corresponding to the multiplier, the word length of data which can be stored in the first register and the second register is 64 bits. The third register is used to store the summation result of the accumulator, so the word length of the data that the third register can store is 67 bits.
It should be noted that, in addition to 32 bits, the word length that can be processed by the hardware processing unit may be 16 bits, 64 bits, 128 bits, or the like; the word length is not limited herein, provided that it can be implemented.
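The text gives the 64-bit and 67-bit widths without a derivation. One plausible reading, which is an assumption and not a statement from the application, is that the three extra bits of the third register let it accumulate several 64-bit products together with a high part carried over from a previous column. The sketch below only checks the arithmetic of that reading; the function name is hypothetical.

```python
WORD_BITS = 32                 # single word length used in the example
PRODUCT_BITS = 2 * WORD_BITS   # width of the first and second registers (64)
ACC_BITS = 67                  # width of the third register, as stated in the text

max_word = (1 << WORD_BITS) - 1
max_product = max_word * max_word   # largest 32-bit x 32-bit product


def max_products_that_fit(acc_bits=ACC_BITS):
    """Largest k such that k maximal products plus a maximal carried-in
    high part (a full accumulator shifted right by one word) still fit
    in acc_bits bits. The carry-in model is an assumption."""
    limit = (1 << acc_bits) - 1
    carry_in = limit >> WORD_BITS
    return (limit - carry_in) // max_product
```

Under these assumptions, a 67-bit accumulator can absorb eight maximal products per column, which would match eight 32-bit words, i.e. 256-bit operands.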
Fig. 2 is a schematic diagram of a modular multiplication device according to an embodiment of the present application, as shown in fig. 2, the device includes: a processor 21 and a data operator 22; the data operator 22 includes a multiplier 201, an accumulator 202, a first register 203, a second register 204, a third register 205, a first multiplexer 206, a second multiplexer 207, a third multiplexer 208, and a fourth multiplexer 209.
The processor 21 is connected to the multiplier 201, the first multiplexer 206, the second multiplexer 207, and the third multiplexer 208, respectively; the first multiplexer 206 is connected to the first register 203; the first register 203 is connected to the first multiplexer 206, the second multiplexer 207, and the fourth multiplexer 209, respectively; the second multiplexer 207 is connected to the first register 203, the second register 204, and the processor 21, respectively; the second register 204 is connected with the accumulator 202 and the second multiplexer 207 respectively; the accumulator 202 is connected to the third multiplexer 208 and the fourth multiplexer 209, respectively; the third multiplexer 208 is connected to the third register 205; the third register 205 is connected to the accumulator 202 and the fourth multiplexer 209, respectively.
The processor 21 is configured to obtain target data matched with the data operation instruction after receiving the data operation instruction; and transmits the target data to the data operator 22.
The data arithmetic unit 22 is configured to receive the target data, and perform arithmetic processing on the target data to obtain a data arithmetic result; the data operation result is used for carrying out modular multiplication operation.
In one example, a data operation instruction may be used to indicate an instruction to perform an operation process on target data. The data operation instruction may be used to instruct multiplication processing on target data or output processing on multiplication results. At this point, the target data may characterize the multiplier and multiplicand, or be used to characterize the multiplication result.
In one example, the single word length of the data processed by a multi-path data selector is 32 bits. When the word length of the data to be transmitted through a multi-path data selector is greater than 32 bits, the transmission can be completed either by using a plurality of multi-path data selectors simultaneously, according to the word length of the data to be transmitted, or by repeatedly using a single multi-path data selector.
For example, as shown in fig. 2, the first multiplexer 206 is configured to select one of the data stored in the first register 203, the value 0 and the multiplication result output by the multiplier 201, and at this time, the data stored in the first register 203 and the multiplication result output by the multiplier 201 may be transmitted by reusing a single multiplexer, that is, the first multiplexer 206.
For another example, as shown in fig. 2, the second and third multi-way data selectors 207 and 208 enable transmission of data by using a plurality of multi-way data selectors simultaneously in the case where the word length of transmission data is longer than 32 bits. The number of the multiple data selectors used for data transmission is not limited herein, so as to satisfy the actual requirements.
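As a rough software analogy for moving data wider than one word through 32-bit selectors, a value can be cut into 32-bit lanes that are either handled side by side or sent one after another. The helper names below are hypothetical, and the little-endian lane order is an assumption.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def split_words(value, width_bits):
    """Split a non-negative integer into little-endian 32-bit lanes,
    using as many lanes as width_bits requires."""
    n_words = -(-width_bits // WORD_BITS)   # ceiling division
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(n_words)]


def join_words(words):
    """Reassemble little-endian 32-bit lanes into one integer."""
    value = 0
    for i, word in enumerate(words):
        value |= (word & WORD_MASK) << (WORD_BITS * i)
    return value
```

A 64-bit multiplication result therefore occupies two lanes, and a 67-bit accumulator value occupies three.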
In one example, when the modular multiplication operation is implemented by using the operation device shown in fig. 2, the corresponding data operation instruction may be set first, and the operation processing may be performed according to the data operation instruction, so as to obtain the modular multiplication operation result.
In an example, FIG. 3 is a schematic diagram of the preset operation instructions provided in an embodiment of the present application. As shown in FIG. 3, the preset operation instructions include: a first multiplication expansion instruction, i.e., the multadd instruction shown in FIG. 3, which characterizes multiplication followed by summation and is described as "multadd rd, rs1, rs2"; a second multiplication expansion instruction, i.e., the multaddh instruction shown in FIG. 3, which characterizes multiplication and summation followed by retaining the high-order operation result and outputting the low-order operation result, and is described as "multaddh rd, rs1, rs2"; a third multiplication expansion instruction, i.e., the mul instruction shown in FIG. 3, which characterizes outputting the low-order multiplication result after the multiplication is performed, and is described as "mul rd, rs1, rs2"; a fourth multiplication expansion instruction, i.e., the mulh instruction shown in FIG. 3, which characterizes outputting the high-order multiplication result after the multiplication is performed, and is described as "mulh rd, rs1, rs2"; a first zero-setting extension instruction, i.e., the rdlset0 instruction shown in FIG. 3, which characterizes zero-setting after the low-order data is output, and is described as "rdlset0 rd"; a second zero-setting extension instruction, i.e., the rdhset0 instruction shown in FIG. 3, which characterizes zero-setting after the high-order data is output, and is described as "rdhset0 rd"; and a data hold expansion instruction, i.e., the rdlkeep instruction shown in FIG. 3, which characterizes retaining the data after the low-order data is output, and is described as "rdlkeep rd". Here, rs1 and rs2 represent the input data of an instruction, and rd represents its output data.
In one example, the preset operation instructions may be designed based on RISC-V, so as to obtain custom RISC-V expansion instructions, that is, the preset operation instructions. As shown in FIG. 3, the two fields funct7 and funct3 of a RISC-V expansion instruction can be customized to call the corresponding operation instruction. For example, for the multadd instruction, the funct7 field may have a value of "0011100" and the funct3 field a value of "000"; for the multaddh instruction, the funct7 field may have a value of "0011101" and the funct3 field a value of "000"; for the mul instruction, the funct7 field may have a value of "0000001" and the funct3 field a value of "000"; for the mulh instruction, the funct7 field may have a value of "0000001" and the funct3 field a value of "001"; for the rdlset0 instruction, the funct7 field may have a value of "0011110" and the funct3 field a value of "000"; for the rdhset0 instruction, the funct7 field may have a value of "0011111" and the funct3 field a value of "000"; for the rdlkeep instruction, the funct7 field may have a value of "0100000" and the funct3 field a value of "000".
It should be noted that, in the embodiment of the present application, the values of the two fields funct7 and funct3 may be set in a user-defined manner and are not limited herein, provided that each operation instruction can be accurately distinguished.
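To make the role of funct7 and funct3 concrete, the sketch below packs the listed values into the standard RISC-V R-type layout (funct7, rs2, rs1, funct3, rd, opcode). The text does not give the opcode field, so both opcode constants are assumptions: OP is the standard opcode under which the RISC-V M extension defines mul and mulh with these same funct values, and CUSTOM_0 is one of the opcodes RISC-V reserves for custom extensions. The helper name encode_rtype is hypothetical.

```python
# (funct7, funct3) pairs as listed in the text for FIG. 3.
FUNCT = {
    "multadd":  (0b0011100, 0b000),
    "multaddh": (0b0011101, 0b000),
    "mul":      (0b0000001, 0b000),
    "mulh":     (0b0000001, 0b001),
    "rdlset0":  (0b0011110, 0b000),
    "rdhset0":  (0b0011111, 0b000),
    "rdlkeep":  (0b0100000, 0b000),
}

OP = 0b0110011        # standard RISC-V OP opcode (assumed for mul/mulh)
CUSTOM_0 = 0b0001011  # ASSUMPTION: opcode for the custom instructions


def encode_rtype(name, rd, rs1, rs2, opcode):
    """Pack an instruction into the 32-bit RISC-V R-type format."""
    funct7, funct3 = FUNCT[name]
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register index must be in 0..31")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)
```

For example, `hex(encode_rtype("mul", 10, 11, 12, OP))` gives `0x2c58533`, the usual encoding of `mul a0, a1, a2`. For the single-operand instructions such as rdlset0, passing 0 for rs1 and rs2 is a further assumption.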
The operation of each operation instruction shown in fig. 3 will be described below with reference to a specific operation apparatus schematic diagram.
In one example, the data operation instruction characterizes a first multiplication expansion instruction; the first multiplication expansion instruction characterizes an instruction for multiplication and summation; the target data includes a first multiplier and a first multiplicand.
At this time, referring to FIG. 4, FIG. 4 is a schematic diagram of an arithmetic device corresponding to the first multiplication expansion instruction according to an embodiment of the present application, as shown by the solid lines in FIG. 4:
The processor 41 is configured to, after receiving the first multiplication expansion instruction, transmit a first multiplier (rs1 shown in FIG. 4) and a first multiplicand (rs2 shown in FIG. 4) to the multiplier 401, so that the multiplier 401 multiplies the first multiplier by the first multiplicand to obtain a first multiplication operation result; the first multiplication operation result is transmitted through the first multiplexer 406 to the first register 403 and stored.
The first register 403 is configured to receive the first multiplication result, and transmit and store the first multiplication result to the second register 404 through the second multiplexer 407.
The second register 404 is configured to obtain the data stored in the third register 405, and transmit the data stored in the third register 405 and the received first multiplication result to the accumulator 402.
An accumulator 402, configured to perform an addition operation on the data stored in the third register 405 and the first multiplication result, to obtain a first addition operation result (mult_add_data as shown in FIG. 4); the first addition result is saved to the third register 405 through the third multiplexer 408.
At this time, the data stored in the third register 405 is the result rd obtained by processing rs1 and rs2 with the first multiplication expansion instruction multadd.
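The multadd data path described above can be mirrored by a small behavioral model. This is a sketch under stated assumptions rather than the device itself: the class and attribute names are hypothetical, the multiplexers are reduced to plain assignments, and wrap-around at 67 bits is assumed for the third register.

```python
WORD_MASK = (1 << 32) - 1   # 32-bit operands
ACC_MASK = (1 << 67) - 1    # third register is 67 bits wide per the text


class DataOperator:
    """Behavioral model of the registers touched by multadd."""

    def __init__(self):
        self.reg1 = 0   # first register: multiplication result (64 bits)
        self.reg2 = 0   # second register: operand fed to the accumulator
        self.reg3 = 0   # third register: running sum (67 bits)

    def multadd(self, rs1, rs2):
        product = (rs1 & WORD_MASK) * (rs2 & WORD_MASK)   # multiplier
        self.reg1 = product                # via the first multiplexer
        self.reg2 = self.reg1              # via the second multiplexer
        # Accumulator adds the third register and the product; the sum is
        # written back to the third register via the third multiplexer.
        self.reg3 = (self.reg3 + self.reg2) & ACC_MASK
        return self.reg3
```

Calling `multadd(3, 4)` and then `multadd(5, 6)` on a fresh model leaves 42 in the third register, which shows the multiply-and-sum behavior across consecutive instructions.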
In one example, the data operation instruction characterizes a second multiplication expansion instruction; the second multiplication expansion instruction characterizes an instruction which performs multiplication and summation, reserves the operation result of the high order and outputs the operation result of the low order; the target data includes a second multiplier and a second multiplicand.
At this time, referring to FIG. 5, FIG. 5 is a schematic diagram of an arithmetic device corresponding to the second multiplication expansion instruction according to an embodiment of the present application, as shown by the solid lines in FIG. 5:
The processor 51 is configured to, after receiving the second multiplication expansion instruction, transmit a second multiplier (rs1 shown in FIG. 5) and a second multiplicand (rs2 shown in FIG. 5) to the multiplier 501, so that the multiplier 501 multiplies the second multiplier by the second multiplicand to obtain a second multiplication operation result; the second multiplication operation result is transmitted through the first multiplexer 506 to the first register 503 and stored.
The first register 503 is configured to receive the second multiplication result, and transmit and store the second multiplication result to the second register 504 through the second multiplexer 507.
A second register 504 for transferring the result of the second multiplication operation to the accumulator 502.
An accumulator 502, configured to obtain the data stored in the third register 505, and perform an addition operation on the data stored in the third register 505 and the received second multiplication result, to obtain a second addition operation result (mult_add_data shown in fig. 5).
The processor 51 is further configured to obtain and output, through the fourth multiplexer 509, the low-order data in the second addition result (mult_add_data as shown in fig. 5), and to save the high-order data in the second addition result to the third register 505 through the third multiplexer 508.
At this time, the low-order data in the output second addition operation result is the result rd obtained by applying the second multiplication expansion instruction multaddh to rs1 and rs2.
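The difference between multadd and multaddh can be illustrated with a small behavioural model. This is only a sketch under assumptions that the text does not state: operands are treated as unsigned, the single word length is taken to be 32 bits, "high-order data" is taken to mean everything above the low word, and the class name MacModel is hypothetical. It does not model the multiplexers or pipeline timing.

```python
WORD_BITS = 32                      # assumed single word length
WORD_MASK = (1 << WORD_BITS) - 1


class MacModel:
    """Behavioural model of the multiply-accumulate path (not cycle-accurate)."""

    def __init__(self):
        self.third_reg = 0          # stands in for the third register (running sum)

    def multadd(self, rs1, rs2):
        """Multiply, add to the running sum, and keep the whole sum."""
        self.third_reg += rs1 * rs2
        return self.third_reg       # the text calls the stored sum rd

    def multaddh(self, rs1, rs2):
        """Multiply, add, output the low word, and keep the high-order part."""
        total = self.third_reg + rs1 * rs2
        self.third_reg = total >> WORD_BITS   # assumed meaning of "high-order data"
        return total & WORD_MASK


mac = MacModel()
mac.multadd(0xFFFFFFFF, 0xFFFFFFFF)   # running sum becomes 0xFFFFFFFE00000001
low = mac.multaddh(2, 3)              # adds 6, outputs the low word
```

In this model, multaddh shifts the retained sum down by one word, which is what lets a sequence of such instructions walk from one word position to the next.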
In one example, the data operation instruction characterizes a third multiplication expansion instruction; the third multiplication expansion instruction characterizes an instruction which outputs the low-order multiplication result after a multiplication operation; the target data includes a third multiplier and a third multiplicand. At this time, referring to fig. 6, fig. 6 is a schematic diagram of an arithmetic device corresponding to a third multiplication expansion instruction according to an embodiment of the present application, as shown by the solid lines in fig. 6:
The processor 61 is configured to transmit a third multiplier (rs1 shown in fig. 6) and a third multiplicand (rs2 shown in fig. 6) to the multiplier 601 after receiving the third multiplication expansion instruction.
A multiplier 601 for receiving the third multiplier (rs1 shown in fig. 6) and the third multiplicand (rs2 shown in fig. 6); multiplying the third multiplier by the third multiplicand to obtain a third multiplication result; the third multiplication result is transmitted through the first multiplexer 606 and stored in the first register 603.
The processor 61 is further configured to select and output low-order data in the third multiplication result based on the fourth multiplexer 609.
At this time, the low-order data in the output third multiplication result is the result rd obtained by applying the third multiplication expansion instruction mul to rs1 and rs2.
In one example, the data operation instruction characterizes a fourth multiplication expansion instruction; the fourth multiplication expansion instruction characterizes an instruction which outputs the high-order multiplication result after a multiplication operation; the target data includes a fourth multiplier, a fourth multiplicand, and a first target value. At this time, referring to fig. 7, fig. 7 is a schematic diagram of an operation device corresponding to a fourth multiplication expansion instruction according to an embodiment of the present application, as shown by the solid lines in fig. 7:
The processor 71 is configured to transmit the fourth multiplier (rs1 shown in fig. 7) and the fourth multiplicand (rs2 shown in fig. 7) to the multiplier 701 after receiving the fourth multiplication expansion instruction.
A multiplier 701 for receiving the fourth multiplier (rs1 shown in fig. 7) and the fourth multiplicand (rs2 shown in fig. 7); multiplying the fourth multiplier by the fourth multiplicand to obtain a fourth multiplication result; the fourth multiplication result is transferred to the first register 703 through the first multiplexer 706 and stored.
The first register 703 is configured to transmit the fourth multiplication result to the second register 704 through the second multiplexer 707.
The processor 71 is further configured to output, through the fourth multiplexer 709, the high-order data in the fourth multiplication result stored in the second register 704; to transmit a first target value (the value 0 shown in fig. 7) to the second register 704 through the second multiplexer 707 to zero the second register 704; and to transmit the first target value to the third register 705 through the third multiplexer 708 to zero the third register 705.
At this time, the high-order data in the output fourth multiplication result is the result rd obtained by applying the fourth multiplication expansion instruction mulh to rs1 and rs2.
At this time, by performing zero setting processing on the third register 705 and the second register 704, preparation can be made for the next operation.
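The relationship between the mul and mulh results can be sketched as follows. The sketch assumes unsigned operands and a 32-bit single word length, neither of which is stated in the passage above, and it leaves out the zeroing of the second and third registers that mulh also performs.

```python
WORD_BITS = 32                      # assumed single word length
WORD_MASK = (1 << WORD_BITS) - 1


def mul(rs1, rs2):
    """Low-order word of the product, as output by the third expansion instruction."""
    return (rs1 * rs2) & WORD_MASK


def mulh(rs1, rs2):
    """High-order word of the product, as output by the fourth expansion instruction."""
    return ((rs1 * rs2) >> WORD_BITS) & WORD_MASK


a, b = 0x12345678, 0x9ABCDEF0
# Concatenating the high and low words reproduces the full double-word product.
full = (mulh(a, b) << WORD_BITS) | mul(a, b)
```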
In one example, the data operation instruction characterizes a first zeroing expansion instruction; the first zeroing expansion instruction characterizes an instruction which zeroes after outputting low-order data; the target data includes a summation operation result after the accumulator operation and a first target value. At this time, referring to fig. 8, fig. 8 is a schematic diagram of an arithmetic device corresponding to a first zeroing expansion instruction according to an embodiment of the present application, as shown by the solid lines in fig. 8:
A processor 81 for selecting and outputting low-order data in the summation operation result (i.e., mult_add_data shown in fig. 8) after the accumulator operation based on the fourth multi-way data selector 809; transmitting a first target value (the first target value is the value 0 shown in fig. 8) to the second register 804 through the second multiplexer 807 to perform zero setting processing on the second register 804; the first target value is transferred to the third register 805 through the third multiplexer 808 to zero the third register 805.
In one example, the data operation instruction characterizes a second zeroing expansion instruction; the second zeroing expansion instruction characterizes an instruction which zeroes after outputting high-order data; the target data includes data stored in the third register and the first target value. At this time, referring to fig. 9, fig. 9 is a schematic diagram of an arithmetic device corresponding to a second zeroing expansion instruction according to an embodiment of the present application, as shown by the solid lines in fig. 9:
A processor 91 for outputting high-order data among the data stored in the third register 905 based on the fourth multi-way data selector 909; transmitting a first target value (the first target value is the value 0 shown in fig. 9) to the second register 904 through the second multiplexer 907 to perform zero setting processing on the second register 904; the first target value is transferred to the third register 905 through the third multiplexer 908 to perform the zeroing process on the third register 905.
In one example, the data operation instruction characterizes a data hold expansion instruction; the data hold expansion instruction characterizes an instruction which retains the remaining data after outputting low-order data; the target data includes data stored in the second register and data stored in the third register. At this time, referring to fig. 10, fig. 10 is a schematic diagram of an operation device corresponding to a data hold expansion instruction according to an embodiment of the present application, as shown by the solid lines in fig. 10:
An accumulator 1002 for receiving data stored in the second register 1004 and the third register 1005; the data stored in the second register 1004 and the data stored in the third register 1005 are added together to obtain a third addition result.
A processor 101, configured to output low-bit data in the third addition result through the fourth multiplexer 1009 after the accumulator 1002 obtains the third addition result; and the high order data in the third addition result is transmitted to and stored in the third register 1005 through the third multiplexer 1008.
Based on the data operation instruction realized by the modular multiplication operation device, the application also provides a modular multiplication operation method. Referring to fig. 11, fig. 11 is a schematic flow chart of a modular multiplication method provided by the present application, and as shown in fig. 11, the method may be applied to a processor in any of the above embodiments, and includes:
S1101, responding to a modular multiplication operation instruction, and acquiring data to be operated.
The data to be operated comprises data to be calculated and pre-calculation data; the data to be calculated represents the data on which the modular multiplication operation needs to be performed; the pre-calculation data represents intermediate data relied on during the modular multiplication operation; the pre-calculation data is determined based on the data to be calculated.
In one example, the data to be calculated may be understood as data requiring a modular multiplication operation, e.g., the data to be calculated may include a multiplier x, a multiplicand y, and a modulus m.
In one example, the pre-calculation data may include the modular exponentiation data R, the first intermediate data w, and the second intermediate data nm. At this time, the pre-calculation data may be determined based on the data to be calculated: the modular exponentiation data is $R = 2^{N}$, where $2^{N-1} < m \le 2^{N}$; the first intermediate data is $w = -m^{-1} \bmod R$; and the second intermediate data is $nm = -m \bmod R$.
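As an illustration of how these three values follow from the modulus, the sketch below computes them with Python integers. It assumes an odd modulus m of at least 3, since the inverse of m modulo a power of two only exists when m is odd; the function name precompute is hypothetical.

```python
def precompute(m):
    """Return (R, w, nm) for an odd modulus m, following the definitions above."""
    if m < 3 or m % 2 == 0:
        raise ValueError("this sketch assumes an odd modulus m >= 3")
    n_bits = m.bit_length()          # N such that 2**(N-1) < m < 2**N for odd m >= 3
    r = 1 << n_bits                  # R = 2**N
    w = (-pow(m, -1, r)) % r         # w = -m^(-1) mod R (pow with -1 needs Python 3.8+)
    nm = (-m) % r                    # nm = -m mod R
    return r, w, nm


r, w, nm = precompute(97)
```

For m = 97 this gives R = 128 and nm = 31, and w satisfies m*w ≡ -1 (mod R).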
S1102, carrying out modular multiplication operation on data to be operated based on a preset data operation instruction to obtain a modular multiplication operation result corresponding to the data to be operated.
The preset data operation instruction is realized based on any modular multiplication operation device.
In one example, the modular multiplication method provided by the embodiment of the present application may be used in the computation of a complex algorithm, for example, elliptic curve cryptography; the complex algorithms to which the modular multiplication method can be applied are not limited here.
As can be seen from the above description, according to the embodiment of the present application, the data to be operated can be obtained in response to the modular multiplication operation instruction, and then the modular multiplication operation is performed on the data to be operated by using the operation device according to the preset data operation instruction, so as to obtain the modular multiplication operation result corresponding to the data to be operated. In the embodiment, the modular multiplication operation can be realized by customizing the data operation instruction and the hardware unit, so that the operation processing speed of the modular multiplication operation can be improved, and the operation efficiency of the modular multiplication operation can be improved. At this time, when the modular multiplication method is applied to a complex algorithm (for example, elliptic curve cryptography), the computing performance of the complex algorithm can be improved, so that the application of the complex algorithm is smoother and wider, and the robustness of the complex algorithm is improved. Further, the performance of the computer device (e.g., an in-vehicle system) applying the complex algorithm can be further improved, so that the use experience of a user using the computer device is further improved.
In an example, referring to fig. 12, fig. 12 is a schematic flow chart of another modular multiplication method provided in the present application, as shown in fig. 12, the method includes:
S1201, responding to a modular multiplication operation instruction, and acquiring data to be operated.
The data to be operated comprises data to be calculated and pre-calculation data; the data to be calculated represents the data on which the modular multiplication operation needs to be performed; the pre-calculation data represents intermediate data relied on during the modular multiplication operation; the pre-calculation data is determined based on the data to be calculated.
In an example, for this step, refer to the description of S1101; details are not repeated here.
In one example, after the data to be operated is obtained and before the modular multiplication operation is performed on it based on a preset data operation instruction, the embodiment of the application may determine, based on the preset data operation instruction, a first multiplication instruction and a second multiplication instruction for implementing the modular multiplication operation. The first multiplication instruction is used to determine the high-order data in a multiplication operation result through the first multiplication operation, and the second multiplication instruction is used to determine the low-order data in the multiplication operation result through the second multiplication operation. For details, refer to the steps described below.
S1202, determining a first multiplication instruction and a second multiplication instruction based on a data word length corresponding to data to be operated and a preset data operation instruction.
Wherein the first multiplication instruction is used for realizing a first multiplication operation; the second multiply instruction is to implement a second multiply operation.
In one example, the word length of the data to be operated on may be greater than or equal to the single word length (i.e., 32 bits) of the data that can be processed by the modular multiplication device.
At this time, in the case that the data word length corresponding to the data to be operated is multiple word lengths, the preset data operation instruction includes a first multiplication expansion instruction, a second multiplication expansion instruction, a first zero expansion instruction, a second zero expansion instruction, and a data holding expansion instruction. At this time, based on the data word length corresponding to the data to be operated and the preset data operation instruction, the first multiplication instruction and the second multiplication instruction are determined, which specifically includes the following steps:
First, a target word length corresponding to a multiplicand and a multiplier in data to be operated is determined.
Wherein the target word length characterizes the data word length of the multiplicand or multiplier. At this time, the target word length may be represented as n, where n is a positive integer greater than 1.
In one example, the target word length may be understood as the maximum value of the first word length corresponding to the multiplicand and the second word length corresponding to the multiplier, for example, when the first word length corresponding to the multiplicand is 5 and the second word length corresponding to the multiplier is 3, the value of the target word length is 5. For example, when the first word length corresponding to the multiplicand is 5 and the second word length corresponding to the multiplier is 5, the target word length is 5.
Then, a first multiplication instruction and a second multiplication instruction are determined based on the target word length, the first multiplication expansion instruction, the second multiplication expansion instruction, the first zeroing expansion instruction, the second zeroing expansion instruction, and the data hold expansion instruction.
In one example, the first multiplication instruction is determined based on the target word length, the first multiplication expansion instruction, the second multiplication expansion instruction, the first zeroing expansion instruction, the second zeroing expansion instruction, and the data hold expansion instruction, and specifically includes the steps of:
First, based on the data length corresponding to the single word length and the target word length, the multiplicand and the multiplier are segmented to obtain a plurality of first sub-data corresponding to the multiplicand and a plurality of second sub-data corresponding to the multiplier.
In one example, assuming that the data length corresponding to the single word length is 32 bits, when the multiplicand and the multiplier are split, the first sub-data corresponding to the multiplicand and the second sub-data corresponding to the multiplier may be obtained by performing the split processing in units of 32 bits from the lowest bit of the multiplicand and the multiplier, respectively.
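The word count and the 32-bit segmentation can be sketched as below. The helper names word_count and split_words are hypothetical, and the returned list is ordered least-significant word first, which is a choice of this sketch rather than the numbering used in the figures.

```python
WORD_BITS = 32                       # single word length assumed in the example above


def word_count(value):
    """Number of 32-bit words needed to hold value (at least 1)."""
    return max(1, -(-value.bit_length() // WORD_BITS))   # ceiling division


def split_words(value, n_words):
    """Split value into n_words 32-bit words, starting from the lowest bit."""
    mask = (1 << WORD_BITS) - 1
    return [(value >> (WORD_BITS * k)) & mask for k in range(n_words)]


multiplicand = 0x1_00000002_00000003
multiplier = 0x7_00000008
# Target word length: the larger of the two operand word counts.
n = max(word_count(multiplicand), word_count(multiplier))
first_sub_data = split_words(multiplicand, n)
second_sub_data = split_words(multiplier, n)
```

Padding the shorter operand with zero words up to the target word length keeps both lists the same size, so every index pair is defined.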
Then, determining a first data combination pattern that matches the first multiply instruction; and determining a plurality of first multiplied data sets based on the matched first data combination pattern.
Wherein the first multiplied data set is used for indicating the first sub-data and the second sub-data for which multiplication calculation is required.
In one example, the first data combination mode is used to indicate how the first sub-data and the second sub-data are combined for operation processing. For example, assume that the first sub-data is denoted $X_i$, where i indicates the i-th first sub-data in the multiplicand, and that the second sub-data is denoted $Y_j$, where j indicates the j-th second sub-data in the multiplier. Then the first data combination mode may be i+j=2n, in which case the first sub-data and the second sub-data whose indices i and j sum to 2n are combined for operation processing; the first multiplied data set then consists of the i-th first sub-data and the j-th second sub-data for which i+j=2n.
In one example, the number of first data combination patterns that match the first multiply instruction may be multiple, e.g., the value of i+j determined by the first data combination pattern may be 2n,2n-1, …, n, …,3,2.
In this case, the number of the first multiplied data sets determined based on the first data combination pattern may be one or a plurality of. For example, when the value of i+j is 2n, the values of i and j are both n, and at this time, the number of the first multiplied data sets is 1. When i+j has a value of 3, i is 1, j is 2, or i is 2, j is 1, at this time, the number of first multiplied data sets may be determined to be 2.
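The counts quoted above can be reproduced by enumerating the index pairs for each value of i+j. This is a minimal sketch using 1-based indices from 1 to n, as in the text; the function names data_groups and all_groups are hypothetical.

```python
def data_groups(n, total):
    """All index pairs (i, j) with 1 <= i, j <= n and i + j == total."""
    return [(i, total - i) for i in range(1, n + 1) if 1 <= total - i <= n]


def all_groups(n):
    """Map each combination mode i + j = 2n, 2n-1, ..., 2 to its index pairs."""
    return {total: data_groups(n, total) for total in range(2 * n, 1, -1)}


groups_n2 = all_groups(2)
# groups_n2[4] has one pair, groups_n2[3] has two, groups_n2[2] has one.
```

Summed over all modes, the pairs cover every combination of one first sub-data and one second sub-data exactly once, so the total is n squared.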
Next, a first number of first multiply-expand instructions included in the first multiply instruction and a second number of second multiply-expand instructions included in the first multiply instruction are determined based on the number of first data combination patterns and the number of first multiply data sets determined based on each of the first data combination patterns.
Finally, a first multiplication instruction is determined based on the first number of first multiplication expansion instructions, the second number of second multiplication expansion instructions, the data hold expansion instruction, and the second zeroing expansion instruction.
In one example, fig. 13 is a schematic diagram of a first multiplication instruction corresponding to n-word length data according to an embodiment of the present application, as shown in fig. 13, when a first multiplication instruction and a second multiplication instruction are executed for each first multiplication data set in each first data combination mode (for example, i+j=2n-1, i+j=2n-2, i+j=2n-3, …, i+j=n+1, i+j=n, …, i+j=4, i+j=3, i+j=2 shown in fig. 13), in the case that it is determined that an operation is performed for the first time, the first multiplication instruction is executed for the first sub data and the second sub data in the first multiplication data set. And then, executing a second multiplication expansion instruction on the first sub data and the second sub data in each first multiplication data group except the last first data combination mode in each first data combination mode until the last first multiplication data group in the first data combination mode, so that the first sub data and the second sub data in the last first multiplication data group execute the first multiplication expansion instruction. Then, a second multiply-expand instruction is executed on the first sub-data and the second sub-data within the first multiplied data set in the last first data combination mode, and after execution is completed, the data hold expand instruction and the second zero-set expand instruction continue to be executed.
In this case, the high-order multiplication result can be obtained by outputting the value of the (n+1)-th word, …, the value of the 4th word, the value of the 3rd word, the value of the 2nd word, and the value of the 1st word (i.e., the value of the highest 32 bits).
In one example, for the first multiplication instruction shown in fig. 13, it may be determined that, in the case where n has a value of 2, the first data combination mode may be: i+j=4, i+j=3, i+j=2, and at this time, the number of first data combining patterns is 3. In the case where the first data combination pattern is i+j=4, the number of the determined first multiplied data sets is 1; in the case where the first data combination pattern is i+j=3, the number of the determined first multiplied data sets is 2; in the case where the first data combination pattern is i+j=2, the number of the determined first multiplied data sets is 1. At this time, it may be determined that the first number of the first multiplication expansion instructions included in the first multiplication instruction is 2 and the second number of the second multiplication expansion instructions is 2.
In one example, the second multiplication instruction is determined based on the target word length, the first multiplication expansion instruction, the second multiplication expansion instruction, the first zeroing expansion instruction, the second zeroing expansion instruction, and the data hold expansion instruction, and specifically includes the steps of:
First, based on the data length corresponding to the single word length and the target word length, the multiplicand and the multiplier are subjected to segmentation processing to obtain a plurality of third sub-data corresponding to the multiplicand and a plurality of fourth sub-data corresponding to the multiplier.
In one example, the third sub-data corresponding to the multiplicand and the fourth sub-data corresponding to the multiplier may be obtained by performing the segmentation processing with the length corresponding to the single word length as the step length from the lowest bit of the multiplicand and the multiplier, respectively.
Then, determining a second data combination mode matching the second multiplication instruction; and determining a plurality of second multiplied data sets based on the matched second data combination mode.
Wherein the second multiplied data set is used for indicating the third sub-data and the fourth sub-data for which multiplication calculation is required.
In one example, the second data combination mode is used to indicate how the third sub-data and the fourth sub-data are combined for operation processing. For example, assume that the third sub-data is denoted $x_i$, where i indicates the i-th third sub-data in the multiplicand, and that the fourth sub-data is denoted $y_j$, where j indicates the j-th fourth sub-data in the multiplier. Then the second data combination mode may be i+j=2n, in which case the third sub-data and the fourth sub-data whose indices i and j sum to 2n are combined for operation processing; the second multiplied data set then consists of the i-th third sub-data and the j-th fourth sub-data for which i+j=2n.
In one example, the number of second data combination patterns that match the second multiply instruction may be multiple, e.g., the value of i+j determined by the second data combination pattern may be 2n,2n-1, …, n+1.
At this time, the number of the second multiplied data sets determined based on the second data combination mode may be one or more. For example, when the value of i+j is 2n, the values of i and j are both n, and the number of second multiplied data sets is 1. When i+j has a value of 3, either i is 1 and j is 2, or i is 2 and j is 1, so the number of second multiplied data sets may be determined to be 2.
Next, a third number of first multiply-expand instructions included by the second multiply instruction and a fourth number of second multiply-expand instructions included by the second multiply instruction are determined based on the number of second data combination patterns and the number of second multiply data sets determined based on each second data combination pattern.
Finally, a second multiply instruction is determined based on the third number of first multiply-expand instructions, the fourth number of second multiply-expand instructions, and the first zero-expand instruction.
In one example, fig. 14 is a schematic diagram of a second multiplication instruction corresponding to n-word length data according to an embodiment of the present application, as shown in fig. 14, when a first multiplication instruction and a second multiplication instruction are executed for each second multiplication data set in each second data combination mode (for example, i+j=2n-1, i+j=2n-2, i+j=2n-3, …, i+j=n+1 shown in fig. 14), in the case that an operation is determined to be performed for the first time, the first multiplication instruction is executed for third sub-data and fourth sub-data in the second phase multiplier data set. Then, the second multiplication expansion instruction is executed on the third sub data and the fourth sub data in each first multiplication data group in each first data combination mode, and then the first multiplication expansion instruction is executed. And executing the first zero-setting expansion instruction after completing the first multiplication instruction and the second multiplication expansion instruction on the third sub data and the fourth sub data in each second phase multiplier data group.
At this time, the low-order multiplication result can be obtained by outputting the value of the lowest 32 bits, the value of the 2n-1 st word, the value of the 2n-2 nd word, the value of … n+1st word, and the value of the 1 st word.
In one example, for the second multiplication instruction illustrated in fig. 14, it may be determined that, in the case where n has a value of 2, the second data combination modes may be i+j=4 and i+j=3, so the number of second data combination modes is 2. In the case where the second data combination mode is i+j=4, the number of determined second multiplied data sets is 1; in the case where the second data combination mode is i+j=3, the number of determined second multiplied data sets is 2. At this time, it may be determined that the third number of the first multiplication expansion instructions included in the second multiplication instruction is 2 and the fourth number of the second multiplication expansion instructions included in the second multiplication instruction is 1.
In one example, embodiments of the present application may implement a multiplication of data based on a first multiplication instruction and a second multiplication instruction. At this time, referring to fig. 15, fig. 15 is a schematic flow chart of implementing multiplication operation for n-word length data according to an embodiment of the present application. As shown in fig. 15, by simultaneously outputting the output result of the first multiplication instruction and the output result of the second multiplication instruction, a final multiplication result can be obtained, so as to implement the multiplication process.
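Grouping partial products by the sum of their indices is the idea behind column-wise (product-scanning) multi-word multiplication, and the sketch below shows that general technique producing a high half and a low half whose concatenation is the full product. It is not the exact instruction sequence of fig. 13 to fig. 15: it uses 0-based indices with word 0 as the least significant word, a 32-bit word, unsigned operands, and a hypothetical function name column_multiply.

```python
WORD_BITS = 32                       # assumed single word length
WORD_MASK = (1 << WORD_BITS) - 1


def to_words(value, n):
    """Split value into n words, least significant first."""
    return [(value >> (WORD_BITS * k)) & WORD_MASK for k in range(n)]


def column_multiply(x, y, n):
    """Multiply two n-word values column by column; return (high, low) halves."""
    xw, yw = to_words(x, n), to_words(y, n)
    acc = 0                          # running sum carried from column to column
    out = []
    for col in range(2 * n - 1):     # one column per value of i + j
        for i in range(n):
            j = col - i
            if 0 <= j < n:
                acc += xw[i] * yw[j]         # multiply and accumulate
        out.append(acc & WORD_MASK)          # output the low word of the column
        acc >>= WORD_BITS                    # keep the high-order part
    out.append(acc & WORD_MASK)              # final carry is the top word
    low = sum(w << (WORD_BITS * k) for k, w in enumerate(out[:n]))
    high = sum(w << (WORD_BITS * k) for k, w in enumerate(out[n:]))
    return high, low


x = 0xFFFFFFFF_FFFFFFFF
y = 0xFFFFFFFF_FFFFFFFF
high, low = column_multiply(x, y, 2)
```

Here (high << 64) | low equals x * y, mirroring how the high-order and low-order outputs are combined into the final multiplication result.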
In one example, in a case where the data word length corresponding to the data to be operated is a single word length, the preset data operation instruction includes a third multiplication expansion instruction and a fourth multiplication expansion instruction. At this time, when determining the first multiplication instruction and the second multiplication instruction based on the data word length corresponding to the data to be calculated and the preset data calculation instruction, the third multiplication expansion instruction may be directly determined as the second multiplication instruction, and the fourth multiplication expansion instruction may be determined as the first multiplication instruction.
At this time, when implementing multiplication of data to be operated based on the first multiplication instruction and the second multiplication instruction, refer to fig. 16, and fig. 16 is a schematic flow chart of implementing multiplication of data with a single word length according to an embodiment of the present application. As shown in fig. 16, the multiplication process is implemented by executing the second multiplication instruction (i.e., the third multiplication expansion instruction) and then executing the first multiplication instruction (i.e., the fourth multiplication expansion instruction).
In one example, the first multiplication instruction may be used to perform a first multiplication operation, and the second multiplication instruction may be used to perform a second multiplication operation. After the first multiplication instruction and the second multiplication instruction for the modular multiplication operation are determined, the modular multiplication operation may be performed on the data to be operated, to obtain a modular multiplication operation result corresponding to the data to be operated. Here, the modular multiplication operation may be understood as a Montgomery modular multiplication operation; for performing it on the data to be operated, refer to the steps described below.
S1203, based on a preset data operation instruction, performing a first multiplication operation on the multiplicand and the multiplier to obtain a first result representing high bits, and performing a second multiplication operation on the multiplicand and the multiplier to obtain a first result representing low bits.
S1204, performing a second multiplication operation on the first result representing the low order and the first intermediate data to obtain a second result representing the low order.
S1205, performing a first multiplication operation on the second result representing the low order and the modulus m to obtain a third result representing the high order, and performing a second multiplication operation on the second result representing the low order and the modulus m to obtain a third result representing the low order.
S1206, adding the first result representing the low order and the third result representing the low order to obtain a first summation result.
The first summation result comprises first carry data and first summation data.
S1207, performing addition calculation on the first result representing the high order, the third result representing the high order, and the first carry data, to obtain a second summation result.
The second summation result comprises second carry data and second summation data.
S1208, if it is determined based on the second carry data that a carry occurs, or it is determined that the second summation data is greater than or equal to the modulus data, performing addition calculation on the second summation data, the second intermediate data and the second target value to obtain a third summation result; and determining third summation data included in the third summation result as the modular multiplication operation result corresponding to the data to be operated.
In one example, the second target value indicates the carry-in data to be added when the second summation data and the second intermediate data are added; the second target value may be 1.
S1209, otherwise, determining the second summation data as the modular multiplication operation result corresponding to the data to be operated.
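As a hedged aside, the following derivation sketches why steps S1203 to S1209 yield a Montgomery product. The symbols are introduced here for illustration only: x and y denote the multiplicand and the multiplier, m the modulus data, R = 2^N the modular power data, w the first intermediate data (assumed to satisfy m·w ≡ −1 mod R), t the second result representing the low order, and u the value formed by the second carry data and the second summation data.

```latex
\begin{aligned}
t &= \bigl((xy \bmod R)\, w\bigr) \bmod R, \qquad m\,w \equiv -1 \pmod{R} \\
xy + t\,m &\equiv xy + xy\,w\,m \equiv xy - xy \equiv 0 \pmod{R} \\
u &= \frac{xy + t\,m}{R} \equiv xy\,R^{-1} \pmod{m} \\
0 \le x, y < m,\quad 0 \le t < R \;&\Longrightarrow\; 0 \le u < \frac{m^{2} + R\,m}{R} \le 2m
\end{aligned}
```

Under these assumptions u is smaller than 2m, so a single conditional subtraction of m is sufficient, which is consistent with the one correction step in S1208 and S1209. The low-order halves added in S1206 always sum to 0 or R, so only the first carry data needs to be passed on to S1207.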
In one example, Fig. 17 is a schematic diagram of the operation flow of a Montgomery modular multiplication operation according to an embodiment of the present application. As shown in Fig. 17, the data to be calculated may be obtained in response to a Montgomery modular multiplication operation instruction; this data includes a multiplicand x, a multiplier y, and a modulus m, that is, the inputs x, y, m shown in Fig. 17. Then, based on the data to be calculated, the pre-calculation data is determined, including the modular power data R, the first intermediate data w, and the second intermediate data nm, that is, the pre-calculation data R, w, nm shown in Fig. 17. Here, R = 2^N, where N satisfies 2^{N-1} < m ≤ 2^N; w = -m^{-1} mod R; and nm = -m mod R.
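The pre-calculation can be illustrated with a minimal Python sketch (Python 3.8 or later, because the modular inverse uses the three-argument pow with a negative exponent). The function name precompute and the sample modulus 239 are hypothetical; the sketch assumes an odd modulus of at least 3, which is required for m to be invertible modulo R, and it follows the formulas for R, w and nm exactly as written in the text.

```python
def precompute(m: int):
    """Return (N, R, w, nm) for an odd modulus m >= 3."""
    if m < 3 or m % 2 == 0:
        raise ValueError("this sketch assumes an odd modulus m >= 3")
    N = m.bit_length()        # for odd m >= 3 this gives 2**(N-1) < m < 2**N
    R = 1 << N                # modular power data R = 2^N
    w = (-pow(m, -1, R)) % R  # first intermediate data: w = -m^{-1} mod R
    nm = (-m) % R             # second intermediate data: nm = -m mod R
    return N, R, w, nm


N, R, w, nm = precompute(239)
print(N, R, w, nm)            # prints: 8 256 241 17
print((239 * w) % R == R - 1) # m * w is congruent to -1 modulo R
```

Because R is a power of two, reducing modulo R amounts to keeping the low N bits, which is why the later steps only need the low half of a product.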
Thereafter, the product of the multiplicand x and the multiplier y may be calculated, with the high-order first result stored in mult_div_xy and the low-order first result stored in mult_mod_xy. This process may be expressed as: calculate mult_xy = Mult(x, y) = mult_div_xy || mult_mod_xy, where "||" denotes concatenation, e.g., 01||10 = 0110. Here, mult_div_xy is the result obtained by performing the first multiplication operation, and mult_mod_xy is the result obtained by performing the second multiplication operation.
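The split of a product into a high-order half and a low-order half, and the concatenation that joins them again, can be sketched as follows. The function names mult and concat, the 8-bit width and the operand values are hypothetical choices for illustration.

```python
def mult(x: int, y: int, n_bits: int):
    """Return the (high, low) halves of the 2*n_bits-bit product x * y."""
    product = x * y
    mask = (1 << n_bits) - 1
    return product >> n_bits, product & mask


def concat(high: int, low: int, n_bits: int) -> int:
    """Model the concatenation operator: place `high` above the n_bits of `low`."""
    return (high << n_bits) | low


mult_div_xy, mult_mod_xy = mult(200, 150, 8)
print(mult_div_xy, mult_mod_xy)                       # prints: 117 48
print(concat(mult_div_xy, mult_mod_xy, 8) == 200 * 150)
print(format(concat(0b01, 0b10, 2), "04b"))           # prints: 0110
```

In this reading, the first multiplication operation corresponds to taking the first element of the returned pair and the second multiplication operation to taking the second element.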
Then, the second multiplication operation is performed on mult_mod_xy and the first intermediate data w to obtain a second result, which is stored in t. This process may be expressed as: calculate t = MultMod(mult_mod_xy, w).
Then, the product of t and the modulus m is calculated, with the high-order third result stored in mult_div_tm and the low-order third result stored in mult_mod_tm. This process may be expressed as: calculate mult_tm = Mult(t, m) = mult_div_tm || mult_mod_tm.
Then, modulo addition is performed on the low-order first result mult_mod_xy and the low-order third result mult_mod_tm, with the carry data input of the modulo addition set to 0. After the modulo addition, the most significant bit of the result (i.e., the first carry data) is stored in cout1, and the remaining data (i.e., the first summation data) is stored in drop1. This process may be expressed as: calculate (drop1, cout1) = Add(mult_mod_xy, mult_mod_tm, 0).
Then, modulo addition is performed on the high-order first result mult_div_xy and the high-order third result mult_div_tm, with the carry data input of the modulo addition set to cout1. After the modulo addition, the most significant bit of the result (i.e., the second carry data) is stored in cout2, and the remaining data (i.e., the second summation data) is stored in res. This process may be expressed as: calculate (res, cout2) = Add(mult_div_xy, mult_div_tm, cout1).
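The two chained additions can be pictured with a small sketch of an adder that takes a carry input and returns the sum together with the carry output. The function name add, the 8-bit width and the operand values are hypothetical; the example only shows that adding the low halves first and feeding the carry into the high halves matches one double-width addition.

```python
def add(a: int, b: int, carry_in: int, n_bits: int):
    """Add two n_bits-wide values plus a carry; return (sum, carry_out)."""
    total = a + b + carry_in
    mask = (1 << n_bits) - 1
    return total & mask, total >> n_bits


# Low halves first, carry input fixed to 0.
low_sum, cout1 = add(0x30, 0xD0, 0, 8)
# High halves next, with the carry from the low halves as carry input.
high_sum, cout2 = add(0x75, 0x20, cout1, 8)

print(low_sum, cout1)    # prints: 0 1
print(high_sum, cout2)   # prints: 150 0
print((cout2 << 16) | (high_sum << 8) | low_sum == 0x7530 + 0x20D0)
```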
Then, it is judged whether cout2 = 1 or res ≥ m is satisfied.
If so, modulo addition is performed on the second summation data res and the second intermediate data nm, with the carry data input of the modulo addition set to 1. After the modulo addition, the most significant bit of the result may be stored in drop2, and the remaining data (i.e., the third summation data) may be stored in z. This process may be expressed as: calculate (z, drop2) = Add(res, nm, 1). The data corresponding to z (i.e., the third summation data) may then be determined as the operation result of the Montgomery modular multiplication operation.
Otherwise, the data corresponding to res (i.e., the second summation data) is determined as the operation result of the Montgomery modular multiplication operation.
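As a non-authoritative illustration, the flow of Fig. 17 can be sketched in Python (3.8 or later, for the three-argument pow with a negative exponent). The helper names mult, add and mont_mul are hypothetical, N is taken as the bit length of m, and the operands are assumed to satisfy 0 ≤ x, y < m with m odd. One further assumption is labeled in the code: nm is taken to be the bitwise complement of m over N bits (R − 1 − m), so that the carry input of 1 in the final addition completes the subtraction res − m; if nm were taken as −m mod R exactly, the carry input would need to be 0 instead.

```python
def mult(x, y, n):
    """Return the (high, low) n-bit halves of the product x * y."""
    p = x * y
    return p >> n, p & ((1 << n) - 1)


def add(a, b, carry_in, n):
    """Return (sum, carry_out) of an n-bit addition with carry input."""
    s = a + b + carry_in
    return s & ((1 << n) - 1), s >> n


def mont_mul(x, y, m):
    """Sketch of the described flow; returns x * y * R^{-1} mod m."""
    n = m.bit_length()
    R = 1 << n
    w = (-pow(m, -1, R)) % R                  # w = -m^{-1} mod R
    nm = (R - 1) - m                          # ASSUMPTION: complement of m
    mult_div_xy, mult_mod_xy = mult(x, y, n)
    _, t = mult(mult_mod_xy, w, n)            # MultMod: keep the low half only
    mult_div_tm, mult_mod_tm = mult(t, m, n)
    drop1, cout1 = add(mult_mod_xy, mult_mod_tm, 0, n)
    res, cout2 = add(mult_div_xy, mult_div_tm, cout1, n)
    if cout2 == 1 or res >= m:
        z, drop2 = add(res, nm, 1, n)         # res + ~m + 1 = res - m (mod R)
        return z
    return res


m = 239
R = 1 << m.bit_length()
x_bar, y_bar = (17 * R) % m, (23 * R) % m     # operands in Montgomery form
prod_bar = mont_mul(x_bar, y_bar, m)          # equals 17 * 23 * R mod m
print(mont_mul(prod_bar, 1, m) == (17 * 23) % m)   # prints: True
```

The result carries a factor of R^{-1}, so operands are normally converted into Montgomery form (x·R mod m) beforehand, and a final multiplication by 1 converts the value back, as the usage lines show.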
In the above embodiment, the modular multiplication operation is realized by designing the modular multiplication device and the data operation instructions that the device can execute. With this low-cost hardware operation unit, the calculation speed and calculation performance of the modular multiplication operation are significantly improved, and the robustness of applications of the modular multiplication operation is enhanced.
In one example, Montgomery modular multiplication performed with the modular multiplication method and the modular multiplication device described above is compared with a conventional implementation of Montgomery modular multiplication based on a software algorithm.
In one example, in a RISC-V simulation environment, the embodiment of the present application evaluates the operation performance of the modular multiplication operation before optimization and of the modular multiplication method implemented based on the data operation instruction and the modular multiplication device provided by the present application; the evaluation result is shown in Fig. 18.
FIG. 18 is a schematic diagram of a modular multiplication performance evaluation result according to an embodiment of the present application. As shown in FIG. 18, before optimization, performing one modular multiplication operation takes 3543 cycles. After optimization with the modular multiplication method provided by the embodiment of the application, performing one modular multiplication operation takes 1272 cycles. The cycle count is therefore reduced by about 64%, a significant improvement in modular multiplication performance.
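The stated percentage can be checked directly from the two cycle counts. The short sketch below uses only the figures given above; the variable names are hypothetical.

```python
cycles_before = 3543   # cycles per modular multiplication before optimization
cycles_after = 1272    # cycles per modular multiplication after optimization

reduction = (cycles_before - cycles_after) / cycles_before
speedup = cycles_before / cycles_after

print(f"cycle reduction: {reduction:.1%}")   # prints: cycle reduction: 64.1%
print(f"speedup: {speedup:.2f}x")            # prints: speedup: 2.79x
```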
The inventor also found that when the modular multiplication method provided by the embodiment of the application is used for calculating the elliptic curve cryptography algorithm, the performance of the elliptic curve cryptography algorithm is improved by about 24%.
Fig. 19 is a schematic diagram of a chip according to an embodiment of the present application, and as shown in fig. 19, the chip includes the above-mentioned modular multiplication device.
Fig. 20 is a schematic diagram of a board according to an embodiment of the application, as shown in fig. 20, where the board includes the chip shown in fig. 19.
Fig. 21 is a schematic diagram of a vehicle-mounted system provided in an embodiment of the present application, where, as shown in fig. 21, the vehicle-mounted system includes a chip shown in fig. 19 or a board shown in fig. 20.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed between components may be an indirect coupling or communication connection via some interfaces, devices or modules, and may be electrical, mechanical, or in other forms.
Wherein the individual modules may be physically separated, e.g. mounted in different locations of one device, or mounted on different devices, or distributed over a plurality of network elements, or distributed over a plurality of processors. The modules may also be integrated together, e.g. mounted in the same device, or integrated in a set of codes. The modules may exist in hardware, or may also exist in software, or may also be implemented in software plus hardware. The application can select part or all of the modules according to actual needs to realize the purpose of the scheme of the embodiment.
When the individual modules are implemented as software functional modules, the integrated modules may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some of the steps of the methods of the various embodiments of the application.
It should be understood that, although the steps in the flowcharts in the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (20)
1. A modular multiplication apparatus, characterized in that the operation apparatus comprises: a processor and a data operator; the data operator comprises a multiplier, an accumulator, a first register, a second register, a third register, a first multi-path data selector, a second multi-path data selector, a third multi-path data selector and a fourth multi-path data selector; the processor is connected with the multiplier, the first multi-path data selector, the second multi-path data selector and the third multi-path data selector respectively; the first multipath data selector is connected with the first register; the first register is connected with the first multi-path data selector, the second multi-path data selector and the fourth multi-path data selector respectively; the second multipath data selector is connected with the first register, the second register and the processor respectively; the second register is respectively connected with the accumulator and the second multipath data selector; the accumulator is respectively connected with the third multipath data selector and the fourth multipath data selector; the third multipath data selector is connected with the third register; the third register is respectively connected with the accumulator and the fourth multipath data selector;
the processor is used for acquiring target data matched with the data operation instruction after receiving the data operation instruction; transmitting the target data to the data arithmetic unit;
the data arithmetic unit is used for receiving the target data and carrying out arithmetic processing on the target data to obtain a data arithmetic result; the data operation result is used for indicating a modular multiplication operation result.
2. The computing device of claim 1, wherein the data operation instruction characterizes a first multiplication expansion instruction; wherein the first multiplication expansion instruction characterizes an instruction for multiplication and summation; the target data includes a first multiplier and a first multiplicand;
The processor is used for transmitting the first multiplier and the first multiplicand to the multiplier after receiving the first multiplication expansion instruction so as to multiply the first multiplier and the first multiplicand based on the multiplier to obtain a first multiplication operation result; transmitting and storing the first multiplication result to the first register through the first multipath data selector;
The first register is configured to receive the first multiplication result, and transmit and store the first multiplication result to the second register through the second multipath data selector;
the second register is configured to acquire data stored in the third register, and transmit the data stored in the third register and the received first multiplication result to the accumulator;
The accumulator is used for carrying out addition operation on the data stored in the third register and the first multiplication operation result to obtain a first addition operation result; and storing the first addition operation result into the third register through the third multipath data selector.
3. The computing device of claim 1, wherein the data operation instruction characterizes a second multiplication expansion instruction; the second multiplication expansion instruction represents an instruction which is used for carrying out multiplication operation and summing, reserving an operation result of a high order and outputting an operation result of a low order; the target data includes a second multiplier and a second multiplicand;
The processor is configured to transmit the second multiplier and the second multiplicand to the multiplier after receiving the second multiplication expansion instruction, so as to perform multiplication operation on the second multiplier and the second multiplicand based on the multiplier, to obtain a second multiplication operation result; transmitting and storing the second multiplication result to the first register through the first multipath data selector;
the first register is configured to receive the second multiplication result, and transmit and store the second multiplication result to the second register through the second multipath data selector;
The second register is configured to transmit the second multiplication result to the accumulator;
the accumulator is configured to obtain the data stored in the third register, and perform an addition operation on the data stored in the third register and the received second multiplication result to obtain a second addition operation result;
the processor is further configured to obtain and output low-order data in the second addition result through the fourth multipath data selector; and storing high-order data in the second addition operation result into the third register through the third multipath data selector.
4. The computing device of claim 1, wherein the data operation instruction characterizes a third multiplication expansion instruction; the third multiplication expansion instruction characterizes an instruction which outputs a multiplication result of a low order after multiplication operation; the target data includes a third multiplier and a third multiplicand;
The processor is used for transmitting the third multiplier and the third multiplicand to the multiplier after receiving the third multiplication expansion instruction;
the multiplier is configured to receive the third multiplier and the third multiplicand; multiplying the third multiplier and the third multiplicand to obtain a third multiplication result; transmitting and storing the third multiplication operation result to the first register through the first multipath data selector;
The processor is further configured to select and output low-order data in the third multiplication result based on the fourth multipath data selector.
5. The computing device of claim 1, wherein the data operation instruction characterizes a fourth multiplication expansion instruction; the fourth multiplication expansion instruction characterizes an instruction which outputs a multiplication operation result of a high order after multiplication operation; the target data includes a fourth multiplier, a fourth multiplicand, and a first target value;
The processor is configured to transmit the fourth multiplier and the fourth multiplicand to the multiplier after receiving the fourth multiplication expansion instruction;
The multiplier is configured to receive the fourth multiplier and the fourth multiplicand; performing multiplication operation on the fourth multiplier and the fourth multiplicand to obtain a fourth multiplication operation result; transmitting and storing the fourth multiplication result to the first register through the first multipath data selector;
The first register is configured to transmit the fourth multiplication result to the second register through the second multipath data selector;
The processor is further configured to output high-order data in a fourth multiplication result stored in the second register through the fourth multi-way data selector; transmitting the first target value to the second register through the second multipath data selector so as to carry out zero setting processing on the second register; and transmitting the first target value to the third register through the third multipath data selector so as to carry out zero setting processing on the third register.
6. The computing device of claim 1, wherein the data operation instruction characterizes a first zero-set extension instruction; the first zero-setting expansion instruction represents an instruction which outputs low-bit data and then is zero-setting; the target data comprises a summation operation result obtained after the accumulator operation and a first target value;
the processor is used for selecting and outputting low-order data in the summation operation result after the accumulator operation based on the fourth multipath data selector; transmitting the first target value to the second register through the second multipath data selector so as to carry out zero setting processing on the second register; and transmitting the first target value to the third register through the third multipath data selector so as to carry out zero setting processing on the third register.
7. The computing device of claim 1, wherein the data operation instruction characterizes a second zero-set extension instruction; wherein the second zeroing expansion instruction characterizes an instruction which zeroes after outputting high-order data; the target data comprises data stored in the third register and a first target value;
The processor is used for outputting high-order data in the data stored in the third register based on the fourth multi-path data selector; transmitting the first target value to the second register through the second multipath data selector so as to carry out zero setting processing on the second register; and transmitting the first target value to the third register through the third multipath data selector so as to carry out zero setting processing on the third register.
8. The computing device of claim 1, wherein the data operation instruction characterizes a data hold extension instruction; the data retention expansion instruction characterizes an instruction for retaining residual data after outputting low-order data; the target data includes data stored in the second register and data stored in the third register;
The accumulator is used for receiving the data stored by the second register and the third register; performing addition operation on the data stored in the second register and the data stored in the third register to obtain a third addition operation result;
the processor is configured to output, after the accumulator obtains the third addition result, low-bit data in the third addition result through the fourth multi-way data selector; and transmitting and storing high-order data in the third addition operation result to the third register through the third multipath data selector.
9. A modular multiplication method, which is characterized by being applied to a processor; the method comprises the following steps:
Responding to a modular multiplication operation instruction, and acquiring data to be operated; the data to be operated comprises data to be calculated and pre-calculation data; the data to be calculated represents the data which need to be subjected to modular multiplication operation; the pre-calculated data represents intermediate data which is relied on in the process of carrying out modular multiplication operation; the pre-calculation data is data determined based on the data to be calculated;
Performing modular multiplication operation on the data to be operated based on a preset data operation instruction to obtain a modular multiplication operation result corresponding to the data to be operated; wherein the preset data operation instruction is implemented based on the modular multiplication operation device according to any one of the above claims 1 to 6.
10. The method of claim 9, wherein the data to be calculated comprises a multiplicand, a multiplier, and modulus data; the pre-calculation data includes modular power data determined based on the modulus data, and first intermediate data and second intermediate data determined based on the modular power data; the performing modular multiplication operation on the data to be operated based on a preset data operation instruction to obtain a modular multiplication operation result corresponding to the data to be operated, including:
Based on the preset data operation instruction, performing a first multiplication operation on the multiplicand and the multiplier to obtain a first result representing high bits, and performing a second multiplication operation on the multiplicand and the multiplier to obtain a first result representing low bits;
Performing the second multiplication operation on the first result representing the low order and the first intermediate data to obtain a second result representing the low order;
performing the first multiplication operation on the second result representing the low order and the modulus data to obtain a third result representing the high order, and performing the second multiplication operation on the second result representing the low order and the modulus data to obtain a third result representing the low order;
Adding the first result representing the low level and the third result representing the low level to obtain a first summation result; the first summation result comprises first carry data and first summation data;
Adding the first result representing the high order, the third result representing the high order and the first carry data to obtain a second summation result; wherein the second summation result comprises second carry data and second summation data;
And determining a modular multiplication operation result corresponding to the data to be operated based on the second carry data and the second summation data.
11. The method of claim 10, wherein the determining a modular multiplication result corresponding to the data to be calculated based on the second carry data and the second sum data comprises:
If it is determined based on the second carry data that a carry occurs, or the second summation data is determined to be greater than or equal to the modulus data, addition calculation is performed on the second summation data, the second intermediate data and a second target value, so as to obtain a third summation result; and determining third summation data included in the third summation result as a modular multiplication operation result corresponding to the data to be operated;
otherwise, determining the second summation data as a modular multiplication operation result corresponding to the data to be operated.
12. The method according to claim 10, wherein before performing a modular multiplication operation on the data to be operated on based on a preset data operation instruction to obtain a modular multiplication operation result corresponding to the data to be operated on, the method further comprises:
Determining a first multiplication instruction and a second multiplication instruction based on the data word length corresponding to the data to be calculated and the preset data operation instruction; wherein the first multiply instruction is to implement the first multiply operation; the second multiply instruction is to implement the second multiply operation.
13. The method of claim 12, wherein the data word length corresponding to the data to be operated is multiple word lengths; the preset data operation instruction comprises a first multiplication expansion instruction, a second multiplication expansion instruction, a first zero-setting expansion instruction, a second zero-setting expansion instruction and a data holding expansion instruction; the determining a first multiplication instruction and a second multiplication instruction based on the data word length corresponding to the data to be operated and the preset data operation instruction includes:
Determining a target word length corresponding to a multiplicand and a multiplier in the data to be operated; wherein the target word length characterizes a data word length of the multiplicand or the multiplier;
The first and second multiplication instructions are determined based on the target word length, the first multiplication expansion instruction, the second multiplication expansion instruction, the first zeroing expansion instruction, the second zeroing expansion instruction, and the data retention expansion instruction.
14. The method of claim 13, wherein determining the first multiply instruction based on the target word length, the first multiply-expand instruction, the second multiply-expand instruction, the first zero-expand instruction, the second zero-expand instruction, and the data-hold-expand instruction comprises:
based on the data length corresponding to the single word length and the target word length, the multiplicand and the multiplier are subjected to segmentation processing, so that a plurality of first sub-data corresponding to the multiplicand and a plurality of second sub-data corresponding to the multiplier are obtained;
Determining a first data combination pattern that matches the first multiply instruction; and determining a plurality of first multiplied data sets based on the matched first data combination pattern; the first multiplied data set is used for indicating first sub-data and second sub-data which need to be multiplied;
Determining a first number of first multiplication expansion instructions included in the first multiplication instruction and a second number of second multiplication expansion instructions included in the first multiplication instruction based on the number of first data combination patterns and the number of first multiplication data groups determined based on each of the first data combination patterns;
The first multiply instruction is determined based on the first number of first multiply-expand instructions, the second number of second multiply-expand instructions, the data-hold-expand instruction, and the second zero-set-expand instruction.
15. The method of claim 13, wherein determining the second multiplication instruction based on the target word length, the first multiplication expansion instruction, the second multiplication expansion instruction, the first zeroing expansion instruction, the second zeroing expansion instruction, and the data retention expansion instruction comprises:
Based on the data length corresponding to the single word length and the target word length, the multiplicand and the multiplier are subjected to segmentation processing, so that a plurality of third sub-data corresponding to the multiplicand and a plurality of fourth sub-data corresponding to the multiplier are obtained;
determining a second data combination pattern that matches the second multiply instruction; and determining a plurality of second multiplied data sets based on the matched second data combination pattern; wherein the second multiplied data set is used for indicating third sub-data and fourth sub-data which need to be multiplied;
Determining a third number of first multiplication expansion instructions included in the second multiplication instruction and a fourth number of the second multiplication expansion instructions included in the second multiplication instruction based on the number of second data combination patterns and the number of second multiplication data groups determined based on each of the second data combination patterns;
The second multiply instruction is determined based on the third number of first multiply-expand instructions, a fourth number of second multiply-expand instructions, and the first zero-set expand instruction.
16. The method of claim 12, wherein the word length of the data corresponding to the data to be operated is a single word length; the preset data operation instruction comprises a third multiplication expansion instruction and a fourth multiplication expansion instruction; the determining a first multiplication instruction and a second multiplication instruction based on the data word length corresponding to the data to be operated and the preset data operation instruction includes:
The third multiplication expansion instruction is determined to be the second multiplication instruction, and the fourth multiplication expansion instruction is determined to be the first multiplication instruction.
17. The method of claim 9, wherein the method is used in the computation of an elliptic curve cryptography algorithm.
18. A chip comprising the modular arithmetic device according to any one of claims 1 to 8.
19. A board comprising the chip of claim 18.
20. An in-vehicle system comprising the chip of claim 18 or the board of claim 19.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410395880.0A CN118192934A (en) | 2024-04-02 | 2024-04-02 | Modular multiplication operation method, device, chip, board card and vehicle-mounted system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410395880.0A CN118192934A (en) | 2024-04-02 | 2024-04-02 | Modular multiplication operation method, device, chip, board card and vehicle-mounted system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118192934A (en) | 2024-06-14 |
Family
ID=91399812
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410395880.0A Pending CN118192934A (en) | 2024-04-02 | 2024-04-02 | Modular multiplication operation method, device, chip, board card and vehicle-mounted system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118192934A (en) |
- 2024-04-02: CN application CN202410395880.0A, publication CN118192934A (en), status: Active, Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7505587B2 (en) | Elliptic curve cryptosystem apparatus, storage medium storing elliptic curve cryptosystem program, and elliptic curve cryptosystem arithmetic method | |
EP0801345B1 (en) | Circuit for modulo multiplication and exponentiation arithmetic | |
CN100527072C (en) | Device and method for carrying out montgomery mode multiply | |
CN106951211B (en) | A kind of restructural fixed and floating general purpose multipliers | |
US6397241B1 (en) | Multiplier cell and method of computing | |
US5210710A (en) | Modulo arithmetic processor chip | |
JP2002521720A (en) | Circuits and methods for modulo multiplication | |
US20210182026A1 (en) | Compressing like-magnitude partial products in multiply accumulation | |
CN112464296B (en) | Large integer multiplier hardware circuit for homomorphic encryption technology | |
CN103049710B (en) | Field-programmable gate array (FPGA) chip for SM2 digital signature verification algorithm | |
KR20110105555A (en) | Montgomery multiplier having efficient hardware structure | |
KR100442218B1 (en) | Power-residue calculating unit using montgomery algorithm | |
CN110543291A (en) | Finite field large integer multiplier and implementation method of large integer multiplication based on SSA algorithm | |
CN113467750A (en) | Large integer bit width division circuit and method for SRT algorithm with radix of 4 | |
KR20030051992A (en) | apparatus for RSA Crypto Processing of IC card | |
KR20230141045A (en) | Crypto-processor Device and Data Processing Apparatus Employing the Same | |
CN116488788A (en) | Hardware accelerator of full homomorphic encryption algorithm, homomorphic encryption method and electronic equipment | |
Lu et al. | Implementation of fast RSA key generation on smart cards | |
CN109144472B (en) | Scalar multiplication of binary extended field elliptic curve and implementation circuit thereof | |
Ito et al. | Efficient exhaustive verification of the Collatz conjecture using DSP blocks of Xilinx FPGAs | |
Parihar et al. | Fast Montgomery modular multiplier for rivest–shamir–adleman cryptosystem | |
CN118192934A (en) | Modular multiplication operation method, device, chip, board card and vehicle-mounted system | |
CN113467752B (en) | Division operation device, data processing system and method for private calculation | |
CN116225369A (en) | SM2 algorithm scalar multiplication operation optimization method and system | |
CN115658005A (en) | High-precision low-delay large integer division accelerating device based on redundancy | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |