WO2020231353A1

WO2020231353A1 - A low-latency redundant multiplier and method for the same

Info

Publication number: WO2020231353A1
Application number: PCT/TR2019/050331
Authority: WO
Inventors: Erdinc Ozturk
Original assignee: Sabanci Universitesi
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2020-11-19

Abstract

The present invention relates to a low-latency redundant/modular multiplication means and a method for the same. To establish an efficient and fast integer modular multiplication framework, integers are represented as polynomials such that any n-bit integer is expressible as a k-degree polynomial, where k = n/d and d is the digit length. Every integer taking part in the modular multiplication thus becomes a polynomial whose coefficients have a specified digit length (8-bit, 16-bit, etc.), and the invention then computes the product of two integers given in polynomial form.

Description

A LOW-LATENCY REDUNDANT MULTIPLIER AND METHOD FOR THE SAME

Technical Field of the Present Invention

The invention presented hereby generally concerns methods enabling fast circuit implementation of modular multiplication operations. More specifically, the disclosed invention falls within the technical area of shortening the circuit depth of reduction/multiplication circuits, such as those based on the Barrett and Montgomery algorithms, as used particularly in cryptography.

Prior Art/Background of the Present Invention

Efficient modular multiplication is documented as a prerequisite for the high-performance implementation of many indispensable cryptographic processes. For hardware implementations (e.g. logic gates, FPGAs) of demanding public key cryptography algorithms such as classical RSA, Diffie-Hellman or (hyper)elliptic curve algorithms, the art has mainly relied on the popular Montgomery multiplication method, or on regular long-integer multiplication combined with Barrett's modular reduction technique. In particular, modular multiplication of large numbers, and its many relatively slow implementations, require optimization of circuit depth and critical path. Existing solutions in the art mainly focus on optimizing throughput for the multiplication of large numbers, whereas latency optimization has yet to be documented.

Publication document CN 107766032 (A) discloses a polynomial-based GF(2^n) multiplier. The multiplier calculates the product of an element A and an element B in a polynomial ring and comprises a quotient solving module, an intermediate modular multiplication calculation module and a summation module. The quotient solving module calculates the quotient q obtained when the product of the polynomials A and B is divided by an n-degree polynomial; the intermediate modular multiplication calculation module performs a modular multiplication on the product AB of the polynomials A and B to obtain an intermediate modular value (c+q); and the inputs of the summation module are connected to the outputs of the intermediate modular multiplication calculation module and of the quotient solving module, the summation module subtracting the quotient q from the intermediate modular value (c+q) to obtain the modular product c of AB with respect to a polynomial f(x). With this multiplier, no direct reduction step modulo f(x) is required and fewer XOR and AND gates are needed on average, so the space complexity of the multiplier is lowered without increasing its time complexity.

A prior-art document, US 10 101 969 (B1), relates to a system including an integrated circuit (IC) configured to receive a multiplicand number, a multiplier number and a modulus at one or more data inputs. The multiplicand and multiplier numbers are partitioned into a plurality of multiplicand words and multiplier words with different specific widths. A plurality of outer loop iterations of an outer loop is performed to iterate through the multiplicand words, and each outer loop iteration includes a plurality of inner loop iterations of an inner loop performed to iterate through the multiplier words. A Montgomery product of the multiplicand number and the multiplier number with respect to the modulus is determined.

Objects of the Present Invention

The primary object of the disclosed invention is to present a low-latency redundant multiplier.

Another object of the disclosed invention is to present a low-latency modular multiplication means.

Another object of the disclosed invention is to present a method of modular multiplication characterized by a very short circuit depth enabled by an optimized critical path.

Summary of the Present Invention

The proposed invention primarily targets public key cryptography applications in decentralized systems, such as randomness beacons, leader election in consensus protocols, proofs-of-replication and, more specifically, verifiable delay functions (VDFs); for these, a computationally inexpensive architecture for modular multiplication is disclosed. Marked by a very low latency compared to the teachings and disclosures in the art, the present method is suitable for exponentiation to a very high degree; it also obviates the need for full intermediate reduction and renders lazy reduction feasible.

To establish an efficient and fast integer modular multiplication framework, integers are represented as polynomials such that any n-bit integer is expressible as a k-degree polynomial, where k = n/d and d is the digit length. Every integer taking part in the modular multiplication thus becomes a polynomial whose coefficients have a specified digit length (8-bit, 16-bit, etc.), and the invention then computes the product of two integers given in polynomial form. Polynomial coefficients may be one bit wider than the digit width, which makes the algorithm efficient. As a result, the critical path is set by a single digit-wise polynomial coefficient computation, whose length depends on the chosen digit width.

To improve the critical path of the modular multiplication, the disclosed invention represents the operands in a polynomial form that facilitates the modular multiplication operation. The architecture of the processing means centers on polynomial multiplication and is parameterized by the digit width used to convert a very large integer to polynomial form, since each digit becomes a coefficient of the polynomial after conversion. This yields very low latency for the redundant multiplication means compared to the state of the art, giving a substantial time advantage in public key cryptography computations by alleviating the computational load.

Brief Description of the Figures of the Present Invention

Accompanying figures are given solely for the purpose of exemplifying a low-latency redundant/modular multiplication architecture, whose advantages over the prior art were outlined above and will be explained briefly hereinafter.

The figures are not meant to delimit the scope of protection as identified in the claims, nor should they be referred to alone in an effort to interpret the scope identified in said claims without recourse to the technical disclosure in the description of the present invention.

Fig. 1 demonstrates the schoolbook multiplication algorithm for polynomials according to the disclosed invention.

Fig. 2 demonstrates the accumulation layout for a 4-by-4 polynomial multiplier according to an embodiment of the disclosed invention.

Fig. 3 demonstrates the reduction of lower 8-bits of 16-bit digits of a polynomial with lookup tables according to an embodiment of the disclosed invention.

Fig. 4 demonstrates the reduction of higher 8-bits of 16-bit digits of a polynomial with lookup tables according to an embodiment of the disclosed invention.

Fig. 5 demonstrates the reduction of polynomial forms in accordance with the Barrett procedure according to an embodiment of the disclosed invention.

Fig. 6 demonstrates the reduction of polynomial forms in accordance with the Montgomery procedure according to an embodiment of the disclosed invention.

Detailed Description of the Present Invention

The present invention discloses a highly efficient and fast circuit implementation of a modular multiplication operation, together with a method for the same. The disclosed invention differs from the techniques in the art in that any very large integer subject to a modular multiplication operation is expressed in polynomial form, i.e. the integer is split into d-bit digits, each digit representing one polynomial coefficient. Polynomial multiplication therefore replaces very-large-integer multiplication, improving the speed at which modular multiplication is performed.

In the disclosed invention, two n-bit integers A and B are accepted as input together with an n-bit modulus M, and the modular multiplication C = A*B mod M is computed, C being an n-bit integer. The algorithm represents the n-bit integer A as a k-degree polynomial A(x) and the n-bit integer B as a k-degree polynomial B(x), with k = n/d where d is the digit length, so that evaluating each polynomial at x = 2^d recovers the integer:

$$A(x) = \sum_{i=0}^{k} a_i x^i, \qquad B(x) = \sum_{i=0}^{k} b_i x^i, \qquad A = A(2^d), \quad B = B(2^d)$$

Figure 1 shows integers A and B represented as sequences of 16-bit digits. While in one embodiment of the disclosed invention the digits are 16 bits wide, in other embodiments they may have a general width of d bits. It should be noted that, although the digits of an integer are d bits each, polynomial coefficients are allowed to grow to d+1 bits for efficiency reasons. This redundancy in the representation is crucial to the method and is explained hereinafter.
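As a minimal sketch of this representation, assuming Python and little-endian digit order (coefficient a_i holds bits d·i to d·i+d-1), the helpers below split an integer into d-bit digits and evaluate the polynomial at x = 2^d. The names `int_to_poly` and `eval_poly` are illustrative only; the last lines show why a (d+1)-bit coefficient does not change the represented integer.

```python
from __future__ import annotations


def int_to_poly(a: int, d: int = 16) -> list[int]:
    """Split a non-negative integer into d-bit digits, least significant first.

    Digit i becomes coefficient a_i of A(x), so that A == A(2**d).
    """
    if a < 0:
        raise ValueError("expected a non-negative integer")
    mask = (1 << d) - 1
    coeffs = []
    while a:
        coeffs.append(a & mask)
        a >>= d
    return coeffs or [0]


def eval_poly(coeffs: list[int], d: int = 16) -> int:
    """Evaluate A(x) at x = 2**d (Horner's rule); coefficients may exceed d bits."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc << d) + c
    return acc


a = 0x1234_5678_9ABC
p = int_to_poly(a)                    # [0x9ABC, 0x5678, 0x1234]
assert eval_poly(p) == a

# Redundant form: borrowing one unit from a_1 turns a_0 into a 17-bit (d+1-bit)
# coefficient, yet the represented integer is unchanged.
r = [p[0] + (1 << 16), p[1] - 1, p[2]]
assert r[0].bit_length() == 17 and eval_poly(r) == a
```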

Conversion from the redundant polynomial representation to integer representation is done as follows. The input is a k-degree polynomial C(x) with (d+1)-bit coefficients c_i, together with M, an n-bit modulus; the output of this sub-algorithm is C, an n-bit integer. C is initialized to zero, and for each index i from 0 to k (the degree of C(x)), the coefficient c_i is multiplied by 2^(d*i) and added to C. After the iteration for index k, C mod M is returned.
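The conversion loop can be sketched as follows, assuming Python and little-endian coefficients; `poly_to_int_mod` is a hypothetical name. Because each coefficient is shifted by d·i bits and added, the extra (d+1)-th bit of a coefficient simply overlaps the next digit position and is absorbed by the addition.

```python
from __future__ import annotations


def poly_to_int_mod(coeffs: list[int], m: int, d: int = 16) -> int:
    """Convert a redundant polynomial C(x) to the integer C mod M.

    C starts at zero; for each index i, c_i * 2**(d*i) is added to it.
    Coefficients may be (d+1) bits wide; the overlapping bits add up correctly.
    """
    c = 0
    for i, ci in enumerate(coeffs):
        c += ci << (d * i)            # c_i * 2^(d*i)
    return c % m


M = 0xFFFF_FFFB                       # arbitrary example modulus
C = [0x1FFFF, 0x1FFFF, 0x1]           # 17-bit coefficients with d = 16
print(poly_to_int_mod(C, M))
```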

A sub-algorithm called the schoolbook multiplication algorithm for polynomials accepts the two polynomials A(x) and B(x) to be multiplied. Referring to Figure 2, both polynomials are written in sigma notation with index i running from 0 to k. The output C(x), the product of the two polynomials, is itself a polynomial and is computed straightforwardly. As an exemplary aspect of the disclosed invention, a 4x4 polynomial multiplier is detailed with reference to Figure 2. In Figure 2, A3:A0 are the 17-bit coefficients of a degree-3 polynomial A and B3:B0 are the 17-bit coefficients of a degree-3 polynomial B. Each AiBj is a 34-bit number, the result of the multiplication Ai*Bj. Each column is accumulated as shown in the Figure, and there is no carry propagation between columns. Accumulation is realized using a carry-save adder tree. In general, each Si-Ci pair has a different length: S0-C0 are 16-bit numbers, S1-C1 are 19-bit numbers, S2-C2 are 20-bit numbers, etc. For this specific example, 8 separate accumulation circuits are needed, and each of these accumulation circuits may have a different number of inputs.
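The sketch below shows the schoolbook idea at coefficient level, assuming Python and that column i+j collects the partial products a_i·b_j; it does not reproduce the bit-level column layout of Figure 2, and the function names are hypothetical. It checks the 4x4 case with 17-bit coefficients: the product polynomial has 7 columns, evaluates to the integer product, and no column needs more than 36 bits because at most four 34-bit products meet in one column.

```python
from __future__ import annotations

import random


def poly_mul_schoolbook(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook product C(x) = A(x) * B(x).

    Each partial product a_i * b_j is added into column i + j; no carry is
    propagated between columns, so columns may exceed 2(d+1) bits.
    """
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def eval_at(coeffs: list[int], d: int) -> int:
    """Value of the polynomial at x = 2**d."""
    return sum(c << (d * i) for i, c in enumerate(coeffs))


# 4x4 case: degree-3 polynomials with 17-bit coefficients and d = 16.
rng = random.Random(2019)
A = [rng.getrandbits(17) for _ in range(4)]
B = [rng.getrandbits(17) for _ in range(4)]
C = poly_mul_schoolbook(A, B)

assert len(C) == 7                                   # columns x^0 .. x^6
assert eval_at(C, 16) == eval_at(A, 16) * eval_at(B, 16)
assert max(c.bit_length() for c in C) <= 36          # widest column: x^3
```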

Reduction is also disclosed according to an embodiment of the present invention. Here a polynomial W(x) of degree 2k is considered, i.e. the product of two k-degree polynomials. Since W(x) has about 2k coefficients, it must be reduced back to k coefficients. An n-bit number is representable with k coefficients; however, intermediate results are allowed to extend to k+1 coefficients to eliminate an extra level of reduction. Polynomial W(x) is converted back to an integer modulo M as follows:

$$W \equiv \sum_{i \geq 0} W_i R^i \pmod{M}, \qquad R = 2^d$$

Coefficients 0 to (k-1) do not need to be reduced, since they already fit in the (k+1)-coefficient redundant result. Coefficients k to (2k+1) need to be reduced. According to an embodiment of the present invention, the modular reduction of each coefficient is precomputed and stored in look-up tables (LUTs). Referring to Figure 3, the coefficient at index k is denoted W_k x^k. After integer conversion, x^k = 2^(d*k) = 2^n. So, if the following expression is precomputed for each j in the range 0 to 2^(d+1)-1:

$$P_k[j] = j \cdot 2^n \bmod M$$

a look-up table is obtained whose entry at each index j is the polynomial P_k[j](x) of degree k with d-bit coefficients.
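A minimal sketch of this table precomputation, assuming Python and toy parameters (d = 4, k = 4, so n = 16) to keep the table at 2^(d+1) = 32 entries; with the document's d = 16 the same loop would produce 2^17 entries. Each residue is stored as k little-endian d-bit digits, which requires M < 2^n. The function name `build_lut_pk` and the chosen modulus are hypothetical.

```python
from __future__ import annotations


def build_lut_pk(d: int, k: int, m: int) -> list[list[int]]:
    """Precompute P_k[j] = j * 2**n mod M for j in 0 .. 2**(d+1) - 1, n = d*k.

    Each entry is stored as k little-endian d-bit digits (a polynomial).
    Requires 0 < M < 2**n so that every residue fits in k digits.
    """
    n = d * k
    if not 0 < m < (1 << n):
        raise ValueError("modulus must satisfy 0 < M < 2**n")
    mask = (1 << d) - 1
    table = []
    for j in range(1 << (d + 1)):              # every (d+1)-bit coefficient value
        v = (j << n) % m                       # j * 2^n mod M
        table.append([(v >> (d * t)) & mask for t in range(k)])
    return table


# Toy parameters (illustrative only): d = 4, k = 4, n = 16.
D, K, M = 4, 4, 0xFFF1
lut = build_lut_pk(D, K, M)
assert len(lut) == 2 ** (D + 1)

# A lookup replaces the term w_k * x^k by a k-digit polynomial of equal value mod M.
w_k = 0b10111
value = sum(c << (D * t) for t, c in enumerate(lut[w_k]))
assert value == (w_k << 16) % M
```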

Instead of using look-up tables with (d+1)-bit inputs and n-bit outputs, the disclosed invention splits each coefficient into 8-bit segments and reduces these segments separately. Figure 3 shows how the algorithm in the disclosed invention reduces the lower 8 bits. In one embodiment of the present invention, coefficients are 17 bits wide and the polynomial is of degree 128 (k=128, d=16).

Figure 4 shows how the higher 8-bit segments can be reduced using the same LUT structures as the lower 8-bit segments. This option suits resource-limited environments. If enough resources are available, separate LUT structures can be built for reducing the higher 8-bit segments, which decreases the latency of the overall modular multiplication operation. The highest 1-bit segment of each coefficient can be reduced in the same manner.
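The segmented reduction can be sketched as below, assuming Python, d = 16 and 17-bit coefficients as in the embodiment above, and assuming the resource-rich variant with a separate table per segment. The table layout keyed by (coefficient index, bit offset) and all function names are hypothetical. Each high coefficient w_i is split into a low 8-bit, a high 8-bit and a top 1-bit segment, and each segment selects a k-digit polynomial congruent to segment·2^(d·i+offset) mod M; the outputs are summed coefficient-wise (carry-save trees in hardware) and are not renormalized here.

```python
from __future__ import annotations

import random

D = 16                                   # digit width d
SEGMENTS = ((0, 8), (8, 8), (16, 1))    # (bit offset, width): low 8, high 8, top 1


def to_digits(v: int, k: int) -> list[int]:
    mask = (1 << D) - 1
    return [(v >> (D * t)) & mask for t in range(k)]


def build_segment_luts(k: int, m: int, top: int) -> dict:
    """One table per (index i, segment offset): s -> digits of s * 2**(D*i + off) mod M."""
    return {
        (i, off): [to_digits((s << (D * i + off)) % m, k) for s in range(1 << width)]
        for i in range(k, top + 1)
        for off, width in SEGMENTS
    }


def reduce_segmented(w: list[int], k: int, luts: dict) -> list[int]:
    """Fold coefficients k.. of W(x) into k coefficients congruent to W mod M."""
    out = list(w[:k])                    # x^0 .. x^(k-1) are kept unchanged
    for i in range(k, len(w)):
        if w[i] >= 1 << 17:
            raise ValueError("sketch assumes 17-bit coefficients")
        for off, width in SEGMENTS:
            s = (w[i] >> off) & ((1 << width) - 1)
            for t, digit in enumerate(luts[(i, off)][s]):
                out[t] += digit          # accumulated by carry-save trees in hardware
    return out


def value(coeffs: list[int]) -> int:
    return sum(c << (D * t) for t, c in enumerate(coeffs))


# Illustrative parameters: k = 4 digits (n = 64) and a random odd 64-bit modulus.
rng = random.Random(7)
K = 4
M = rng.getrandbits(64) | (1 << 63) | 1
W = [rng.getrandbits(17) for _ in range(2 * K - 1)]
luts = build_segment_luts(K, M, len(W) - 1)
R = reduce_segmented(W, K, luts)
assert len(R) == K and value(R) % M == value(W) % M
```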

According to at least one embodiment of the disclosed invention, reduction may also be performed using the Montgomery or Barrett reduction algorithms.

After the coefficients are multiplied pairwise, (n/d) sub-results are obtained, which are then accumulated together. This accumulation may be done in redundant form using Wallace tree adder structures. The accumulation shown in Figure 2 is realized with a Wallace tree adder, so the result is in carry-save redundant form. This enables very fast accumulation of a large number of operands with very small circuit depth: Wallace tree adders provide a circuit depth of O(log n).
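To illustrate the logarithmic depth, here is a minimal Python sketch of carry-save accumulation built from 3:2 compressors, assuming a simple greedy grouping of operands into triples per layer (an actual Wallace or Dadda tree schedules bits column by column). The names `csa` and `wallace_accumulate` are hypothetical. Each layer turns every three operands into two, so 128 operands, for example, collapse to one (S, C) pair after 11 layers, leaving a single carry-propagate addition.

```python
from __future__ import annotations

import random


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor (one full adder per bit position): x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_accumulate(operands: list[int]) -> tuple[int, int, int]:
    """Reduce many operands to a carry-save pair (S, C) with S + C == sum.

    Each loop iteration is one layer of parallel 3:2 compressors; the number
    of layers (returned as depth) grows logarithmically with len(operands).
    """
    layer = list(operands)
    depth = 0
    while len(layer) > 2:
        full = len(layer) - len(layer) % 3
        nxt: list[int] = []
        for t in range(0, full, 3):
            nxt.extend(csa(layer[t], layer[t + 1], layer[t + 2]))
        nxt.extend(layer[full:])         # 1 or 2 leftovers pass to the next layer
        layer = nxt
        depth += 1
    layer += [0] * (2 - len(layer))      # pad to exactly two outputs
    return layer[0], layer[1], depth


rng = random.Random(1)
ops = [rng.getrandbits(34) for _ in range(128)]   # e.g. 128 partial products
S, C, depth = wallace_accumulate(ops)
assert S + C == sum(ops)                          # one final carry-propagate add
print(depth)
```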

During reduction, after the results are retrieved from the look-up tables, multiple numbers need to be accumulated together. This accumulation can be performed exactly as described above, enabling very fast reduction with very small circuit depth.

According to one embodiment of the invention, instead of look-up-table-based reduction, the Montgomery Reduction or Barrett Reduction algorithm may be used to reduce the 2n-bit product back to an n-bit number. Both algorithms may be applied to polynomials in the same manner as they are applied to integers. The Barrett and Montgomery variants are shown in Figures 5 and 6, respectively.
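For reference, the integer forms of both reductions can be sketched in Python as follows (Python 3.8+ for `pow(m, -1, R)`); the polynomial variants of Figures 5 and 6 are not reproduced here, and the function names are hypothetical. Montgomery reduction returns t·R^(-1) mod M for an odd modulus, while Barrett reduction uses a precomputed μ = floor(4^k / M) to estimate the quotient and then corrects it.

```python
import random


def montgomery_reduce(t: int, m: int, r_bits: int) -> int:
    """REDC: return t * R^-1 mod M for 0 <= t < M*R, with R = 2**r_bits > M and M odd."""
    R = 1 << r_bits
    m_neg_inv = (-pow(m, -1, R)) % R             # M' = -M^-1 mod R (precomputable)
    u = ((t & (R - 1)) * m_neg_inv) & (R - 1)    # makes t + u*M divisible by R
    res = (t + u * m) >> r_bits                  # exact division by R; res < 2M
    return res - m if res >= m else res


def barrett_reduce(x: int, m: int) -> int:
    """Return x mod M for 0 <= x < 4**k, k = bit length of M, using mu = floor(4**k / M)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m                     # precomputable per modulus
    q = (x * mu) >> (2 * k)                      # estimate of floor(x / M), never too large
    r = x - q * m
    while r >= m:                                # correction step
        r -= m
    return r


rng = random.Random(5)
M = rng.getrandbits(256) | (1 << 255) | 1        # odd 256-bit example modulus
a, b = rng.randrange(M), rng.randrange(M)
R_BITS = 256

assert barrett_reduce(a * b, M) == (a * b) % M
assert montgomery_reduce(a * b, M, R_BITS) == (a * b * pow(1 << R_BITS, -1, M)) % M
```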

In a nutshell, the disclosed invention relates to a low-latency redundant multiplication method and a modular multiplication means with an efficient and fast implementation, in which integers are represented as polynomials such that any n-bit integer is expressible as a k-degree polynomial. Integers for modular multiplication are represented as polynomials with coefficients of a specified digit length (8-bit, 16-bit, etc.), after which the product of the two integers is computed in polynomial form. The critical path of the modular multiplication is also greatly shortened. The architecture of the processing means centers on polynomial multiplication and is parameterized by the digit width used to convert a very large integer to polynomial form, since each digit becomes a coefficient of the polynomial after conversion. This yields very low latency for the redundant multiplication means compared to the state of the art, giving a substantial time advantage in public key cryptography operations.

In one aspect of the present invention, a modular multiplication system for public key cryptography applications such as verifiable delay functions comprising a processing means is proposed.

In another aspect of the present invention, said processing means comprises at least one dedicated accumulation circuit configured for polynomial digitwise addition.

In another aspect of the present invention, said processing means further comprises a reduction mechanism configured for conversion to an integer form from a polynomial form.

In another aspect of the present invention, said processing means is configured to accept at least one integer.

In another aspect of the present invention, said processing means is configured to compute the polynomial form of said at least one accepted integer according to a predetermined digit width.

In another aspect of the present invention, said processing means is configured to reduce the lower half of the polynomial digits according to a lookup table.

In another aspect of the present invention, said processing means is configured to reduce the higher half of the polynomial digits according to a lookup table.

In one aspect of the present invention, a modular multiplication method for public key cryptography applications such as verifiable delay functions is proposed.

In another aspect of the present invention, said method comprises a step of input accept, where at least one input for an integer for multiplication and one input for modulus are received.

In another aspect of the present invention, said method comprises a step of integer-to-polynomial, where said at least one input received for an integer is converted to polynomial representation according to a predetermined bit width.

In another aspect of the present invention, said method comprises a step of polynomial multiplication, where said at least one polynomial resulting from the previous step is multiplied based on polynomial digit addition.

In another aspect of the present invention, said method comprises a step of polynomial reduction, where the 2k-digit end product of the previous multiplication step is reduced to k digits.

In another aspect of the present invention, said method comprises a step of polynomial-to-integer, where the result of the reduction is converted back to an integer modulo said input modulus.
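The five method steps can be strung together in a compact Python sketch, under several stated assumptions: operands are already reduced modulo M, k is taken as ceil(n/d), the LUT lookup is replaced by computing W_i·2^(d·i) mod M on the fly, and the carry-save accumulation is replaced by ordinary integer additions. The function name `modmul_via_polynomials` is hypothetical; the sketch only checks that the polynomial route yields the same result as A*B mod M.

```python
import random


def modmul_via_polynomials(a: int, b: int, m: int, d: int = 16) -> int:
    """Sketch of the steps: input accept, integer-to-polynomial, polynomial
    multiplication, polynomial reduction, polynomial-to-integer."""
    # Input accept: operands are assumed to be already reduced modulo M.
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must lie in [0, M)")
    k = -(-m.bit_length() // d)                  # k = ceil(n / d) digits
    mask = (1 << d) - 1

    # Integer-to-polynomial: little-endian d-bit digits.
    A = [(a >> (d * i)) & mask for i in range(k)]
    B = [(b >> (d * i)) & mask for i in range(k)]

    # Polynomial multiplication: column sums without inter-column carries.
    W = [0] * (2 * k - 1)
    for i, ai in enumerate(A):
        for j, bj in enumerate(B):
            W[i + j] += ai * bj

    # Polynomial reduction: fold coefficients k .. 2k-2 into k coefficients.
    C = W[:k]
    for i in range(k, len(W)):
        v = (W[i] << (d * i)) % m                # stands in for a LUT lookup
        for t in range(k):
            C[t] += (v >> (d * t)) & mask

    # Polynomial-to-integer: evaluate at x = 2**d, then reduce modulo M.
    return sum(c << (d * t) for t, c in enumerate(C)) % m


rng = random.Random(2020)
M = rng.getrandbits(2048) | (1 << 2047) | 1      # 2048-bit example: k = 128 for d = 16
x, y = rng.randrange(M), rng.randrange(M)
assert modmul_via_polynomials(x, y, M) == (x * y) % M
```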

Claims

1) A modular multiplication system for public key cryptography applications such as verifiable delay functions comprising a processing means characterized in that;

said processing means comprises at least one dedicated accumulation circuit configured for polynomial digitwise addition; and, said processing means further comprises a reduction mechanism configured for conversion to an integer form from a polynomial form.

2) A modular multiplication system for public key cryptography applications as set forth in Claim 1 characterized in that said processing means is configured to accept at least one integer.

3) A modular multiplication system for public key cryptography applications as set forth in Claim 1 characterized in that said processing means is configured to compute polynomial form of said at least one accepted integer according to a predetermined digit width.

4) A modular multiplication system for public key cryptography applications as set forth in any preceding Claim characterized in that said processing means is configured to reduce lower half of polynomial digits according to a lookup table.

5) A modular multiplication system for public key cryptography applications as set forth in any preceding Claim characterized in that said processing means is configured to reduce higher half of polynomial digits according to a lookup table.

6) A modular multiplication method for public key cryptography applications such as verifiable delay functions characterized in that said method comprises distinct steps of;

input accept, where at least one input for an integer for multiplication and one input for modulus are received;

integer-to-polynomial, where said at least one input received for an integer are converted to polynomial representation according to a predetermined bit width;

polynomial multiplication, where said at least one polynomial resulting from the previous step are multiplied based on polynomial digit addition;

polynomial reduction, where end product of the previous step of multiplication of 2k digits is reduced to k digits; and,

polynomial-to-integer, where the result of the reduction is converted back to an integer, modulo the said one input for modulus.