CN113504895B - Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device

Info

Publication number
CN113504895B
Authority
CN
China
Prior art keywords
calculation
elliptic curve
reduction
matrix
barrel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110791569.4A
Other languages
Chinese (zh)
Other versions
CN113504895A (en
Inventor
高鸣宇
张烨
董江彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhixin Huaxi Information Technology Co ltd
Original Assignee
Shenzhen Zhixin Huaxi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhixin Huaxi Information Technology Co ltd
Priority to CN202110791569.4A
Publication of CN113504895A
Application granted
Publication of CN113504895B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/724Finite field arithmetic
    • G06F7/725Finite field arithmetic over elliptic curves

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an elliptic curve multi-scalar point multiplication calculation optimization method and optimization device. A bucket matrix is designed to cache the intermediate variable points of the main calculation process, so that the Pippenger intermediate quantities need not be output. Calculation proceeds continuously in a pipelined manner until all rounds are finished, and only then are the transverse reduction and the longitudinal reduction performed. The number of executions of the serial computation is thereby reduced from thousands to one, most of the overhead of serial-parallel conversion, synchronization locking and the like is eliminated, and the continuous working time of the pipeline is effectively extended, so that overall performance is improved.

Description

Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device
Technical Field
The application relates to the technical field of cryptography, in particular to an elliptic curve multi-scalar point multiplication calculation optimization method and an optimization device.
Background
In the related art, as shown in FIG. 1, the Pippenger algorithm is the algorithm used for one round of computation in the MSM module of PipeZK. Such a round must be performed $\lceil N/M \rceil$ times in total, producing $\lceil N/M \rceil$ points; finally, these $\lceil N/M \rceil$ points are taken as input and the computation is performed once more, yielding the final result. Here N is the total amount of input data and M is the amount of data input in a single round; the former is typically on the order of $10^6$, and the latter is typically on the order of 1000, for example 1024.
In FIG. 1, the buckets (Bucket, used to temporarily store intermediate point results) hold different contents for each $G_i$, so the buckets must be emptied before each different $G_i$ is computed. Each round also performs a longitudinal reduction ($Q_j = \sum_i 2^{i\zeta} G_i$) and a transverse reduction ($G_i = \sum_j j \cdot B_j$). This is a serial phase of the algorithm: no parallel algorithm can parallelize this calculation, so the serial phase must be executed $\lceil N/M \rceil$ times in total. For an input of millions of data items, this phase is executed thousands of times and cannot be optimized by parallelization, which causes a performance penalty.
Accordingly, the related art has disadvantages in that:
1) Each round of computation requires a longitudinal reduction, and this computation process is hardly parallelizable, resulting in performance loss.
2) Each round of computation requires a lateral reduction, and this computation process is hardly parallelizable, resulting in a performance penalty.
3) Each round of computation needs to be written back to the memory, and this process brings extra logic control and brings a certain difficulty in implementation.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present application is to provide an elliptic curve multi-scalar point multiplication calculation optimization method, which effectively improves the continuous working time of a pipeline, thereby improving the overall performance.
Another object of the present application is to propose an elliptic curve multi-scalar point multiplication computation optimization device.
In order to achieve the above objective, an embodiment of an aspect of the present application provides an elliptic curve multi-scalar point multiplication calculation optimization method, including:
in the main calculation process, caching intermediate variable points in the main calculation process by utilizing one row of a barrel matrix;
while canceling the transverse reduction at the end of the main computation process in the PipeZK algorithm, keeping the content in the barrel matrix and continuing to the next main computation process;
canceling a partial calculation process of the tail longitudinal reduction of each round;
after all rounds are finished, performing transverse reduction and longitudinal reduction calculation on the barrel matrix, and outputting elliptic curve points through the barrel matrix.
According to the elliptic curve multi-scalar point multiplication calculation optimization method of the embodiments of the present application, a bucket matrix is designed to cache the intermediate variable points of the main calculation process, so that the Pippenger intermediate quantities need not be output. Calculation runs continuously in a pipelined manner until all rounds are finished, and only then are the transverse reduction and the longitudinal reduction performed. The number of executions of the serial computation is reduced from thousands to one, the serial-parallel conversion stages between batches are eliminated, which raises the parallelism of the algorithm, and the continuous working time of the pipeline is effectively extended, thereby improving overall performance.
In addition, the elliptic curve multi-scalar point multiplication calculation optimization method according to the above embodiment of the present application may further have the following additional technical features:
Further, in one embodiment of the present application, the bucket matrix has $\lceil \lambda/\zeta \rceil$ rows and $2^{\zeta}-1$ columns in total, where λ is the bit width of the coefficient and ζ is the bit segment width.
Further, in one embodiment of the present application, the performing lateral reduction and longitudinal reduction calculations on the bucket matrix includes: and transversely reducing each row of the barrel matrix to obtain a plurality of elliptic curve points, and longitudinally reducing the elliptic curve points.
Further, in one embodiment of the present application, the performing lateral reduction and longitudinal reduction calculations on the bucket matrix includes: and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the elliptic curve points.
Further, in one embodiment of the present application, in the lateral reduction calculation, the calculation of different rows is performed in parallel or performed in series; in the longitudinal reduction calculation, the calculation of different columns is performed in parallel or in series.
To achieve the above object, another embodiment of the present application provides an elliptic curve multi-scalar point multiplication computation optimization device, including:
the caching module is used for caching intermediate variable points in the main computing process by utilizing one row of the barrel matrix in the main computing process;
the first optimizing module is used for keeping the content in the barrel matrix and continuing to the next main computing process while canceling the transverse reduction at the end of the main computing process in the PipeZK algorithm;
the second optimizing module is used for canceling part of calculation process of longitudinal reduction at the tail of each round;
and the output module is used for carrying out transverse reduction and longitudinal reduction calculation on the barrel matrix after all rounds are finished, and outputting elliptic curve points through the barrel matrix.
According to the elliptic curve multi-scalar point multiplication calculation optimization device of the embodiments of the present application, a bucket matrix is designed to cache the intermediate variable points of the main calculation process, so that the Pippenger intermediate quantities need not be output. Calculation runs continuously in a pipelined manner until all rounds are finished, and only then are the transverse reduction and the longitudinal reduction performed. The number of executions of the serial computation is reduced from thousands to one, the serial-parallel conversion stages between batches are eliminated, which raises the parallelism of the algorithm, and the continuous working time of the pipeline is effectively extended, thereby improving overall performance.
In addition, the elliptic curve multi-scalar point multiplication computation optimization apparatus according to the above embodiment of the present application may further have the following additional technical features:
Further, in one embodiment of the present application, the bucket matrix has $\lceil \lambda/\zeta \rceil$ rows and $2^{\zeta}-1$ columns in total, where λ is the bit width of the coefficient and ζ is the bit segment width.
Further, in one embodiment of the present application, the performing lateral reduction and longitudinal reduction calculations on the bucket matrix includes: and transversely reducing each row of the barrel matrix to obtain a plurality of elliptic curve points, and longitudinally reducing the elliptic curve points.
Further, in one embodiment of the present application, the performing lateral reduction and longitudinal reduction calculations on the bucket matrix includes: and performing longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and performing transverse reduction on the elliptic curve points.
Further, in one embodiment of the present application, in the lateral reduction calculation, the calculation of different rows is performed in parallel or performed in series; in the longitudinal reduction calculation, the calculation of different columns is performed in parallel or in series.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a calculation process in the prior art;
FIG. 2 is a flow chart of an elliptic curve multi-scalar point multiplication calculation optimization method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a calculation flow of an elliptic curve multi-scalar point multiplication calculation optimization method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a hardware architecture and a computing flow according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an access sequence to coefficient data in various rounds according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the access sequence to the Buckets when solving the final result after all rounds of computation according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an elliptic curve multi-scalar point multiplication computation optimization apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
Definitions and related terms in this application are first introduced.
Bit segment:
A bit segment is a portion of the binary representation of a number; e.g. "bits 2-4 of binary number 101011" is a bit segment of that number.
The following notation is used to represent bit segments:
a[p:q]
where a is a number, p is the lowest index of the bit segment, and q is the highest index of the bit segment; indices start from 0, and index 0 denotes the lowest bit. For example:
a[0:3]
represents the bit segment consisting of bits 0 to 3 of the binary number a; this bit segment has a width of 4.
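As an illustration of the a[p:q] notation, a minimal Python sketch follows. The function name `bit_segment` is an assumption introduced here for illustration only; it does not appear in the application.

```python
def bit_segment(a: int, p: int, q: int) -> int:
    """Return the bit segment a[p:q]: bits p..q of a (inclusive), bit 0 being the lowest."""
    if not 0 <= p <= q:
        raise ValueError("require 0 <= p <= q")
    width = q - p + 1                      # number of bits in the segment
    return (a >> p) & ((1 << width) - 1)   # shift the segment down, then mask it out


# "Bits 2-4 of binary number 101011" from the definition above.
print(bit_segment(0b101011, 2, 4))   # prints 2 (binary 010)
# a[0:3] has width 4 and keeps the four lowest bits.
print(bit_segment(0b101011, 0, 3))   # prints 11 (binary 1011)
```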
Transverse reduction:
The transverse reduction refers to the following calculation process:
$$G = \sum_{i=1}^{2^{\zeta}-1} i \cdot B_i$$
where G is the output of the calculation process and is an elliptic curve point; the $B_i$ are the inputs of the calculation process, each $B_i$ being an elliptic curve point, $2^{\zeta}-1$ in total; ζ is the bit segment width, a constant parameter of the calculation process, usually 4.
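The transverse reduction can be sketched in Python as follows, taking it to be the weighted sum of the buckets in which bucket $B_i$ is multiplied by its label i (this matches the later requirement that the bit segment value equal the coefficient multiplied in the transverse reduction). Purely so that the example runs, elliptic curve points are replaced by integers modulo a fixed modulus, whose addition stands in for point addition; `MOD`, `padd` and `transverse_reduce` are hypothetical names. The running-sum ordering is a common way to evaluate such a sum with additions only, and the application does not prescribe it.

```python
MOD = 2**61 - 1   # toy modulus (assumption): integers mod MOD stand in for curve points


def padd(a: int, b: int) -> int:
    """Stand-in for elliptic curve point addition."""
    return (a + b) % MOD


def transverse_reduce(buckets: list[int]) -> int:
    """Compute G = sum(i * B_i) for i = 1..len(buckets), where buckets[i - 1] holds B_i.

    A running sum is taken from the highest bucket down, so only additions are needed.
    """
    running = 0   # sum of the buckets visited so far
    total = 0     # sum of the successive running sums
    for b in reversed(buckets):
        running = padd(running, b)
        total = padd(total, running)
    return total


buckets = [5, 7, 11]   # B_1, B_2, B_3
assert transverse_reduce(buckets) == (1 * 5 + 2 * 7 + 3 * 11) % MOD
```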
Longitudinal reduction:
The longitudinal reduction refers to the following calculation process:
$$Q = \sum_{i=0}^{\lceil \lambda/\zeta \rceil - 1} 2^{i\zeta} \cdot G_i$$
where Q is the output of the calculation process and is an elliptic curve point; the $G_i$ are the inputs of the calculation process, each $G_i$ being an elliptic curve point, $\lceil \lambda/\zeta \rceil$ in total; λ is the binary bit width of the coefficient field, determined by the input, with common values 256, 384, 512, etc.; ζ is the bit segment width, a constant parameter of the calculation process, usually 4.
Round:
A round is one iteration of the outer loop of the algorithm. Specifically, the algorithm comprises an outer loop and an inner loop; each iteration of the outer loop reads M elliptic curve points and M coefficients, where M is usually 1K.
The individual rounds are hereinafter denoted and distinguished as the kth round, where k is called the round number or round subscript; it starts from 0 and has maximum value $\lceil N/M \rceil - 1$. N is the input size of the whole algorithm, and $\lceil N/M \rceil$ denotes $N/M$ rounded up.
Level:
A level is one iteration of the inner loop of the algorithm. Specifically, the algorithm comprises an outer loop and an inner loop; each iteration of the inner loop reads M elliptic curve points and the corresponding bit segments, with bit segment parameters p and q, of the M coefficients, that is, M bit segments and M elliptic curve points in total. M is usually 1K, and p and q depend on the loop parameters.
The individual levels are hereinafter denoted and distinguished as the kth level, where k is called the level number or level subscript; it starts from 0 and has maximum value $\lceil \lambda/\zeta \rceil - 1$.
The lowest and highest subscripts of the bit segment corresponding to a level are functions of the level number. Hereinafter, the function from the level number to the lowest subscript of the bit segment is called the low subscript mapping, the function from the level number to the highest subscript of the bit segment is called the high subscript mapping, and the function from the level number to the tuple consisting of the lowest and highest subscripts is called the subscript mapping.
Main calculation process:
The main calculation process is the principal computation inside the inner loop of the algorithm. It receives a number of elliptic curve points and the bit segments of an equal number of coefficients, and outputs $2^{\zeta}-1$ elliptic curve points. It examines the coefficient bit segment corresponding to each elliptic curve point and, using the value of the bit segment as the subscript, moves the elliptic curve point into the bucket with that label. When a point is moved into a bucket: if the bucket holds no elliptic curve point, the point is placed directly into the bucket; if the bucket already holds a point, the result of the elliptic curve point addition of the point in the bucket and the current point is placed into the bucket. The process ends once all elliptic curve points and coefficients have been traversed, at which point all results in the buckets are output.
The calculation process requires the input elliptic curve points, the input coefficients, the coefficient bit segment parameters and the bucket position to be specified.
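A minimal Python sketch of the main calculation process is given below. It is an illustration under stated assumptions rather than the implementation of the application: integers modulo `MOD` stand in for elliptic curve points, `None` marks an empty bucket, and the name `main_calculation` is hypothetical. A bit segment of value 0 is skipped, which matches the worked example later in the description where such points are discarded.

```python
MOD = 2**61 - 1   # toy modulus (assumption): integers mod MOD stand in for curve points


def main_calculation(points, coeffs, p, q, bucket_row):
    """Scatter each point into bucket_row according to the bit segment coeff[p:q].

    bucket_row has 2**(q - p + 1) - 1 entries; entry v - 1 is the bucket labelled v.
    The row is updated in place and is not cleared, so it can keep accumulating.
    """
    width = q - p + 1
    mask = (1 << width) - 1
    for point, coeff in zip(points, coeffs):
        v = (coeff >> p) & mask            # value of the bit segment
        if v == 0:
            continue                       # a zero segment contributes nothing
        if bucket_row[v - 1] is None:
            bucket_row[v - 1] = point      # empty bucket: store the point directly
        else:
            # occupied bucket: store the sum (stand-in for point addition)
            bucket_row[v - 1] = (bucket_row[v - 1] + point) % MOD
    return bucket_row


row = [None] * (2**4 - 1)                  # one row of buckets for segment width 4
main_calculation([10, 20, 30], [0x3, 0x3, 0x0], p=0, q=3, bucket_row=row)
assert row[2] == 30 and row.count(None) == 14
```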
The present application is mainly directed at optimizing the MSM module in the PipeZK accelerator architecture design. That is, the prior-art implementation is the scheme in the paper "PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture"; other schemes mainly accelerate using a distributed system, a GPU, or an FPGA, but do not target multi-scalar point multiplication on elliptic curves. In the original scheme, when the input data size is N, the data is divided into a number of arrays of equal length, which are processed in batches, with the array length and the number of arrays kept as equal as possible. Each batch generates one elliptic curve point; finally, the generated elliptic curve points are processed once more to obtain a final point, which is the output of the MSM module.
Zero-knowledge proof: a zero-knowledge proof is a very useful cryptographic protocol for protecting privacy and can be widely used in many application scenarios such as blockchains. By means of a zero-knowledge proof, the prover can prove to the verifier that he knows certain knowledge without revealing any information about the knowledge itself.
PipeZK: the PipeZK accelerator is an accelerator for one class of zero-knowledge proof algorithms, the zero-knowledge succinct non-interactive argument of knowledge (commonly abbreviated zk-SNARK). Current schemes for zk-SNARK acceleration use a distributed system, a GPU, or an FPGA. The first two do not overlap with the present application; the third uses FPGA acceleration and mainly targets the acceleration of operations on elliptic curves, but does not directly address the calculation process of multi-scalar point multiplication.
MSM (multi-scalar multiplication): multi-scalar point multiplication on an elliptic curve. The input is a set of coefficients and the coordinates of a set of points on the elliptic curve, and the output is one elliptic curve point. Each coefficient is multiplied by the corresponding point on the elliptic curve and the products are added together, i.e. $\mathbf{Q} = \sum_i k_i \mathbf{P}_i$, where $k_i$ is the i-th coefficient and $\mathbf{P}_i$ is the i-th elliptic curve point; bold denotes elliptic curve points and non-bold denotes ordinary numbers. PipeZK has a POLY module in addition to the MSM module; the present application optimizes only the MSM module.
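For reference, the MSM definition can be written as a naive Python sketch that multiplies each point by its coefficient and sums the products. As an assumption made only so the example runs, integers modulo `MOD` stand in for elliptic curve points; `scalar_mul` and `msm_naive` are hypothetical names, and double-and-add is the textbook method for a single scalar multiplication, not something the application specifies.

```python
MOD = 2**61 - 1   # toy modulus (assumption): integers mod MOD stand in for curve points


def scalar_mul(k: int, point: int) -> int:
    """Double-and-add: compute k * point using only additions and doublings."""
    result = 0                                   # identity element of the toy group
    addend = point
    while k > 0:
        if k & 1:
            result = (result + addend) % MOD     # stand-in for point addition
        addend = (addend + addend) % MOD         # stand-in for point doubling
        k >>= 1
    return result


def msm_naive(coeffs, points) -> int:
    """Q = sum(k_i * P_i), one scalar multiplication per input pair."""
    total = 0
    for k, pt in zip(coeffs, points):
        total = (total + scalar_mul(k, pt)) % MOD
    return total


assert msm_naive([3, 5], [7, 11]) == 3 * 7 + 5 * 11
```

With millions of inputs, this per-point approach performs one full scalar multiplication per pair, which is the cost that bucket-based methods such as Pippenger aim to reduce.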
Elliptic curve: an important foundation of cryptography. Most elliptic-curve-based cryptographic algorithms rely on the difficulty of recovering k from Q = kP (similarly, the well-known RSA algorithm relies on the difficulty of factoring the product of two large primes). Two operations can be performed on an elliptic curve: adding two elliptic curve points, and multiplying an elliptic curve point by an ordinary positive integer. The specific formulas are not plain addition and multiplication; they involve a relatively complex and computationally expensive procedure. Because an MSM often involves millions of elliptic curve points, the calculation overhead becomes very large.
The elliptic curve multi-scalar point multiplication calculation optimization method according to the embodiment of the present application is described below with reference to the accompanying drawings.
FIG. 2 is a flow chart of an elliptic curve multi-scalar point multiplication calculation optimization method according to an embodiment of the present application.
As shown in FIG. 2, the elliptic curve multi-scalar point multiplication calculation optimization method comprises the following steps:
Step S1: in the main calculation process, cache the intermediate variable points of the main calculation process in one row of the bucket matrix.
Specifically, the present application designs a bucket matrix, rather than a single group of buckets, to cache the intermediate variable points of the different main calculation processes, so that the Pippenger intermediate quantities need not be output. Calculation runs continuously in a pipelined manner until all rounds are finished, and only then are the transverse reduction and the longitudinal reduction performed. In this way the number of executions of the serial computation is reduced from thousands to one, and overheads such as serial-parallel conversion and synchronization locking are eliminated, improving the calculation performance of the MSM.
It should be noted that a bucket (Bucket) is a storage unit for one elliptic curve point, and Pippenger is a method for computing multi-scalar point multiplication on elliptic curves.
Further, in one embodiment of the present application, the bucket matrix has $\lceil \lambda/\zeta \rceil$ rows and $2^{\zeta}-1$ columns in total, where λ is the bit width of the coefficient and ζ is the bit segment width.
Specifically, let λ be the bit width of the coefficient (that is, each coefficient is a λ-bit binary number), with common values 256, 384, 512, etc.; let ζ be the bit segment width, with common value 4. The bucket matrix is designed with $\lceil \lambda/\zeta \rceil$ rows and $2^{\zeta}-1$ columns in total.
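The size of the bucket matrix follows directly from λ and ζ: one row per bit segment of the coefficient and one column per non-zero bit segment value. The short Python sketch below computes it; the function names are hypothetical and `None` is used here as an assumed marker for an empty bucket.

```python
def bucket_matrix_shape(lam: int, zeta: int) -> tuple[int, int]:
    """Return (rows, columns) of the bucket matrix for coefficient width lam and segment width zeta."""
    rows = -(-lam // zeta)    # ceil(lam / zeta): one row per bit segment
    cols = 2**zeta - 1        # one column per non-zero bit segment value
    return rows, cols


def new_bucket_matrix(lam: int, zeta: int):
    """Allocate an empty bucket matrix; None marks a bucket holding no point yet."""
    rows, cols = bucket_matrix_shape(lam, zeta)
    return [[None] * cols for _ in range(rows)]


print(bucket_matrix_shape(256, 4))   # prints (64, 15)
print(bucket_matrix_shape(384, 4))   # prints (96, 15)
```

For λ = 256 and ζ = 4 this gives 64 × 15 = 960 buckets, compared with the single row of 15 buckets in the scheme before optimization.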
Step S2: while canceling the transverse reduction at the end of the main calculation process in the PipeZK algorithm, keep the content of the bucket matrix and continue to the main calculation process of the next level.
Step S3: cancel part of the calculation process of the longitudinal reduction at the end of each round.
Step S4: after all rounds are finished, perform the transverse reduction and longitudinal reduction calculation on the bucket matrix, and output the elliptic curve point from the bucket matrix.
Further, in one embodiment of the present application, performing lateral reduction and longitudinal reduction calculations on the bucket matrix includes: and transversely reducing each row of the barrel matrix to obtain a plurality of elliptic curve points, and longitudinally reducing the elliptic curve points.
Further, in one embodiment of the present application, performing lateral reduction and longitudinal reduction calculations on the bucket matrix includes: and carrying out longitudinal reduction on each column of the barrel matrix to obtain a plurality of elliptic curve points, and carrying out transverse reduction on the plurality of elliptic curve points.
Further, in one embodiment of the present application, in the lateral reduction computation, the computation of different rows is performed in parallel or in series; in the vertical reduction calculation, the calculation of different columns is performed in parallel or in series.
Specifically, after all rounds are finished, the transverse and longitudinal reduction calculation on the bucket matrix can be carried out in either of two orders, and both output one elliptic curve point.
The first reduction type is transverse reduction followed by longitudinal reduction. Each row of the bucket matrix is transversely reduced, giving $\lceil \lambda/\zeta \rceil$ elliptic curve points; a longitudinal reduction is then applied to these $\lceil \lambda/\zeta \rceil$ elliptic curve points. The transverse reductions of different rows are generally executed in parallel but can also be executed serially, and the order of execution among the rows does not affect the result.
The second reduction type is longitudinal reduction followed by transverse reduction. Each column of the bucket matrix is longitudinally reduced, giving $2^{\zeta}-1$ elliptic curve points; a transverse reduction is then applied to these $2^{\zeta}-1$ elliptic curve points. The longitudinal reductions of different columns are generally executed in parallel but can also be executed serially, and the order of execution among the columns does not affect the result.
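The overall flow of steps S1 to S4 can be sketched end to end in Python. This is a toy model under loudly stated assumptions, not the hardware design of the application: integers modulo `MOD` stand in for elliptic curve points, 0 plays the role of the empty bucket, ordinary modular multiplication is used as a shortcut for repeated point addition and doubling in the final reductions, and all function names are hypothetical. The sketch uses the subscript mapping in which level j reads bits jζ to jζ+ζ-1 and writes to row j, and checks that both reduction orders agree with the direct sum of coefficient times point.

```python
import random

MOD = 2**61 - 1   # toy modulus (assumption): integers mod MOD stand in for curve points


def accumulate(points, coeffs, lam, zeta, m):
    """Run all rounds and levels, keeping every intermediate point in the bucket matrix."""
    rows = -(-lam // zeta)                        # ceil(lam / zeta)
    cols = 2**zeta - 1                            # also the bit mask of one segment
    matrix = [[0] * cols for _ in range(rows)]    # 0 is the identity of the toy group
    for start in range(0, len(points), m):        # one round per batch of m inputs
        for j in range(rows):                     # one level per bit segment
            batch = zip(points[start:start + m], coeffs[start:start + m])
            for pt, k in batch:
                v = (k >> (j * zeta)) & cols      # bit segment k[j*zeta : j*zeta + zeta - 1]
                if v:                             # buckets are never cleared between rounds
                    matrix[j][v - 1] = (matrix[j][v - 1] + pt) % MOD
    return matrix


def transverse(row):
    """sum(i * B_i) with i starting at 1."""
    return sum(i * b for i, b in enumerate(row, start=1)) % MOD


def longitudinal(col, zeta):
    """sum(2**(i * zeta) * G_i) with i starting at 0."""
    return sum(pow(2, i * zeta, MOD) * g for i, g in enumerate(col)) % MOD


def reduce_rows_first(matrix, zeta):
    """First reduction type: transverse on every row, then one longitudinal reduction."""
    return longitudinal([transverse(row) for row in matrix], zeta)


def reduce_columns_first(matrix, zeta):
    """Second reduction type: longitudinal on every column, then one transverse reduction."""
    return transverse([longitudinal(col, zeta) for col in zip(*matrix)])


random.seed(0)
lam, zeta, m, n = 16, 4, 8, 50
points = [random.randrange(MOD) for _ in range(n)]
coeffs = [random.randrange(2**lam) for _ in range(n)]
matrix = accumulate(points, coeffs, lam, zeta, m)
expected = sum(k * p for k, p in zip(coeffs, points)) % MOD
assert reduce_rows_first(matrix, zeta) == expected
assert reduce_columns_first(matrix, zeta) == expected
```

The two orders agree because both evaluate the same double sum over rows and columns; the reductions run once, after the last round, instead of once per round.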
As shown in fig. 3, the calculation process in the elliptic curve multi-scalar point multiplication calculation optimization method is illustrated.
In the main calculation process, the original input needs to be divided into several subsets. The usual method is to divide it as evenly as possible into $\lceil N/M \rceil$ subsets, each containing approximately M elliptic curve points and M coefficients. However, the division may also be out of order and uneven without affecting the calculation result, and the number of subsets does not affect the final result. Partitioning only requires that the correspondence between each elliptic curve point and its coefficient is preserved, and that each elliptic curve point and each coefficient is used exactly once.
There is an extreme method of bypassing the above-described division feature by changing the data input, i.e., repeating a certain elliptic curve point a plurality of times or merging the repeated elliptic curve points, and making the corresponding coefficients thereof equal to the original corresponding coefficients.
In the main computation process portion of fig. 3, there are a number of mapping methods for the subscript mapping of the bit segments.
Let the round number be i and the level number be j. Besides the mapping shown in FIG. 3, in which the lowest subscript of the bit segment is jζ and the highest subscript of the bit segment is jζ+ζ-1, a different pair of lowest and highest subscripts can also be used for the calculation, in which case the selected bucket position must be changed to the correspondingly different row.
For other bit segment selection methods, the corresponding bucket positions change accordingly, but the core requirements of the selection are: first, the selected bit segments do not overlap; second, the union of all selected bit segments is exactly the whole bit range of the coefficient; third, the selected bucket positions do not overlap; fourth, the union of all selected bucket positions is exactly the entire bucket matrix. Any bit segment selection method and bucket position selection method satisfying these four conditions yields correct results.
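The four conditions above can be checked mechanically. The Python sketch below assumes, for illustration only, that a subscript mapping is given as a list whose j-th entry is `((p, q), row)`, i.e. the bit segment read at level j and the bucket matrix row it writes to; the function name and this representation are hypothetical. It checks coverage only, and the weights used in the final reductions must still match the chosen mapping.

```python
def is_valid_mapping(mapping, lam: int, rows: int) -> bool:
    """Check the four conditions for mapping[j] = ((p, q), row) at level j."""
    covered_bits = []
    used_rows = []
    for (p, q), row in mapping:
        covered_bits.extend(range(p, q + 1))
        used_rows.append(row)
    bits_disjoint = len(covered_bits) == len(set(covered_bits))   # condition 1
    bits_complete = set(covered_bits) == set(range(lam))          # condition 2
    rows_disjoint = len(used_rows) == len(set(used_rows))         # condition 3
    rows_complete = set(used_rows) == set(range(rows))            # condition 4
    return bits_disjoint and bits_complete and rows_disjoint and rows_complete


lam, zeta = 16, 4
rows = lam // zeta
ascending = [((j * zeta, j * zeta + zeta - 1), j) for j in range(rows)]
descending = list(reversed(ascending))            # same segments, visited from the top bits down
overlapping = [((0, 3), 0), ((2, 5), 1), ((8, 11), 2), ((12, 15), 3)]

assert is_valid_mapping(ascending, lam, rows)
assert is_valid_mapping(descending, lam, rows)
assert not is_valid_mapping(overlapping, lam, rows)
```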
There are a number of mapping methods for bucket matrix row and column indices: in addition to the above method of changing the bit segments and the positions simultaneously and cooperatively, the result may also be directly stored in other designated barrel matrix positions during the main calculation process, and the input modes of the horizontal reduction or the vertical reduction may be adjusted simultaneously, so as to bypass the subscript relationship shown in fig. 3.
The requirements are as follows. First, for each bucket accessed in the main calculation process, the value of the coefficient bit segment that selects the bucket must equal the coefficient by which that bucket is multiplied in the transverse reduction. Second, for each bucket accessed in the main calculation process, the lowest subscript of the accessed bit segment must equal the base-2 logarithm of the coefficient by which that bucket is multiplied in the longitudinal reduction. Any bucket position selection method satisfying these conditions can implement the elliptic curve multi-scalar point multiplication calculation optimization method of the embodiments of the present application.
Specifically, FIG. 4 shows the main components and the calculation flow of the MSM module optimized in the present application; FIG. 4 illustrates only the flow of one round of calculation and the related components. In FIG. 4, the coefficient bit width is 256 bits, each coordinate component of an elliptic curve point is 768 bits wide, and N is 1048576, so the amount of data in a single round is 1024. (Note that these parameters are only those used in this schematic diagram; in an actual implementation they can be changed to other reasonable values.)
In FIG. 4, "1024 scale" at the upper left indicates the data of the coefficient portion, and "1024Point" at the upper right indicates the data of the elliptic curve point portion. The cylinders at the lower left represent the buffer area, that is, the Bucket matrix; each cylinder is a Bucket capable of storing one elliptic curve point. The PADD portion at the lower right is the elliptic curve point addition unit, which adds two elliptic curve points and outputs one elliptic curve point. The three boxes at the lower right represent FIFO queues that buffer all the pairs of elliptic curve points waiting to be added by PADD.
In the scheme before optimization, there is only one row of buckets, and the value of one bit segment of a coefficient determines the bucket in which the corresponding elliptic curve point is stored. In the optimized scheme, the bucket matrix has ⌈λ/ζ⌉ rows and 2^ζ - 1 columns, where λ is the bit width of the coefficient and ζ is the bit segment width. In principle ζ can be varied; the larger ζ is, the larger the resource overhead, and ζ = 4 is chosen as a compromise between resource usage and performance.
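The size of the bucket matrix follows directly from λ and ζ. The short Python sketch below computes it; the function name `bucket_matrix_shape` is hypothetical and is used only for illustration.

```python
def bucket_matrix_shape(lam: int, zeta: int) -> tuple[int, int]:
    """Return (rows, columns) of the bucket matrix.

    lam  : bit width of a coefficient (lambda in the text)
    zeta : bit segment width
    """
    rows = -(-lam // zeta)     # ceil(lam / zeta): one row per bit segment
    cols = (1 << zeta) - 1     # segment value 0 is discarded, so it needs no bucket
    return rows, cols


rows, cols = bucket_matrix_shape(256, 4)
print(rows, cols, rows * cols)  # 64 15 960
```

With the parameters of Fig. 4 this gives 64 rows of 15 buckets. For comparison, ζ = 8 would give 32 rows of 255 buckets, which illustrates how the storage cost grows with ζ.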
Taking this round of computation as an example, there are 1024 coefficients in total. As shown in Fig. 5, idx0 denotes the first coefficient, idx1 the second, and so on up to idx1023, the 1024th. Each coefficient is 256 bits wide and is divided into 4-bit slices, giving 64 slices in total. Each coefficient has its most significant bit on the left and its least significant bit on the right, and the slices are numbered from right to left as group 0, group 1, ..., group 63.
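The slicing described above can be sketched in Python as follows. The helper name `slices` is hypothetical; group 0 is the least significant slice, matching the right-to-left numbering in the text.

```python
def slices(coeff: int, lam: int = 256, zeta: int = 4) -> list[int]:
    """Split a lam-bit coefficient into zeta-bit groups.

    Index 0 of the returned list is group 0 (least significant bits);
    the last index is the most significant group.
    Assumes lam is a multiple of zeta, as in the 256/4 example.
    """
    mask = (1 << zeta) - 1
    return [(coeff >> (zeta * g)) & mask for g in range(lam // zeta)]


groups = slices(0xA3)
print(len(groups), groups[0], groups[1], groups[63])  # 64 3 10 0
```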
The specific calculation flow is as follows. First, group 63 (the leftmost group) is accessed sequentially from top to bottom, for idx0, idx1, ..., idx1023. During this pass, if the slice value is 0, the elliptic curve point corresponding to the coefficient is discarded; if the value is 1, the point is put into the first bucket of the first row; if the value is 2, the point is put into the second bucket of the first row; and so on. After all 1024 coefficients of group 63 have been accessed, the process returns to the first coefficient and accesses group 62: the point is discarded if the value is 0, put into the first bucket of the second row if 1, into the second bucket of the second row if 2, into the third bucket of the second row if 3, and so on.
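The filling pass can be sketched as below. This is a minimal illustration under two loudly stated assumptions: plain Python integers stand in for elliptic curve points, with integer addition standing in for the PADD unit, and "putting" a point into a bucket is taken to mean adding it to whatever the bucket already holds. The function name `fill_bucket_matrix` is hypothetical.

```python
def fill_bucket_matrix(coeffs, points, lam=256, zeta=4, matrix=None):
    """Main calculation for one round.

    Row 0 receives the most significant group, row 1 the next one, and so on.
    A slice value v > 0 selects column v - 1; a slice value of 0 discards the point.
    ASSUMPTION: integers replace elliptic curve points, `+=` replaces PADD.
    """
    groups = lam // zeta
    mask = (1 << zeta) - 1
    if matrix is None:
        matrix = [[0] * mask for _ in range(groups)]  # 0 stands in for the identity point
    for row in range(groups):
        g = groups - 1 - row                      # group index, most significant first
        for k, p in zip(coeffs, points):          # idx0, idx1, ... from top to bottom
            v = (k >> (zeta * g)) & mask
            if v:
                matrix[row][v - 1] += p
    return matrix


# Tiny example with 8-bit coefficients: two groups, hence two rows.
m = fill_bucket_matrix([0x21, 0x01], [5, 7], lam=8, zeta=4)
print(m[0][1], m[1][0])  # 5 12
```

Passing an existing `matrix` back in for the next round keeps the bucket contents, which is the behavior the optimized scheme relies on.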
As shown in Fig. 6, the longitudinal reduction step is most efficient when performed in hardware, but it can also be computed in software and is still more efficient than a general calculation method. Specifically, each column is processed from top to bottom: take the first point of the column and multiply it by 2^ζ, add the second point of the column and multiply the sum by 2^ζ, then add the third point of the column, and so on. (After the point in the last row has been added, the sum is not multiplied by 2^ζ again.) The result is written back to the first row of the column, that is, row 0. In a hardware implementation, the multiplication by 2^ζ should be decomposed into ζ point doublings; a software implementation is similar.
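The column-wise procedure is a Horner-style evaluation. The sketch below illustrates it with integers standing in for elliptic curve points (an assumption made so that the example runs without a curve library); the function name `longitudinal_reduce` is hypothetical, and the result is returned as a new list rather than written back into row 0.

```python
def longitudinal_reduce(matrix, zeta=4):
    """Collapse every column from top to bottom into a single value.

    ASSUMPTION: integers replace elliptic curve points, so `acc + acc`
    stands in for a point doubling and `acc + x` for a point addition.
    """
    row0 = []
    for c in range(len(matrix[0])):
        acc = matrix[0][c]                 # first point of the column
        for row in matrix[1:]:
            for _ in range(zeta):          # multiply by 2**zeta as zeta doublings
                acc = acc + acc
            acc = acc + row[c]             # add the next point of the column
        row0.append(acc)                   # no further doubling after the last row
    return row0


print(longitudinal_reduce([[1, 0], [2, 3], [4, 5]], zeta=4))  # [292, 53]
```

In the example, the first column evaluates to (1 * 16 + 2) * 16 + 4 = 292 and the second to (0 * 16 + 3) * 16 + 5 = 53.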
Finally, the content of row 0 is taken out and reduced. This step can be implemented in either hardware or software, and both work well. Specifically, the first column is multiplied by 1, the second column by 2, the third column by 3, and so on, and the resulting products are then added together. Multiplying an elliptic curve point by a constant should be decomposed into point doubling and point addition operations.
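A sketch of this last step is shown below, again with integers standing in for elliptic curve points (an assumption for illustration only). The names `scalar_mul` and `lateral_reduce` are hypothetical; `scalar_mul` uses the ordinary double-and-add decomposition mentioned in the text.

```python
def scalar_mul(k, p):
    """Compute k * p using only doublings and additions (double-and-add).

    ASSUMPTION: integers replace elliptic curve points; 0 is the identity.
    """
    acc = 0
    while k:
        if k & 1:
            acc = acc + p      # point addition
        p = p + p              # point doubling
        k >>= 1
    return acc


def lateral_reduce(row0):
    """Multiply column c (1-based) by c and add all the products."""
    total = 0
    for c, p in enumerate(row0, start=1):
        total = total + scalar_mul(c, p)
    return total


print(lateral_reduce([10, 20, 30]))  # 140
```

The example evaluates 1 * 10 + 2 * 20 + 3 * 30 = 140.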
In the original scheme, the lateral reduction and the longitudinal reduction cannot be parallelized and can only be executed serially; subsequent calculation must wait for them to finish, and most hardware components sit idle during the serial calculation. Once this serial overhead is eliminated, the number of serial reduction passes falls from thousands to one, the hardware stays almost continuously busy, and no stage has to wait for a preceding calculation, so the overall acceleration performance improves.
As another possible implementation, the same computation can be achieved by eliminating only the lateral reduction in the loop, or only the longitudinal reduction in the loop, with corresponding changes to the bucket matrix and the associated storage structure. The acceleration obtained this way is smaller than that of the full bucket matrix scheme, but the performance is still higher than that of PipeZK.
According to the elliptic curve multi-scalar point multiplication calculation optimization method provided by the embodiment of the application, a bucket matrix is designed to cache the intermediate variable points of the main calculation process, so that the Pippenger intermediate quantities do not have to be output. The main calculation runs continuously until all rounds are finished, and only then are the lateral reduction and the longitudinal reduction performed, which reduces the number of serial calculation passes from thousands to one. This optimizes away the serial-parallel conversion stage between batches; eliminating that conversion raises the parallelism of the algorithm, effectively lengthens the continuous working time of the pipeline, and improves overall performance.
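The end-to-end idea can be checked with the small sketch below: buckets are kept across rounds and a single reduction runs at the very end. It is an illustration under stated assumptions, not the patented hardware flow: integers stand in for elliptic curve points, the widths are shrunk to λ = 16 and ζ = 4, and all function names are hypothetical.

```python
import random

LAM, ZETA = 16, 4                      # illustrative widths, not the 256/4 of Fig. 4
GROUPS = LAM // ZETA                   # number of rows
MASK = (1 << ZETA) - 1                 # also the number of columns


def accumulate(matrix, coeffs, points):
    """Main calculation for one round; bucket contents persist between rounds."""
    for row in range(GROUPS):
        shift = ZETA * (GROUPS - 1 - row)          # row 0 holds the top group
        for k, p in zip(coeffs, points):
            v = (k >> shift) & MASK
            if v:
                matrix[row][v - 1] += p            # integer add stands in for PADD


def reduce_once(matrix):
    """Single final pass: longitudinal reduction per column, then lateral."""
    total = 0
    for c in range(MASK):
        acc = 0
        for row in matrix:
            acc = acc * (1 << ZETA) + row[c]       # Horner step down the column
        total += (c + 1) * acc                     # column c is weighted by c + 1
    return total


def msm_deferred(coeffs, points, batch):
    """Process the input in rounds of `batch` items and reduce only once."""
    matrix = [[0] * MASK for _ in range(GROUPS)]
    for start in range(0, len(coeffs), batch):
        accumulate(matrix, coeffs[start:start + batch], points[start:start + batch])
    return reduce_once(matrix)


rng = random.Random(0)
ks = [rng.randrange(1 << LAM) for _ in range(64)]
ps = [rng.randrange(1, 1000) for _ in range(64)]
# The deferred result matches the direct sum of k * P over all inputs.
assert msm_deferred(ks, ps, batch=8) == sum(k * p for k, p in zip(ks, ps))
```

The result does not depend on the round size, since the rounds only add into the same buckets; what changes is how often the serial reduction has to run.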
Next, an elliptic curve multi-scalar point multiplication calculation optimizing apparatus according to an embodiment of the present invention will be described with reference to the accompanying drawings.
FIG. 7 is a schematic diagram of an elliptic curve multi-scalar point multiplication computation optimization apparatus according to an embodiment of the present invention.
As shown in fig. 7, the elliptic curve multi-scalar point multiplication computation optimization apparatus includes: a caching module 100, a first optimization module 200, a second optimization module 300, and an output module 400.
The caching module 100 is configured to cache the intermediate variable points of the main calculation process in one row of the bucket matrix.
The first optimization module 200 is configured to cancel the lateral reduction at the end of the main calculation process in the PipeZK algorithm, keep the content of the bucket matrix, and continue to the next main calculation process.
The second optimization module 300 is configured to cancel part of the calculation of the longitudinal reduction at the end of each round.
The output module 400 is configured to perform lateral reduction and longitudinal reduction on the bucket matrix after all rounds are finished, and to output elliptic curve points from the bucket matrix.
Further, in one embodiment of the present application, the bucket matrix has ⌈λ/ζ⌉ rows and 2^ζ - 1 columns, where λ is the bit width of the coefficient and ζ is the bit segment width.
Further, in one embodiment of the present application, performing lateral reduction and longitudinal reduction on the bucket matrix includes: laterally reducing each row of the bucket matrix to obtain a plurality of elliptic curve points, and then longitudinally reducing the plurality of elliptic curve points.
Further, in one embodiment of the present application, performing lateral reduction and longitudinal reduction on the bucket matrix includes: longitudinally reducing each column of the bucket matrix to obtain a plurality of elliptic curve points, and then laterally reducing the plurality of elliptic curve points.
Further, in one embodiment of the present application, in the lateral reduction computation, the computation of different rows is performed in parallel or in series; in the vertical reduction calculation, the calculation of different columns is performed in parallel or in series.
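The two reduction orders give the same result, and the per-row or per-column work items are independent of one another, which is why they may run in parallel or in series. The sketch below checks the equivalence on a small matrix; integers stand in for elliptic curve points (an assumption for illustration), and the function names are hypothetical.

```python
def lateral_then_longitudinal(matrix, zeta):
    """Reduce each row to one value, then combine the rows top to bottom."""
    acc = 0
    for row in matrix:                                   # row 0 = most significant segment
        row_point = sum((c + 1) * b for c, b in enumerate(row))   # independent per row
        acc = acc * (1 << zeta) + row_point
    return acc


def longitudinal_then_lateral(matrix, zeta):
    """Reduce each column to one value, then combine the columns."""
    total = 0
    for c in range(len(matrix[0])):                      # independent per column
        col_point = 0
        for row in matrix:
            col_point = col_point * (1 << zeta) + row[c]
        total += (c + 1) * col_point
    return total


m = [[1, 2, 3], [4, 5, 6]]          # 2 rows, 2**2 - 1 = 3 columns, so zeta = 2
print(lateral_then_longitudinal(m, 2), longitudinal_then_lateral(m, 2))  # 88 88
```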
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and will not be repeated here.
According to the elliptic curve multi-scalar point multiplication calculation optimization apparatus provided by the embodiment of the application, a bucket matrix is designed to cache the intermediate variable points of the main calculation process, so that the Pippenger intermediate quantities do not have to be output. The main calculation runs continuously until all rounds are finished, and only then are the lateral reduction and the longitudinal reduction performed, which reduces the number of serial calculation passes from thousands to one. This optimizes away the serial-parallel conversion stage between batches; eliminating that conversion raises the parallelism of the algorithm, effectively lengthens the continuous working time of the pipeline, and improves overall performance.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, for example two or three, unless expressly defined otherwise.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may join and combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (10)

1. An elliptic curve multi-scalar point multiplication calculation optimization method, applied to an MSM module in a PipeZK accelerator and used to optimize the elliptic curve multi-scalar point multiplication calculation flow of the MSM module so as to improve the performance of the PipeZK accelerator, the method comprising the following steps:
in the main calculation process, caching intermediate variable points of the main calculation process in one row of a bucket matrix;
canceling the lateral reduction at the end of the main calculation process in the PipeZK algorithm, while keeping the content of the bucket matrix and continuing to the next main calculation process;
canceling part of the calculation of the longitudinal reduction at the end of each round;
after all rounds are finished, performing lateral reduction and longitudinal reduction on the bucket matrix, and outputting elliptic curve points from the bucket matrix.
2. The method of claim 1, wherein the bucket matrix has ⌈λ/ζ⌉ rows and 2^ζ - 1 columns, where λ is the bit width of the coefficient and ζ is the bit segment width.
3. The method of claim 1, wherein performing the lateral reduction and longitudinal reduction on the bucket matrix comprises:
laterally reducing each row of the bucket matrix to obtain a plurality of elliptic curve points, and longitudinally reducing the plurality of elliptic curve points.
4. The method of claim 1, wherein performing the lateral reduction and longitudinal reduction on the bucket matrix comprises:
longitudinally reducing each column of the bucket matrix to obtain a plurality of elliptic curve points, and laterally reducing the plurality of elliptic curve points.
5. The method according to any one of claims 1 to 4, wherein,
in the lateral reduction calculation, the calculation of different rows is performed in parallel or in series;
in the longitudinal reduction calculation, the calculation of different columns is performed in parallel or in series.
6. An elliptic curve multi-scalar point multiplication calculation optimization apparatus, applied to an MSM module in a PipeZK accelerator and configured to optimize the elliptic curve multi-scalar point multiplication calculation flow of the MSM module so as to improve the performance of the PipeZK accelerator, the apparatus comprising:
a caching module, configured to cache, in the main calculation process, intermediate variable points of the main calculation process in one row of a bucket matrix;
a first optimization module, configured to cancel the lateral reduction at the end of the main calculation process in the PipeZK algorithm, while keeping the content of the bucket matrix and continuing to the next main calculation process;
a second optimization module, configured to cancel part of the calculation of the longitudinal reduction at the end of each round;
an output module, configured to perform lateral reduction and longitudinal reduction on the bucket matrix after all rounds are finished, and to output elliptic curve points from the bucket matrix.
7. The apparatus of claim 6, wherein the bucket matrix has ⌈λ/ζ⌉ rows and 2^ζ - 1 columns, where λ is the bit width of the coefficient and ζ is the bit segment width.
8. The apparatus of claim 6, wherein performing the lateral reduction and longitudinal reduction on the bucket matrix comprises:
laterally reducing each row of the bucket matrix to obtain a plurality of elliptic curve points, and longitudinally reducing the plurality of elliptic curve points.
9. The apparatus of claim 6, wherein performing the lateral reduction and longitudinal reduction on the bucket matrix comprises:
longitudinally reducing each column of the bucket matrix to obtain a plurality of elliptic curve points, and laterally reducing the plurality of elliptic curve points.
10. The apparatus according to any one of claims 6 to 9, wherein,
in the lateral reduction calculation, the calculation of different rows is performed in parallel or in series;
in the longitudinal reduction calculation, the calculation of different columns is performed in parallel or in series.
CN202110791569.4A 2021-07-13 2021-07-13 Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device Active CN113504895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110791569.4A CN113504895B (en) 2021-07-13 2021-07-13 Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device

Publications (2)

Publication Number Publication Date
CN113504895A CN113504895A (en) 2021-10-15
CN113504895B true CN113504895B (en) 2024-02-20

Family

ID=78013250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110791569.4A Active CN113504895B (en) 2021-07-13 2021-07-13 Elliptic curve multi-scalar point multiplication calculation optimization method and optimization device

Country Status (1)

Country Link
CN (1) CN113504895B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879934B (en) * 2021-12-14 2023-01-10 中国科学院深圳先进技术研究院 Efficient zero-knowledge proof accelerator and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004177582A (en) * 2002-11-26 2004-06-24 Fujitsu Ltd Elliptic curve ciphering system, and elliptic curve ciphering operation method
DE102005028662A1 (en) * 2005-03-04 2006-09-07 IHP GmbH - Innovations for High Performance Microelectronics/Institut für innovative Mikroelektronik Polynom multiplication calculating method e.g. for elliptical curve cryptography, making available coefficients with two polynomials each polynomial fragmented into two or more fragments, being operands partial multiplication
CA2542556A1 (en) * 2005-06-03 2006-12-03 Tata Consultancy Services Limited An authentication system executing an elliptic curve digital signature cryptographic process
CN101483517A (en) * 2007-12-28 2009-07-15 英特尔公司 A technique for aacelerating characteristic 2 eeliptic curve cryptography
CN109379191A (en) * 2018-09-07 2019-02-22 阿里巴巴集团控股有限公司 A kind of point multiplication operation circuit and method based on elliptic curve basic point
CN111373694A (en) * 2020-02-21 2020-07-03 香港应用科技研究院有限公司 Zero-knowledge proof hardware accelerator and method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7240084B2 (en) * 2002-05-01 2007-07-03 Sun Microsystems, Inc. Generic implementations of elliptic curve cryptography using partial reduction

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Marlin: Preprocessing zkSNARKs with Universal and Updatable SRS;lraj Fathirad;《Annual International Conference on the Theory and Applications of Cryptographic Techniques》;20200501;738–768 *
Ye Zhang,etc.PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture.《2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)》.2021,1-12. *
椭圆曲线密码学若干算法研究;于伟;《中国博士学位论文全文数据库信息科技辑》;20140515;I136-23 *
约简加速求解的属性簇方法;陈妍;宋晶晶;杨习贝;;南京理工大学学报;20200515(02);216-223 *

Also Published As

Publication number Publication date
CN113504895A (en) 2021-10-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220126

Address after: 518048 1410, building 1, Changfu Jinmao building, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Zhixin Huaxi Information Technology Co.,Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Applicant before: TSINGHUA University

CB02 Change of applicant information

Address after: 518048 1410, building 1, Changfu Jinmao building, south of Shihua Road, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Zhixin Huaxi Information Technology Co.,Ltd.

Address before: 518048 1410, building 1, Changfu Jinmao building, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province

Applicant before: Shenzhen Zhixin Huaxi Information Technology Co.,Ltd.

GR01 Patent grant