CN116360728A - Large integer addition operation acceleration method, device and storage medium

Info

Publication number: CN116360728A
Application number: CN202310164353.4A
Authority: CN
Inventors: 胡雪晖; 庞皓天; 王皓阳; 汪小川; 董俊伟; 郭伟; 吴天祺; 李金库
Original assignee: Shanghai Tongtai Information Technology Co ltd; Xidian University; 702nd Research Institute of CSIC
Current assignee: Shanghai Tongtai Information Technology Co ltd; Xidian University; 702nd Research Institute of CSIC
Priority date: 2023-02-24
Filing date: 2023-02-24
Publication date: 2023-06-30

Abstract

The invention provides a large integer addition operation acceleration method, device and storage medium capable of efficiently processing large integer addition in parallel. The method comprises the following steps: acquiring a first large integer and a second large integer to be added; dividing the first large integer and the second large integer into p elements each, forming a first vector and a second vector respectively; based on a preset vector addition instruction, storing the p first elements of the first vector in p corresponding processors and adding them in parallel to the p second elements of the second vector to obtain a result vector; and shift-adding the elements of the result vector to obtain the sum of the first large integer and the second large integer.

Description

Large integer addition operation acceleration method, device and storage medium

Technical Field

The invention belongs to the field of ciphertext operation, and particularly relates to a large integer addition operation acceleration method, a large integer addition operation acceleration device and a storage medium.

Background

Large integer operations are unavoidable in ciphertext operations, which typically use integers of more than 1024 bits, or even 3072 bits. Addition is generally regarded as computationally simple, but for large integers it still leaves room for optimization, especially in ciphertext operations, which often involve a large number of additions. Optimizing the addition operation can therefore improve performance and operating efficiency and reduce computational cost. Accelerating large integer addition is thus both necessary and significant, and a scheme for accelerating large integer addition needs to be designed so that it can be used in scenarios such as ciphertext operations and promote the development of related technologies.

Common large integer addition implementations typically use arrays to simulate column ("vertical") addition, but this approach cannot be computed in parallel and is relatively inefficient. Existing parallel processing methods allocate the number of processors according to the length of the addends and the maximum number of bits each processor can handle, and still leave room for optimization.

Disclosure of Invention

In order to improve the addition calculation efficiency of large integers, the invention provides a large integer addition operation acceleration method, a device and a storage medium capable of processing large integer calculation in parallel, and the invention adopts the following technical scheme:

The invention provides a large integer addition operation acceleration method, which comprises the following steps: acquiring a first large integer and a second large integer to be added; dividing the first large integer and the second large integer into p elements each, forming a first vector and a second vector respectively; based on a preset vector addition instruction, storing the p first elements of the first vector in p corresponding processors and adding them in parallel to the p second elements of the second vector to obtain a result vector; and shift-adding the elements of the result vector to obtain the sum of the first large integer and the second large integer.

The large integer addition operation acceleration method provided by the invention may further be characterized in that shift-adding the elements of the result vector to obtain the sum of the first large integer and the second large integer comprises: obtaining a carry vector of the result vector; appending a 0 element at the lowest position of the carry vector and discarding the element at its highest position to obtain a carry vector to be calculated; adding the carry vector to be calculated and the result vector through the vector addition instruction to obtain a final vector; and outputting the elements of the final vector in sequence to obtain the sum of the first large integer and the second large integer.

The large integer addition operation acceleration method provided by the invention may further be characterized in that the first large integer and the second large integer each have length n, and the length n/p of each element is smaller than the number of bits that a processor can process.

The large integer addition operation acceleration method provided by the invention may further be characterized in that the preset vector addition instruction is implemented based on the AVX-512 instruction set, and the length n is 512.

The large integer addition operation acceleration method provided by the invention may further be characterized in that the preset vector addition instruction is implemented based on the AVX2 instruction set, and the length n is 256.

The large integer addition operation acceleration method provided by the invention may further be characterized in that, when the first large integer and the second large integer are each segmented into p elements, any element whose number of bits falls short of the length n/p is padded with leading zeros.

The invention also provides a large integer addition acceleration device, which comprises: an acquisition module for acquiring a first large integer and a second large integer to be added; a segmentation module for segmenting the first large integer and the second large integer into p elements each, forming a first vector and a second vector respectively; an addition control module for controlling p processors, based on a preset vector addition instruction, to store the p first elements of the first vector correspondingly and add them in parallel to the p second elements of the second vector to obtain a result vector; and a result addition module for shift-adding the elements of the result vector to obtain the sum of the first large integer and the second large integer.

The invention also provides a computer readable storage medium storing computer executable instructions that, when invoked and executed by a processor, cause the processor to implement the above-described method.

The actions and effects of the invention

According to the large integer addition operation acceleration method, device and storage medium of the invention, the two large integers to be added are segmented to form two vectors whose number of elements (dimension) equals the number of processors. A preset addition instruction then applies a divide-and-conquer approach so that the element-wise additions are performed in parallel, and the sum of the large integers is finally computed from the resulting vector. The method can therefore use all parallel processors to compute the large integer in segments, saving time and reducing the computational complexity of large integer addition.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

FIG. 1 is a flow chart of a large integer addition acceleration method according to an embodiment of the present invention.

FIG. 2 is a flowchart of the substeps of step S4 in an embodiment of the present invention.

FIG. 3 is a schematic diagram of a large integer addition accelerator in accordance with an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

<Example>

Referring to FIG. 1, which is a flowchart of a large integer addition acceleration method according to an embodiment of the present invention, the method specifically comprises the following steps S1 to S4:

Step S1, obtaining a first large integer A and a second large integer B, each of length n, to be added.

Step S2, dividing the first large integer and the second large integer into p elements each, every element having length n/p. These elements respectively constitute a first vector A_i and a second vector B_i (i = 1, 2, …, p). If a segmented element has fewer digits than the length n/p, it is padded with leading zeros.

Specifically, take A = 12543, B = 10000012543 and p = 4 as an example. The length n of the second large integer B is 11, so the length of each element is n/p = 3 (2.75 rounded up). Segmentation then yields the first vector A_i = [000, 000, 012, 543] and the second vector B_i = [010, 000, 012, 543].
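The segmentation of step S2 can be sketched in a few lines of Python. This is only an illustrative sketch on decimal digits, matching the worked example; the function name `split_into_elements` and its parameters are hypothetical and do not come from the patent.

```python
import math

def split_into_elements(value: int, n: int, p: int) -> list[str]:
    """Split a non-negative integer into p zero-padded decimal elements.

    n is the digit length of the longer operand. Each element holds
    ceil(n / p) digits, and the most significant element comes first.
    """
    width = math.ceil(n / p)          # element length, rounded up
    total = width * p
    digits = str(value)
    if len(digits) > total:
        raise ValueError("value has more digits than p elements can hold")
    digits = digits.zfill(total)      # pad with leading zeros
    return [digits[i:i + width] for i in range(0, total, width)]

A, B, p = 12543, 10000012543, 4
n = len(str(B))                       # 11 digits
print(split_into_elements(A, n, p))   # ['000', '000', '012', '543']
print(split_into_elements(B, n, p))   # ['010', '000', '012', '543']
```

The elements are kept as strings here only so that the leading zeros of the example remain visible; they would be converted to integers before the addition of step S3.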

Step S3, based on a preset vector addition instruction, p first elements in the first vector are respectively stored in p corresponding processors, and are respectively added with p second elements in the second vector in parallel to obtain a result vector C.

The addition of the first vector and the second vector in step S3 may be performed with GMP (an open-source arbitrary-precision arithmetic library with a rich set of functions that share a common interface).

However, in the present embodiment, the addition calculation of step S3 is implemented with a SIMD instruction set, which is more efficient than the GMP-based computation described above. Taking AVX2 as an example, the AVX2 instruction set lets a single instruction operate on 256-bit data, so that the combined vector data width the CPU can process at once reaches 256 bits (4 parallel processors, each handling at most 64 bits), and it provides 16 YMM registers. This gives the AVX2 instruction set a significant advantage in HPC (high-performance computing) that is difficult to match with other addition schemes such as the GMP-based one described above. The embodiment uses the AVX2 instruction set to compute the segmented large integer in parallel, thereby improving efficiency.

As another implementation, the addition calculation of step S3 may also be implemented with the AVX-512 instruction set. AVX-512 is the third generation of the AVX (Advanced Vector Extensions) instruction set. It lets a single instruction operate on 512-bit data, so that the combined vector data width the CPU can process at once reaches 512 bits (8 parallel processors, each handling at most 64 bits), and it extends the register file to 32 512-bit ZMM registers, which may provide up to an 8-fold performance improvement over AVX2.

The following takes as an example the implementation of addition calculations based on the AVX2 instruction set.

In step S2 of the present embodiment, since the AVX2 instruction set supports 4 processors computing in parallel, p is taken as 4, equal to the number of processors (p may be taken as 8 if AVX-512 is employed).

Let the first vector be A_i = [000, 000, 012, 543] and the second vector be B_i = [010, 000, 012, 543]. When step S3 of the present embodiment is executed, the 4 elements of [000, 000, 012, 543] are first stored in the 4 processors respectively, and the 4 processors then each compute the sum of an element of the first vector A_i and the corresponding element of the second vector B_i. That is, 000+010, 000+000, 012+012 and 543+543 are performed in parallel. The resulting vector is C = [010, 000, 024, 1086].
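The lane-wise addition of step S3 can be illustrated as follows. Plain Python stands in for the vector addition instruction here, so nothing actually runs in parallel; on AVX2 hardware a packed 4 × 64-bit add (for instance the `_mm256_add_epi64` intrinsic) would be one candidate for this step. The name `lanewise_add` is hypothetical.

```python
def lanewise_add(a: list[int], b: list[int]) -> list[int]:
    """Add two equal-length vectors element by element.

    Each position models one 'processor' (lane); no carry moves
    between positions at this stage.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return [x + y for x, y in zip(a, b)]

A_vec = [0, 0, 12, 543]    # 000, 000, 012, 543
B_vec = [10, 0, 12, 543]   # 010, 000, 012, 543
C = lanewise_add(A_vec, B_vec)
print(C)                   # [10, 0, 24, 1086]
```

The last element, 1086, no longer fits in three digits, which is why the carry handling of step S4 is needed.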

In addition, since each processor in this embodiment can process at most 64 bits, the length n of the first large integer and the second large integer cannot exceed 4×64=256 bits, and the length n/p of each element must be less than 64 bits (i.e. at most 63 bits). In practical applications, if the number of bits handled by each processor or the number of processors increases, the length of the large integers that can be processed increases accordingly; for example, the AVX-512 instruction set supports 8 processors, in which case the length n must not exceed 8×64=512 bits.
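One way to read the 63-bit limit is that it leaves one bit of headroom, so that the sum of two elements still fits in a 64-bit lane. The short check below illustrates that reading with Python integers; `fits_in_lane` and `LANE_BITS` are hypothetical names, and the interpretation itself is an assumption rather than something the text states.

```python
LANE_BITS = 64  # width of one lane, as in the AVX2 embodiment

def fits_in_lane(x: int, y: int, lane_bits: int = LANE_BITS) -> bool:
    """Return True if x + y can be stored in an unsigned lane of lane_bits bits."""
    return x + y < (1 << lane_bits)

max_63 = (1 << 63) - 1   # largest 63-bit element
max_64 = (1 << 64) - 1   # largest 64-bit element

print(fits_in_lane(max_63, max_63))  # True: the sum is 2**64 - 2
print(fits_in_lane(max_64, 1))       # False: the sum needs a 65th bit
```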

Step S4, shift-adding the elements of the result vector C to obtain the sum of the first large integer and the second large integer.

Referring to FIG. 2, which is a flowchart of the substeps of step S4 in an embodiment of the present invention, step S4 specifically comprises the following substeps S4-1 to S4-4:

Step S4-1, dividing each element of the result vector C by 10 to the power n/p to obtain a carry vector C', and keeping the remainders in the result vector C to form the result vector C to be calculated. Example: with C = [010, 000, 024, 1086] and n/p = 3, performing step S4-1 gives the carry vector C' = [0, 0, 0, 1] and the result vector to be calculated C = [010, 000, 024, 086].
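Step S4-1 amounts to an integer division and a remainder per element. A minimal sketch, assuming decimal elements of `width` digits as in the example (the name `split_carry` is hypothetical):

```python
def split_carry(result: list[int], width: int) -> tuple[list[int], list[int]]:
    """Separate each element into a carry and a remainder.

    The carry is the element divided by 10**width (integer division);
    the remainder is what stays in that element's position.
    """
    base = 10 ** width
    carries = [c // base for c in result]
    remainders = [c % base for c in result]
    return carries, remainders

carries, remainders = split_carry([10, 0, 24, 1086], width=3)
print(carries)     # [0, 0, 0, 1]
print(remainders)  # [10, 0, 24, 86]
```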

Step S4-2, appending a 0 element at the lowest position of the carry vector C' and discarding the element at its highest position to obtain the carry vector to be calculated. Example: with C' = [0, 0, 0, 1], performing step S4-2 gives the carry vector to be calculated C' = [0, 0, 1, 0].
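With the most significant element stored first, as in the example, the shift of step S4-2 drops the first entry and appends a zero at the end, so that each carry lines up with the next more significant element. The helper name `shift_carry` below is hypothetical.

```python
def shift_carry(carries: list[int]) -> list[int]:
    """Move every carry one position toward the most significant end.

    The carry of the highest element is discarded and a 0 is appended
    at the lowest position, keeping the vector length unchanged.
    """
    return carries[1:] + [0]

print(shift_carry([0, 0, 0, 1]))  # [0, 0, 1, 0]
```

In this example the discarded entry is 0, so nothing is lost by dropping it.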

Step S4-3, adding the carry vector to be calculated and the result vector through the vector addition instruction to obtain a final vector.

In step S4-3 of the present embodiment, the addition of C and C' is computed in the same way as the addition of the first vector A_i and the second vector B_i, that is, through the AVX2 instruction set.

Step S4-4, outputting each element of the final vector in sequence to obtain the sum of the first large integer and the second large integer. Example: adding the result vector to be calculated C = [010, 000, 024, 086] and the carry vector to be calculated C' = [0, 0, 1, 0] gives [010, 000, 025, 086]; outputting the elements in sequence gives the large integer 010000025086, which is the sum of the first large integer and the second large integer.
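Steps S4-3 and S4-4 can be sketched as one more element-wise addition followed by concatenating the zero-padded elements. The function name `combine_and_output` is hypothetical, and the result is checked against Python's built-in integer addition. As written, the sketch performs only the single carry pass described above; an element that reaches 10 to the power n/p after receiving a carry would need a further pass, which it does not attempt.

```python
def combine_and_output(remainders: list[int],
                       shifted_carries: list[int],
                       width: int) -> str:
    """Add the shifted carries to the remainders and join the elements.

    Each element is printed with `width` digits so that leading zeros
    inside the number are preserved.
    """
    final = [r + c for r, c in zip(remainders, shifted_carries)]  # step S4-3
    return "".join(str(e).zfill(width) for e in final)            # step S4-4

digits = combine_and_output([10, 0, 24, 86], [0, 0, 1, 0], width=3)
print(digits)                               # 010000025086
print(int(digits) == 12543 + 10000012543)   # True
```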

The above describes the complete flow of the large integer addition acceleration method of this embodiment.

Correspondingly, the embodiment also provides a large integer addition accelerating device corresponding to the large integer addition accelerating method. As shown in fig. 3, the large integer addition accelerating device 10 includes an acquisition module 11, a segmentation module 12, an addition control module 13, and a result addition module 14.

The acquisition module 11 is configured to acquire a first large integer and a second large integer to be added.

The segmentation module 12 is configured to segment the first large integer and the second large integer into p elements each, forming a first vector and a second vector respectively.

The addition control module 13 is configured to control p processors, based on a preset vector addition instruction, to store the p first elements of the first vector correspondingly and add them in parallel to the p second elements of the second vector to obtain a result vector.

The result addition module 14 is configured to shift-add the elements of the result vector to obtain the sum of the first large integer and the second large integer.

The large integer addition accelerating device 10 of this embodiment has the same implementation principle and technical effects as the foregoing method embodiment; for brevity, where the device embodiment does not mention something, refer to the corresponding content of the method embodiment.

The present embodiment also provides a computer storage medium in which a program or instructions are stored which, when executed by a processor, implement the steps of a large integer addition acceleration method as described herein above.

Example operation and Effect

According to the large integer addition acceleration method, device and storage medium of this embodiment, the two large integers to be added are segmented to form two vectors whose number of elements (dimension) equals the number of processors. A preset addition instruction then applies a divide-and-conquer approach so that the element-wise additions are performed in parallel, and the sum of the large integers is finally computed from the resulting vector. The method can therefore use all parallel processors to compute the large integer in segments, saving time and reducing the computational complexity of large integer addition.

In addition, since this embodiment also applies AVX2 to the addition of the first and second vectors and to the addition of the result vector and the carry vector, it can achieve higher computational efficiency than large integer addition computed with GMP. Further, the use of AVX-512 may provide up to an 8-fold performance improvement over AVX2.

Finally, it should be noted that: the above examples are only specific embodiments of the present invention for illustrating the technical solution of the present invention, but not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the foregoing examples, it will be understood by those skilled in the art that the present invention is not limited thereto: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. The large integer addition operation acceleration method is characterized by comprising the following steps of:

acquiring a first large integer and a second large integer to be subjected to addition calculation;

dividing the first large integer and the second large integer into p elements respectively, and forming a first vector and a second vector respectively;

based on a preset vector addition instruction, p first elements in the first vector are respectively stored in p corresponding processors, and are respectively added with p second elements in the second vector in parallel to obtain a result vector;

and shifting and adding each element in the result vector to obtain the addition result of the first large integer and the second large integer.

2. The large integer addition acceleration method of claim 1, wherein the shift-adding each element in the result vector to obtain the first large integer and the second large integer addition result, comprises:

obtaining a carry vector of the result vector;

adding 0 element to the lowest bit of the carry vector, and discarding the highest bit of the carry vector to obtain a carry vector to be calculated;

adding the carry vector to be calculated and the result vector through the vector addition instruction to obtain a final vector;

and outputting each element of the final vector in turn to obtain the addition result of the first large integer and the second large integer.

3. The large integer addition acceleration method of claim 1, wherein:

wherein the lengths of the first large integer and the second large integer are n,

the length n/p of the element is smaller than the number of bits that the processor can handle.

4. A large integer addition acceleration method according to claim 3, characterized in that:

wherein the preset vector addition instruction is realized based on an AVX-512 instruction set,

the length n is 512.

5. A large integer addition acceleration method according to claim 3, characterized in that:

wherein the preset vector addition instruction is realized based on an AVX2 instruction set,

the length n is 256.

6. A large integer addition acceleration method according to claim 3, characterized in that:

when the first large integer and the second large integer are respectively segmented into p elements, if the number of bits of the elements does not meet the length n/p, the first bit of the elements is supplemented with 0.

7. A large integer addition acceleration apparatus, comprising:

the acquisition module is used for acquiring a first large integer and a second large integer to be subjected to addition calculation;

the segmentation module is used for respectively segmenting the first large integer and the second large integer into p elements and respectively forming a first vector and a second vector;

the addition control module is used for controlling p processors to correspondingly store p first elements in the first vector based on a preset vector addition instruction, and respectively adding the p first elements in parallel with p second elements in the second vector to obtain a result vector;

and the result adding module is used for carrying out shift addition on each element in the result vector to obtain an addition result of the first large integer and the second large integer.

8. A computer readable storage medium storing computer executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of any one of claims 1 to 6.