CN112328206A - Parallel random number generation method for vectorization component - Google Patents
Parallel random number generation method for vectorization component
- Publication number
- CN112328206A (application number CN202011212670.1A)
- Authority
- CN
- China
- Prior art keywords
- random number
- seed
- pseudo
- random numbers
- vectorization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/58—Random or pseudo-random number generators
- G06F7/588—Random number generators, i.e. based on natural stochastic processes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/58—Random or pseudo-random number generators
- G06F7/582—Pseudo-random number generators
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a parallel random number generation method for vectorization components, comprising the following steps: step S10, deriving a jump formula from the linear congruence equation; step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated; step S30, allocating array space according to the vectorization width; step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula; step S50, generating random numbers by the LCG method; and step S60, determining whether all random numbers have been generated and, if not, returning to step S50, otherwise ending. A single execution of the LCG method produces several pseudo-random numbers at once: after the initial value and the LCG method parameters are supplied, all random numbers are generated iteratively using the vectorization component and SIMD instructions, so that each execution yields multiple random numbers in parallel and the generation speed is greatly improved.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a parallel random number generation method for a vectorization component.
Background
As microprocessor manufacturing processes have advanced, integrating vectorization units that support double-precision floating-point operations into commercial microprocessor chips has become an important trend for accelerating floating-point computation. At present, most commercial microprocessors integrate vectorization components, such as Intel's MMX/SSE/AVX units and the VPU in the MIC coprocessor. These components operate on vectors using SIMD instructions; because one vector consists of several floating-point values, a single instruction operates on several values at the same time, which accelerates computation on the microprocessor.
At present, pseudo-random number generators have important applications in many scientific computing programs. In Monte Carlo simulation programs, for example, the rapid generation of high-quality pseudo-random numbers is key to program performance, and vectorizing the program with a vectorization component can effectively accelerate pseudo-random number generation. However, pseudo-random numbers are generally produced by a mathematical method in which a sequence of numbers, called a pseudo-random number sequence, is generated one after another from an initial value according to an iterative equation. The patent numbered "CN 201110347722.0", entitled "Method and device for generating pseudo-random number seeds and pseudo-random numbers", discloses a pseudo-random number seed generation method and a pseudo-random number generation method. That method can generate pseudo-random numbers, but it does so serially: only one pseudo-random number is generated at a time, so generation is slow and may become a performance bottleneck for programs that are sensitive to pseudo-random number generation speed.
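The serial behaviour described above can be illustrated with a short sketch. It assumes a multiplicative linear congruential generator of the kind used later in this document; the function name `lcg_serial` and the parameter values (`a = 16807`, `M = 2**31 - 1`, a commonly cited "minimal standard" choice) are illustrative assumptions and are not taken from the cited patent.

```python
def lcg_serial(x0, count, a=16807, M=2**31 - 1):
    """Return the next `count` values of x_{i+1} = (a * x_i) mod M.

    The parameters a and M are illustrative assumptions.
    """
    values = []
    x = x0
    for _ in range(count):
        x = (a * x) % M   # each value depends on the one before it
        values.append(x)
    return values


print(lcg_serial(1, 3))   # prints [16807, 282475249, 1622650073]
```

Because every value is computed from its predecessor, the loop produces exactly one number per iteration, which is the limitation the method in this document sets out to remove.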
In summary, current pseudo-random number generation methods are serial: each execution generates only one pseudo-random number, which limits the parallelism and execution speed of the main program. The vectorization component in a microprocessor, on the other hand, provides hardware support for an algorithm that generates several pseudo-random numbers at once. It is therefore necessary to provide a parallel random number generation method for vectorization components that generates multiple pseudo-random numbers at a time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a parallel random number generation method for a vectorization component.
The technical scheme of the invention is as follows:
A parallel random number generation method for vectorization components, comprising the following steps:
step S10, deriving a jump formula from the linear congruence equation;
step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated;
step S30, allocating array space according to the vectorization width;
step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula;
step S50, generating random numbers by the LCG method;
and step S60, determining whether all random numbers have been generated; if not, returning to step S50, otherwise ending.
Further, the linear congruence equation in step S10 is $x_{i+1} = (a x_i) \bmod M$, where $x_i$ is the initial value, $a$ is the multiplier, and $M$ is the modulus. Step S10 is implemented by letting $x_{i+2} = (a x_{i+1}) \bmod M$ and $x_{i+3} = (a x_{i+2}) \bmod M$, so that $x_{i+3} = a(a(a x_i \bmod M) \bmod M) \bmod M$; from the identity $x (y \bmod M) \bmod M = xy \bmod M$ it follows that $x_{i+3} = a^3 x_i \bmod M$. Iterating the same argument $m$ times gives the jump formula $x_{i+m} = a^m x_i \bmod M$.
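The derivation in step S10 can be laid out line by line. The block below only restates the argument above in standard notation, applying the identity $x (y \bmod M) \bmod M = xy \bmod M$ at each step; it adds no new assumptions.

```latex
\begin{aligned}
x_{i+1} &= a\,x_i \bmod M,\\
x_{i+2} &= a\,x_{i+1} \bmod M = a\,(a\,x_i \bmod M) \bmod M = a^{2} x_i \bmod M,\\
x_{i+3} &= a\,x_{i+2} \bmod M = a\,(a^{2} x_i \bmod M) \bmod M = a^{3} x_i \bmod M,\\
&\;\;\vdots\\
x_{i+m} &= a^{m} x_i \bmod M .
\end{aligned}
```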
Further, the LCG method parameters in step S20 are the multiplier $a$ and the modulus $M$; the initial value $x_i$ of the LCG method is set to $x_0$, the vectorization width is $W$, the total number of pseudo-random numbers to be generated is $N$, the precision of the pseudo-random numbers is $J$ bits, and the number of pseudo-random numbers actually generated by each vector unit is […].
Further, the array space in step S30 comprises a high-precision array seedH and a low-precision array seedL, each allocated with the vectorization width as its length.
Further, step S40 uses the jump formula $x_{i+m} = a^m x_i \bmod M$ to jump $m$ steps at a time, generating $W$ pseudo-random number subsequence seeds that are stored in the array seedH and the array seedL, where $m$ is the number of steps jumped forward […], $a$ is the known multiplier, $x_i$ is the known initial value $x_0$, and $M$ is the known modulus. All data are stored split into high […] bits and low […] bits: an array seedH of length $W$ is allocated to store the high […] bits of the pseudo-random number subsequence seeds, and an array seedL of length $W$ is allocated to store their low […] bits. One operator […] takes the high […] bits of a number, and another operator […] takes its low […] bits.
Further, step S40 is implemented as follows. Let $i = 1$ and $t = 1$; let […] store the low […] bits of the multiplier $a$ and […] store the high […] bits of the multiplier $a$; let […] store the high […] bits of the pseudo-random number subsequence seed and […] store its low […] bits; and let $t = t + 1$. The first pseudo-random number subsequence seed is $x_0$, which is stored in […] or […]. Then let […]; let $i = i + 1$; if $i$ is less than $m$, jump back to […] in order to finish computing $a^m$. If $i$ is greater than or equal to $m$, let tmp1 = al × seedL[t-1], which uses the previous pseudo-random number subsequence seed (index t-1); then let […]; let $t = t + 1$; if $t$ is less than $W$, jump back to tmp1 = al × seedL[t-1]; otherwise all pseudo-random number subsequence seeds have been generated and the next step is performed.
Further, step S50 generates random numbers from the pseudo-random number subsequence seeds produced in step S40 using the LCG method, that is, the linear congruence equation $x_{i+1} = (a x_i) \bmod M$. The operators =, ×, +, […] are all carried out with the corresponding SIMD instructions, so the operands must be packed into vector registers. mmvecH, mmvecL, mmvecK, mmvecA, mmvecB and mmvecR are temporary vector-register variables; unlike ordinary variables, each holds a vector, and assigning to one is equivalent to packing data into a vector register. An array rand[W] of length $W$ is allocated to store the generated random numbers.
Further, step S50 is implemented as follows. Let $n = 1$. Let mmvecH = seedH, which packs the entire seedH array into a vector register; let mmvecL = seedL, which packs the entire seedL array into a vector register; let mmvecK = a, which packs the multiplier $a$ of the LCG parameters into a vector register. Let […], a vector multiplication performed with a SIMD instruction; let […]; let […]; then let […], the updated value used to calculate the next random number. Finally, let rand = mmvecR, which writes the values in the vector register back to the array rand[W]; the data stored in rand[W] are then the $W$ random numbers calculated in this pass.
Further, the determination in step S60 is made by letting $n = n + 1$ and then comparing $n$ with […]: if […], not all random numbers have been generated and the process returns to step S50 to generate the remaining random numbers; otherwise the process ends.
By adopting the scheme, the invention has the following beneficial effects:
A single execution of the LCG method generates several pseudo-random numbers simultaneously. After the initial value and the LCG method parameters are supplied, all random numbers are generated iteratively using the vectorization component and SIMD instructions; this replaces serial generation, produces multiple random numbers in parallel on each execution, and greatly improves the generation speed.
Drawings
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described here show only some embodiments of the present invention; those skilled in the art can obtain other drawings from the structures shown without creative effort.
Fig. 1 is a schematic flow chart of a parallel random number generation method for a vectorization component according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
Referring to Fig. 1, the present invention provides a parallel random number generation method for vectorization components, comprising the following steps:
step S10, deriving a jump formula from the linear congruence equation;
step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated;
step S30, allocating array space according to the vectorization width;
step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula;
step S50, generating random numbers by the LCG method;
and step S60, determining whether all random numbers have been generated; if not, returning to step S50, otherwise ending.
In this embodiment, the linear congruence equation in step S10 is $x_{i+1} = (a x_i) \bmod M$, where $x_i$ is the initial value, $a$ is the multiplier, and $M$ is the modulus. Step S10 is implemented by letting $x_{i+2} = (a x_{i+1}) \bmod M$ and $x_{i+3} = (a x_{i+2}) \bmod M$, so that $x_{i+3} = a(a(a x_i \bmod M) \bmod M) \bmod M$; from the identity $x (y \bmod M) \bmod M = xy \bmod M$ it follows that $x_{i+3} = a^3 x_i \bmod M$. Iterating the same argument $m$ times gives the jump formula $x_{i+m} = a^m x_i \bmod M$.
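The jump formula can be checked numerically. The following is a minimal sketch, not the patent's implementation: the helper names `lcg_step` and `lcg_jump` and the parameter values are illustrative assumptions. It compares $m$ single steps of the recurrence with one application of $x_{i+m} = a^m x_i \bmod M$, using Python's built-in three-argument `pow` for modular exponentiation.

```python
def lcg_step(x, a, M):
    """One step of the recurrence x_{i+1} = (a * x_i) mod M."""
    return (a * x) % M


def lcg_jump(x, a, M, m):
    """Jump m steps at once using x_{i+m} = (a**m * x_i) mod M."""
    return (pow(a, m, M) * x) % M


# Illustrative parameters (assumptions, not values from the patent).
a, M, x0, m = 16807, 2**31 - 1, 12345, 1000

x = x0
for _ in range(m):
    x = lcg_step(x, a, M)

print(x == lcg_jump(x0, a, M, m))   # prints True
```

The jump costs one modular exponentiation instead of $m$ multiplications, which is what makes it practical to position each vector lane at its own starting point in the sequence.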
In this embodiment, the LCG method parameters in step S20 are the multiplier $a$ and the modulus $M$; the initial value $x_i$ of the LCG method is set to $x_0$, the vectorization width is $W$, the total number of pseudo-random numbers to be generated is $N$, and the precision of the pseudo-random numbers is $J$ bits. The number of pseudo-random numbers actually generated by each vector unit is […]; that is, the pseudo-random number subsequence generated by the first vector unit covers the range […], the subsequence generated by the second vector unit covers the range […], and so on for the third and subsequent vector units.
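The way the sequence is divided among vector units can be sketched as follows. The exact expressions are not legible in the source text, so this sketch rests on a stated assumption: each of the $W$ lanes produces $m = \lceil N/W \rceil$ consecutive values, and lane $k$ (counting from 0) covers positions $k \cdot m$ to $(k+1) \cdot m - 1$ of the serial sequence. The function name `lane_ranges` is hypothetical.

```python
def lane_ranges(N, W):
    """Split N sequence positions into W contiguous blocks.

    Assumption: every lane generates m = ceil(N / W) values, so the
    last lane may run past N and its surplus values are discarded.
    """
    m = -(-N // W)   # integer ceiling of N / W
    return m, [(k * m, (k + 1) * m - 1) for k in range(W)]


m, ranges = lane_ranges(10, 4)
print(m, ranges)   # prints 3 [(0, 2), (3, 5), (6, 8), (9, 11)]
```

With $N = 10$ and $W = 4$ the lanes together cover 12 positions, so under this assumption two values from the last lane would simply go unused.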
In the present embodiment, the array space in step S30 comprises a high-precision array seedH and a low-precision array seedL, each allocated with the vectorization width as its length.
In the present embodiment, step S40 uses the jump formula $x_{i+m} = a^m x_i \bmod M$ to jump $m$ steps at a time, generating $W$ pseudo-random number subsequence seeds that are stored in the array seedH and the array seedL, where $m$ is the number of steps jumped forward […], $a$ is the known multiplier, $x_i$ is the known initial value $x_0$, and $M$ is the known modulus. To improve calculation precision, all data are stored during the calculation split into high […] bits and low […] bits: an array seedH of length $W$ (i.e., the vectorization width) is allocated to store the high […] bits of the pseudo-random number subsequence seeds, and an array seedL of length $W$ is allocated to store their low […] bits. One operator […] takes the high […] bits of a number, and another operator […] takes its low […] bits.
Specifically, step S40 is implemented as follows. Let $i = 1$ and $t = 1$; let […] store the low […] bits of the multiplier $a$ and […] store the high […] bits of the multiplier $a$; let […] store the high […] bits of the pseudo-random number subsequence seed and […] store its low […] bits; and let $t = t + 1$. The first pseudo-random number subsequence seed is $x_0$, which is stored in […] or […]. Then let […]; let $i = i + 1$; if $i$ is less than $m$, jump back to […] in order to finish computing $a^m$. If $i$ is greater than or equal to $m$, let tmp1 = al × seedL[t-1], which uses the previous pseudo-random number subsequence seed (index t-1); then let […]; let $t = t + 1$; if $t$ is less than $W$, jump back to tmp1 = al × seedL[t-1]; otherwise all pseudo-random number subsequence seeds have been generated and the next step is performed.
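The split-word arithmetic and the seed loop of step S40 can be sketched as below. Several details are not recoverable from the source text, so the sketch makes explicit assumptions: the precision is $J = 64$ bits, each value is split into two halves of $J/2$ bits, and the modulus is $M = 2^J$, which makes the half-word product exact. The names `hi`, `lo`, `mulmod_split` and `make_seeds`, and the multiplier used in the example, are hypothetical. Python integers do not overflow, so the splitting here only mirrors what fixed-width hardware would need.

```python
J = 64                  # assumed precision in bits
HALF = J // 2           # assumed width of each half
MASK = (1 << HALF) - 1
M = 1 << J              # assumed modulus 2**J


def hi(v):
    """High HALF bits of a J-bit value."""
    return v >> HALF


def lo(v):
    """Low HALF bits of a J-bit value."""
    return v & MASK


def mulmod_split(a, x):
    """(a * x) mod 2**J computed from half-width pieces.

    The ah * xh term is a multiple of 2**J and drops out; only the
    low half of the cross term survives the reduction.
    """
    al, ah = lo(a), hi(a)
    xl, xh = lo(x), hi(x)
    cross = (ah * xl + al * xh) & MASK
    return ((cross << HALF) + al * xl) & (M - 1)


def make_seeds(a, x0, m, W):
    """Return (seedH, seedL) holding x_0, x_m, x_2m, ..., x_(W-1)m."""
    am = a
    for _ in range(m - 1):          # a**m mod M by repeated multiplication
        am = mulmod_split(am, a)
    seeds = [x0]
    for _ in range(1, W):           # each seed is a**m times the previous one
        seeds.append(mulmod_split(am, seeds[-1]))
    return [hi(s) for s in seeds], [lo(s) for s in seeds]


seedH, seedL = make_seeds(6364136223846793005, 1, 5, 4)
```

Recombining `seedH[t]` and `seedL[t]` gives the state $x_{t \cdot m}$, so lane $t$ starts exactly $m$ steps after lane $t - 1$.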
In this embodiment, step S50 generates random numbers from the pseudo-random number subsequence seeds produced in step S40 using the LCG method, that is, the linear congruence equation $x_{i+1} = (a x_i) \bmod M$. The operators =, ×, +, […] are all carried out with the corresponding SIMD instructions, so the operands must be packed into vector registers. mmvecH, mmvecL, mmvecK, mmvecA, mmvecB and mmvecR are temporary vector-register variables; unlike ordinary variables, each holds a vector, and assigning to one is equivalent to packing data into a vector register. An array rand[W] of length $W$ is allocated to store the generated random numbers.
Specifically, step S50 is implemented as follows. Let $n = 1$. Let mmvecH = seedH, which packs the entire seedH array into a vector register; let mmvecL = seedL, which packs the entire seedL array into a vector register; let mmvecK = a, which packs the multiplier $a$ of the LCG parameters into a vector register. Let […], a vector multiplication performed with a SIMD instruction; let […]; let […]; then let […], the updated value used to calculate the next random number. Finally, let rand = mmvecR, which writes the values in the vector register back to the array rand[W]; the data stored in rand[W] are then the $W$ random numbers calculated in this pass.
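One pass of step S50 can be pictured with a plain-Python stand-in for the vector register. This is a sketch under assumptions, not the patent's SIMD code: a Python list plays the role of the register, the list comprehension stands in for a single SIMD multiply followed by the modular reduction, the high/low split of the seeds is omitted for brevity, and the parameter values and the name `vector_step` are illustrative.

```python
# Illustrative parameters (assumptions, not values from the patent).
a, M = 16807, 2**31 - 1
W, m, x0 = 4, 3, 1

# Seeds x_0, x_m, x_2m, ... as produced by the jump formula in step S40.
seeds = [(pow(a, t * m, M) * x0) % M for t in range(W)]


def vector_step(state, a, M):
    """Advance every lane by one LCG step.

    On real hardware this would be one SIMD multiply over all W lanes.
    """
    return [(a * s) % M for s in state]


rand = vector_step(seeds, a, M)   # W random numbers from one pass
print(len(rand))                  # prints 4
```

Lane $t$ moves from $x_{t \cdot m}$ to $x_{t \cdot m + 1}$, so one pass yields $W$ values from $W$ different positions of the serial sequence.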
In this embodiment, the determination in step S60 is made by letting $n = n + 1$ and then comparing $n$ with […]: if […], not all random numbers have been generated and the process returns to step S50 to generate the remaining random numbers; otherwise the process ends.
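Steps S40 to S60 can be put together in a small end-to-end sketch and compared against serial generation. The loop bound is not legible in the source, so the sketch assumes each lane runs for $m = \lceil N/W \rceil$ passes; lanes are simulated with Python lists rather than SIMD registers, and the names `parallel_lcg` and `serial_lcg` and all parameter values are illustrative assumptions.

```python
def serial_lcg(a, M, x0, N):
    """Reference: N values generated one at a time."""
    out, x = [], x0
    for _ in range(N):
        x = (a * x) % M
        out.append(x)
    return out


def parallel_lcg(a, M, x0, W, N):
    """Generate N values with W simulated lanes (steps S40 to S60)."""
    m = -(-N // W)                      # assumed passes per lane, ceil(N / W)
    am = pow(a, m, M)
    state = [x0]
    for _ in range(W - 1):              # S40: seeds x_0, x_m, x_2m, ...
        state.append((am * state[-1]) % M)
    lanes = [[] for _ in range(W)]
    n = 0
    while n < m:                        # S60: stop after m passes
        state = [(a * s) % M for s in state]   # S50: one vector step
        for t in range(W):
            lanes[t].append(state[t])
        n += 1
    # Lane t holds x_(t*m+1) .. x_(t*m+m); concatenating restores serial order.
    return [v for lane in lanes for v in lane][:N]


a, M = 16807, 2**31 - 1
print(parallel_lcg(a, M, 1, 4, 10) == serial_lcg(a, M, 1, 10))   # prints True
```

The comparison shows that, under these assumptions, the lane-wise output is the same sequence the serial generator would produce, only computed $W$ values at a time.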
Compared with the prior art, the invention has the following beneficial effects:
A single execution of the LCG method generates several pseudo-random numbers simultaneously. After the initial value and the LCG method parameters are supplied, all random numbers are generated iteratively using the vectorization component and SIMD instructions; this replaces serial generation, produces multiple random numbers in parallel on each execution, and greatly improves the generation speed.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A parallel random number generation method for vectorization components, comprising the following steps:
step S10, deriving a jump formula from the linear congruence equation;
step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated;
step S30, allocating array space according to the vectorization width;
step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula;
step S50, generating random numbers by the LCG method;
and step S60, determining whether all random numbers have been generated; if not, returning to step S50, otherwise ending.
2. The parallel random number generation method for vectorization components according to claim 1, wherein the linear congruence equation in step S10 is $x_{i+1} = (a x_i) \bmod M$, where $x_i$ is the initial value, $a$ is the multiplier, and $M$ is the modulus; step S10 is implemented by letting $x_{i+2} = (a x_{i+1}) \bmod M$ and $x_{i+3} = (a x_{i+2}) \bmod M$, so that $x_{i+3} = a(a(a x_i \bmod M) \bmod M) \bmod M$; from the identity $x (y \bmod M) \bmod M = xy \bmod M$ it follows that $x_{i+3} = a^3 x_i \bmod M$, and iterating the same argument $m$ times gives the jump formula $x_{i+m} = a^m x_i \bmod M$.
3. The parallel random number generation method for vectorization components according to claim 2, wherein the LCG method parameters in step S20 are the multiplier $a$ and the modulus $M$; the initial value $x_i$ of the LCG method is set to $x_0$, the vectorization width is $W$, the total number of pseudo-random numbers to be generated is $N$, the precision of the pseudo-random numbers is $J$ bits, and the number of pseudo-random numbers actually generated by each vector unit is […].
4. The parallel random number generation method for vectorization components according to claim 3, wherein the array space in step S30 comprises a high-precision array seedH and a low-precision array seedL, each allocated with the vectorization width as its length.
5. The parallel random number generation method for vectorization components according to claim 4, wherein step S40 uses the jump formula $x_{i+m} = a^m x_i \bmod M$ to jump $m$ steps at a time, generating $W$ pseudo-random number subsequence seeds that are stored in the array seedH and the array seedL, where $m$ is the number of steps jumped forward […], $a$ is the known multiplier, $x_i$ is the known initial value $x_0$, and $M$ is the known modulus; all data are stored split into high […] bits and low […] bits, an array seedH of length $W$ being allocated to store the high […] bits of the pseudo-random number subsequence seeds and an array seedL of length $W$ being allocated to store their low […] bits; one operator […] takes the high […] bits of a number, and another operator […] takes its low […] bits.
6. The parallel random number generation method for vectorization components according to claim 5, wherein step S40 is implemented as follows: let $i = 1$ and $t = 1$; let […] store the low […] bits of the multiplier $a$ and […] store the high […] bits of the multiplier $a$; let […] store the high […] bits of the pseudo-random number subsequence seed and […] store its low […] bits; let $t = t + 1$; the first pseudo-random number subsequence seed is $x_0$, which is stored in […] or […]; then let […]; let $i = i + 1$; if $i$ is less than $m$, jump back to […] in order to finish computing $a^m$; if $i$ is greater than or equal to $m$, let tmp1 = al × seedL[t-1], which uses the previous pseudo-random number subsequence seed (index t-1); then let […]; let $t = t + 1$; if $t$ is less than $W$, jump back to tmp1 = al × seedL[t-1]; otherwise all pseudo-random number subsequence seeds have been generated and the next step is performed.
7. The parallel random number generation method for vectorization components according to claim 6, wherein step S50 generates random numbers from the pseudo-random number subsequence seeds produced in step S40 using the LCG method, that is, the linear congruence equation $x_{i+1} = (a x_i) \bmod M$; the operators =, ×, +, […] are all carried out with the corresponding SIMD instructions, so the operands must be packed into vector registers; mmvecH, mmvecL, mmvecK, mmvecA, mmvecB and mmvecR are temporary vector-register variables which, unlike ordinary variables, each hold a vector, so that assigning to one is equivalent to packing data into a vector register; and an array rand[W] of length $W$ is allocated to store the generated random numbers.
8. The parallel random number generation method for vectorization components according to claim 7, wherein step S50 is implemented as follows: let $n = 1$; let mmvecH = seedH, which packs the entire seedH array into a vector register; let mmvecL = seedL, which packs the entire seedL array into a vector register; let mmvecK = a, which packs the multiplier $a$ of the LCG parameters into a vector register; let […], a vector multiplication performed with a SIMD instruction; let […]; let […]; then let […], the updated value used to calculate the next random number; and let rand = mmvecR, which writes the values in the vector register back to the array rand[W], the data stored in rand[W] then being the $W$ random numbers calculated in this pass.
9. The parallel random number generation method for vectorization components according to claim 8, wherein the determination in step S60 is made by letting $n = n + 1$ and then comparing $n$ with […]: if […], not all random numbers have been generated and the process returns to step S50 to generate the remaining random numbers; otherwise the process ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011212670.1A CN112328206A (en) | 2020-11-03 | 2020-11-03 | Parallel random number generation method for vectorization component |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112328206A true CN112328206A (en) | 2021-02-05 |
Family
ID=74323378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011212670.1A Pending CN112328206A (en) | 2020-11-03 | 2020-11-03 | Parallel random number generation method for vectorization component |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112328206A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022198652A1 (en) * | 2021-03-26 | 2022-09-29 | 华为技术有限公司 | Random number generation apparatus and method, random number generation system, and chip |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011076962A1 (en) * | 2009-12-24 | 2011-06-30 | Telefonica, S.A. | Method and system for generating unpredictable pseudo-random numbers |
CN103412738A (en) * | 2013-07-08 | 2013-11-27 | 中国航空无线电电子研究所 | Pseudorandom sequence generator based on single-step iteration generator polynomial and implement method thereof |
CN108139889A (en) * | 2015-10-12 | 2018-06-08 | 甲骨文国际公司 | Pseudo-random number sequence is generated by the non-linear mixing of multiple auxiliary pseudo-random number generator |
Non-Patent Citations (1)
Title |
---|
边利亚;叶飞跃;: "产品防伪中伪随机数的应用技术研究", 计算机工程与设计, no. 02 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Saito et al. | Variants of Mersenne twister suitable for graphic processors | |
Thomas et al. | A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation | |
US8756264B2 (en) | Parallel pseudorandom number generation | |
He et al. | GPU-accelerated parallel sparse LU factorization method for fast circuit analysis | |
US10067910B2 (en) | System and method for GPU maximum register count optimization applied to general matrix-matrix multiplication | |
US8433883B2 (en) | Inclusive “OR” bit matrix compare resolution of vector update conflict masks | |
US10768898B2 (en) | Efficient modulo calculation | |
US10102043B2 (en) | Method and system for mapping an integral into a thread of a parallel architecture | |
Lai et al. | Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs | |
Shah et al. | A novel implementation of 2D3V particle-in-cell (PIC) algorithm for Kepler GPU architecture | |
Aranha et al. | High-speed parallel software implementation of the η T pairing | |
CN112328206A (en) | Parallel random number generation method for vectorization component | |
Bos et al. | ECC2K-130 on cell CPUs | |
Maleki et al. | Automatic hierarchical parallelization of linear recurrences | |
Dieguez et al. | New tridiagonal systems solvers on GPU architectures | |
WO2009042106A2 (en) | Shift-add mechanism | |
US11409500B2 (en) | Performing constant modulo arithmetic | |
US6374343B1 (en) | Array indexing with sequential address generator for a multi-dimensional array having fixed address indices | |
Emmart et al. | High precision integer multiplication with a GPU | |
JP2022101472A (en) | Systems and methods for low latency modular multiplication | |
Atanassov et al. | Tuning the generation of Sobol sequence with Owen scrambling | |
Furgailo et al. | Research of techniques to improve the performance of explicit numerical methods on the cpu | |
Emeliyanenko | Computing resultants on Graphics Processing Units: Towards GPU-accelerated computer algebra | |
CN110457008A (en) | M-sequence generation method, device and storage medium | |
KR102442943B1 (en) | In-memory stochastic rounder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||