CN112328206A

CN112328206A - Parallel random number generation method for vectorization component

Info

Publication number: CN112328206A
Application number: CN202011212670.1A
Authority: CN
Inventors: 刘锋; 侯晓东; 朱肖雄
Original assignee: Guangzhou Keze Yuntian Intelligent Technology Co ltd
Current assignee: Guangzhou Keze Yuntian Intelligent Technology Co ltd
Priority date: 2020-11-03
Filing date: 2020-11-03
Publication date: 2021-02-05

Abstract

The invention discloses a parallel random number generation method for vectorization components, comprising the following steps: step S10, deriving a jump formula from the linear congruence equation; step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated; step S30, allocating array space according to the vectorization width; step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula; step S50, generating random numbers by the LCG method; step S60, determining whether all random numbers have been generated, and if not, returning to step S50, otherwise ending. A single execution of the LCG method generates multiple pseudo-random numbers simultaneously: once the initial value and the LCG method parameters are supplied, all random numbers are generated iteratively using the vectorization component and SIMD instructions, so that each execution produces multiple random numbers in parallel and the generation speed is greatly improved.

Description

Parallel random number generation method for vectorization component

Technical Field

The invention relates to the technical field of data processing, in particular to a parallel random number generation method for a vectorization component.

Background

As microprocessor manufacturing processes continue to advance, integrating vectorization units that support double-precision floating-point operations into commercial microprocessor chips to accelerate floating-point computation has become an important trend. At present, most commercial microprocessors integrate vectorization components, such as Intel's MMX/SSE/AVX and the VPU in the MIC coprocessor. These components operate on vectors using SIMD instructions; since one vector consists of multiple floating-point values, a single instruction operates on multiple floating-point values at once, which accelerates the microprocessor's computation.

Pseudo-random number generators play an important role in many scientific computing programs. In Monte Carlo simulation programs, for example, the rapid generation of high-quality pseudo-random numbers is key to program performance, and vectorizing the program with a vectorization component can effectively accelerate pseudo-random number generation. However, pseudo-random numbers are generally generated by a mathematical method: starting from an initial value, a sequence of numbers, called a pseudo-random number sequence, is generated step by step according to an iterative equation. The patent numbered "CN 201110347722.0", entitled "Method and device for generating pseudo-random number seeds and pseudo-random numbers", discloses a pseudo-random number seed generation method and a pseudo-random number generation method. That method can generate pseudo-random numbers, but it does so serially, i.e., only one pseudo-random number is generated at a time; generation is slow and may become a performance bottleneck for programs that are sensitive to pseudo-random number generation speed.

In summary, current pseudo-random number generation methods are serial: each execution generates only one pseudo-random number, which limits the parallelism and execution speed of the main program. The vectorization component in a microprocessor, on the other hand, provides hardware support for an algorithm to generate multiple pseudo-random numbers at once. It is therefore necessary to provide a parallel random number generation method for vectorization components that generates multiple pseudo-random numbers at a time.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a parallel random number generation method for a vectorization component.

The technical scheme of the invention is as follows:

A parallel random number generation method for vectorization components, comprising the following steps:

step S10, deriving a jump formula from the linear congruence equation;

step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated;

step S30, allocating array space according to the vectorization width;

step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula;

step S50, generating random numbers by the LCG method;

and step S60, determining whether all random numbers have been generated; if not, jumping back to step S50, otherwise ending.

Further, the linear congruence equation in step S10 is $x_{i+1} = (a x_i) \bmod M$, where $x_i$ is the initial value, $a$ is the multiplier, and $M$ is the modulus. Step S10 is implemented as follows: let $x_{i+2} = (a x_{i+1}) \bmod M$ and $x_{i+3} = (a x_{i+2}) \bmod M$, so that $x_{i+3} = a(a(a x_i \bmod M) \bmod M) \bmod M$; applying the identity $x (y \bmod M) \bmod M = xy \bmod M$ gives $x_{i+3} = a^3 x_i \bmod M$. Generalizing the formula $x_{i+3} = a^3 x_i \bmod M$ to $m$ iterations yields the jump formula $x_{i+m} = a^m x_i \bmod M$.
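The jump formula can be checked numerically. The following is a minimal Python sketch, not the patented implementation; the function names `lcg_step` and `lcg_jump` and the parameter values (the well-known multiplier 16807 with modulus 2^31 - 1) are illustrative assumptions that do not come from the source.

```python
def lcg_step(x, a, M):
    """One step of the multiplicative LCG: x_{i+1} = (a * x_i) mod M."""
    return (a * x) % M


def lcg_jump(x, a, M, m):
    """Jump m steps ahead at once: x_{i+m} = (a^m * x_i) mod M."""
    return (pow(a, m, M) * x) % M


# Illustrative parameters only (assumption, not taken from the source).
A, M, X0 = 16807, 2**31 - 1, 12345

# Walk 1000 steps one at a time ...
x = X0
for _ in range(1000):
    x = lcg_step(x, A, M)

# ... and compare with a single jump of 1000 steps.
jumped = lcg_jump(X0, A, M, 1000)
print(x == jumped)
```

The three-argument `pow` performs modular exponentiation, so the jump costs far fewer multiplications than stepping through the sequence.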

Further, the LCG method parameters in step S20 are the multiplier $a$ and the modulus $M$; the initial value $x_i$ of the LCG method is set to $x_0$, the vectorization width is W, the total number of pseudo-random numbers to be generated is N, the precision of the pseudo-random numbers is J bits, and the number of pseudo-random numbers actually generated by each vector unit is W

Further, the array space in step S30 includes a high-precision array seedH and a low-precision array seedL, each allocated with the vectorization width as its length.

Further, in step S40, W pseudo-random number subsequence seeds are generated by jumping m steps at a time according to the jump formula $x_{i+m} = a^m x_i \bmod M$, and are stored in the high-precision array seedH and the low-precision array seedL respectively, where m is the number of steps jumped forward [...], $a$ is the known multiplier, $x_i$ is the known initial value $x_0$, and $M$ is the known modulus. All data are stored split into a high [...]-bit part and a low [...]-bit part: an array seedH of length W is allocated to store the high [...] bits of the pseudo-random number subsequence seeds, and an array seedL of length W is allocated to store the low [...] bits of the pseudo-random number subsequence seeds. In addition, the operator [...] takes the high [...] bits of a number, and the operator [...] takes the low [...] bits of a number.
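Splitting each value into high and low bits lets a product be assembled from partial products that fit in a fixed-width register. The sketch below illustrates that idea in Python under loudly stated assumptions: the modulus is taken to be 2^J, J is even, and the split point is J/2. None of these choices is confirmed by the text, whose bit widths and operator symbols are not reproduced; `split` and `mulmod_split` are hypothetical helper names.

```python
J = 48                       # assumed precision in bits (hypothetical)
HALF = J // 2                # assumed split point between high and low parts
LOW_MASK = (1 << HALF) - 1   # selects the low HALF bits
MOD_MASK = (1 << J) - 1      # reduction modulo 2**J


def split(v):
    """Return (high, low) parts of a J-bit value."""
    return v >> HALF, v & LOW_MASK


def mulmod_split(a, x):
    """Compute (a * x) mod 2**J from HALF-bit partial products only."""
    a_h, a_l = split(a)
    x_h, x_l = split(x)
    low = a_l * x_l                              # always below 2**J
    # Only the low HALF bits of the cross terms survive the reduction,
    # and the a_h * x_h term vanishes entirely modulo 2**J.
    cross = (a_h * x_l + a_l * x_h) & LOW_MASK
    return (low + (cross << HALF)) & MOD_MASK


# Illustrative operands (assumption): a 35-bit multiplier and a 48-bit seed.
a, x = 0x5DEECE66D, 0x1234ABCD330E
print(mulmod_split(a, x) == (a * x) % (1 << J))
```

For a modulus that is not a power of two, the reduction step would differ; this sketch only shows why storing high and low parts separately is convenient.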

Further, step S40 is implemented as follows. Let i = 1 and t = 1; [...] stores the low [...] bits of the multiplier a, [...] stores the high [...] bits of the multiplier a, [...] stores the high [...] bits of a pseudo-random number subsequence seed, and [...] stores the low [...] bits of a pseudo-random number subsequence seed. Let t = t + 1; the first pseudo-random number subsequence seed is $x_0$, and it is stored in [...] or [...]. Then let [...]; let i = i + 1, and if i is less than m, jump back to [...] in order to complete the computation of $a^m$. If i is greater than or equal to m, let tmp1 = a_l × seedL[t-1], which obtains the previous pseudo-random number subsequence seed seed[t-1]; then let [...]; let t = t + 1, and if t is less than W, jump back to tmp1 = a_l × seedL[t-1]; otherwise, all pseudo-random number subsequence seeds have been generated and the next step is performed.
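The seed loop can be illustrated with a short Python sketch. It mirrors the two phases in the text (first accumulate $a^m$, then derive each seed from the previous one) but ignores the high/low bit split and works on whole integers; `make_seeds` and the parameter values are assumptions for illustration only.

```python
def make_seeds(x0, a, M, W, m):
    """Return W subsequence seeds spaced m steps apart: x_0, x_m, x_2m, ..."""
    # Phase 1: build a^m mod M by repeated multiplication (the loop over i).
    a_m = 1
    for _ in range(m):
        a_m = (a_m * a) % M
    # Phase 2: the first seed is x_0; each later seed is obtained from the
    # previous one with a single jump of m steps (the loop over t).
    seeds = [x0 % M]
    for t in range(1, W):
        seeds.append((a_m * seeds[t - 1]) % M)
    return seeds


# Illustrative parameters only (assumption, not taken from the source).
A, M, X0 = 16807, 2**31 - 1, 12345
seeds = make_seeds(X0, A, M, W=4, m=3)

# Reference: the serial sequence x_0 .. x_9.
xs = [X0]
for _ in range(9):
    xs.append((A * xs[-1]) % M)
print(seeds == [xs[0], xs[3], xs[6], xs[9]])
```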

Further, step S50 generates random numbers by the LCG method from the pseudo-random number subsequence seeds generated in step S40. The LCG method comprises the linear congruence equation $x_{i+1} = (a x_i) \bmod M$, in which the operators =, ×, +, [...] are all implemented with the corresponding SIMD instructions, and the data operated on must be packed into vector registers. Let mmvecH, mmvecL, mmvecK, mmvecA, mmvecB and mmvecR be temporary vector-register variables; unlike ordinary variables, each holds a vector, and assigning to such a variable is in effect packing data into a vector register. An array rand[W] of length W is allocated to store the generated random numbers.

Further, step S50 is implemented as follows. Let n = 1. Let mmvecH = seedH, which packs the entire seedH array into a vector register; let mmvecL = seedL, which packs the entire seedL array into a vector register; let mmvecK = a, which packs the multiplier a of the LCG parameters into a vector register. Let [...], a vector multiplication using a SIMD instruction; let [...]; let [...]; then let [...], which computes the updated value used for the next random number. Let rand = mmvecR, which writes the values in the vector register back to the array rand[W]; the data stored in rand[W] are then the W random numbers computed in this iteration.
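One vector step can be pictured as the same LCG update applied to every lane at once. The following Python sketch simulates the lanes with a plain list instead of SIMD registers and does not model the high/low split; the function name `vector_lcg_step` is hypothetical, while `mmvecK`, `mmvecR` and `rand` reuse the names from the text purely as labels.

```python
def vector_lcg_step(lane_state, a, M):
    """Apply x <- (a * x) mod M to all W lanes; return (new state, rand)."""
    W = len(lane_state)
    mmvecK = [a] * W                     # broadcast the multiplier to every lane
    # Lane-wise multiply and reduce; on real hardware this would be a handful
    # of SIMD instructions rather than a Python loop.
    mmvecR = [(k * s) % M for k, s in zip(mmvecK, lane_state)]
    rand = list(mmvecR)                  # write the register back to rand[W]
    return mmvecR, rand


# Tiny illustrative values (assumption) so the result is easy to verify by hand.
state, rand = vector_lcg_step([1, 2, 3, 4], a=5, M=7)
print(rand)
```

The returned state doubles as the input of the next iteration, which is what makes the per-lane subsequences advance independently.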

Further, the determination in step S60 is as follows: let n = n + 1, then compare n with [...]. If [...], not all random numbers have been generated, and the process returns to step S50 to continue generating the remaining random numbers; otherwise, the process ends.
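Putting steps S40 to S60 together, the lane-parallel output should coincide with the serial LCG sequence. The Python sketch below checks this under one loudly labelled assumption: each lane produces ceil(N / W) numbers, a value that is not reproduced in the text. Lanes are again simulated with lists, and `generate` and `serial` are hypothetical names.

```python
def generate(x0, a, M, W, N):
    """Lane-parallel LCG: returns x_1 .. x_N in sequence order."""
    m = -(-N // W)                       # ASSUMPTION: ceil(N / W) numbers per lane
    a_m = pow(a, m, M)
    # S40: seeds x_0, x_m, x_2m, ... one per lane.
    state = [x0 % M]
    for t in range(1, W):
        state.append((a_m * state[t - 1]) % M)
    # S50/S60: advance all lanes together until each has produced m numbers.
    lanes = [[] for _ in range(W)]
    n = 0
    while n < m:
        state = [(a * s) % M for s in state]
        for t in range(W):
            lanes[t].append(state[t])
        n += 1
    # Lane t holds x_{t*m+1} .. x_{(t+1)*m}; concatenate and trim to N values.
    return [v for lane in lanes for v in lane][:N]


def serial(x0, a, M, N):
    """Reference: the same LCG generated one value at a time."""
    out, x = [], x0 % M
    for _ in range(N):
        x = (a * x) % M
        out.append(x)
    return out


# Illustrative parameters only (assumption, not taken from the source).
A, M, X0 = 16807, 2**31 - 1, 12345
print(generate(X0, A, M, W=4, N=10) == serial(X0, A, M, N=10))
```

When N is not a multiple of W, the last lane computes a few values beyond N, which this sketch simply discards.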

By adopting the scheme, the invention has the following beneficial effects:

A single execution of the LCG method generates multiple pseudo-random numbers simultaneously. Once the initial value and the LCG method parameters are supplied, all random numbers are generated iteratively using the vectorization component and SIMD instructions. This replaces serial generation: each execution produces multiple random numbers in parallel, which greatly improves the generation speed.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from the structures shown in them without creative effort.

Fig. 1 is a schematic flow chart of a parallel random number generation method for a vectorization component according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

The invention is described in detail below with reference to the figures and the specific embodiments.

Referring to fig. 1, the present invention provides a vectorization component-oriented parallel random number generation method, including the following steps:

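step S10, deriving a jump formula from the linear congruence equation;

step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated;

step S30, allocating array space according to the vectorization width;

step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula;

step S50, generating random numbers by the LCG method;

and step S60, determining whether all random numbers have been generated; if not, jumping back to step S50, otherwise ending.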

In this embodiment, the linear congruence equation in step S10 is $x_{i+1} = (a x_i) \bmod M$, where $x_i$ is the initial value, $a$ is the multiplier, and $M$ is the modulus. Step S10 is implemented as follows: let $x_{i+2} = (a x_{i+1}) \bmod M$ and $x_{i+3} = (a x_{i+2}) \bmod M$, so that $x_{i+3} = a(a(a x_i \bmod M) \bmod M) \bmod M$; applying the identity $x (y \bmod M) \bmod M = xy \bmod M$ gives $x_{i+3} = a^3 x_i \bmod M$. Generalizing the formula $x_{i+3} = a^3 x_i \bmod M$ to $m$ iterations yields the jump formula $x_{i+m} = a^m x_i \bmod M$.
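The generalization from three steps to m steps can be made explicit by induction on m. The following is a sketch of that argument, using only the recurrence and the identity $x (y \bmod M) \bmod M = xy \bmod M$:

```latex
\begin{align*}
\text{Claim: } & x_{i+m} = a^{m} x_i \bmod M \quad \text{for all } m \ge 1.\\
\text{Base } (m = 1)\text{: } & x_{i+1} = a x_i \bmod M.\\
\text{Step: } & x_{i+m+1} = \bigl(a\, x_{i+m}\bigr) \bmod M
  = \bigl(a\,(a^{m} x_i \bmod M)\bigr) \bmod M
  = a^{m+1} x_i \bmod M.
\end{align*}
```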

In this embodiment, the LCG method parameters in step S20 are the multiplier $a$ and the modulus $M$; the initial value $x_i$ of the LCG method is set to $x_0$, the vectorization width is W, the total number of pseudo-random numbers to be generated is N, the precision of the pseudo-random numbers is J bits, and the number of pseudo-random numbers actually generated by each vector unit is [...]. That is, the range of the pseudo-random number subsequence generated by the first vector unit is [...], the range generated by the second vector unit is [...], and the ranges generated by the third and subsequent vector units follow by analogy.
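The way the sequence is divided among vector units can be illustrated as follows. Because the per-unit count and the ranges are not reproduced in the text, this Python sketch assumes a per-unit count of ceil(N / W) and contiguous 1-based index ranges; both are assumptions, and `lane_ranges` is a hypothetical helper.

```python
def lane_ranges(N, W):
    """Return (first, last) sequence indices handled by each of the W units."""
    m = -(-N // W)   # ASSUMPTION: each unit generates ceil(N / W) numbers
    # Unit t covers the contiguous block x_{t*m+1} .. x_{(t+1)*m}.
    return [(t * m + 1, (t + 1) * m) for t in range(W)]


ranges = lane_ranges(N=10, W=4)
print(ranges)
```

With N = 10 and W = 4 the last block extends past index 10, so under this assumption a few surplus values would be computed and discarded.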

In the present embodiment, the array space in step S30 includes a high-precision array seedH and a low-precision array seedL, each allocated with the vectorization width as its length.

In the present embodiment, in step S40, W pseudo-random number subsequence seeds are generated by jumping m steps at a time according to the jump formula $x_{i+m} = a^m x_i \bmod M$, and are stored in the high-precision array seedH and the low-precision array seedL respectively, where m is the number of steps jumped forward [...], $a$ is the known multiplier, $x_i$ is the known initial value $x_0$, and $M$ is the known modulus. To improve the calculation precision, all data are stored during the calculation split into a high [...]-bit part and a low [...]-bit part: an array seedH of length W (i.e., the vectorization width) is allocated to store the high [...] bits of the pseudo-random number subsequence seeds, and an array seedL of length W is allocated to store the low [...] bits of the pseudo-random number subsequence seeds. In addition, the operator [...] takes the high [...] bits of a number, and the operator [...] takes the low [...] bits of a number;

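specifically, step S40 is implemented as follows. Let i = 1 and t = 1; [...] stores the low [...] bits of the multiplier a, [...] stores the high [...] bits of the multiplier a, [...] stores the high [...] bits of a pseudo-random number subsequence seed, and [...] stores the low [...] bits of a pseudo-random number subsequence seed. Let t = t + 1; the first pseudo-random number subsequence seed is $x_0$, and it is stored in [...] or [...]. Then let [...]; let i = i + 1, and if i is less than m, jump back to [...] in order to complete the computation of $a^m$. If i is greater than or equal to m, let tmp1 = a_l × seedL[t-1], which obtains the previous pseudo-random number subsequence seed seed[t-1]; then let [...]; let t = t + 1, and if t is less than W, jump back to tmp1 = a_l × seedL[t-1]; otherwise, all pseudo-random number subsequence seeds have been generated and the next step is performed.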

In this embodiment, step S50 generates random numbers by the LCG method from the pseudo-random number subsequence seeds generated in step S40. The LCG method comprises the linear congruence equation $x_{i+1} = (a x_i) \bmod M$, in which the operators =, ×, +, [...] are all implemented with the corresponding SIMD instructions, and the data operated on must be packed into vector registers. Let mmvecH, mmvecL, mmvecK, mmvecA, mmvecB and mmvecR be temporary vector-register variables; unlike ordinary variables, each holds a vector, and assigning to such a variable is in effect packing data into a vector register. An array rand[W] of length W is allocated to store the generated random numbers;

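specifically, step S50 is implemented as follows. Let n = 1. Let mmvecH = seedH, which packs the entire seedH array into a vector register; let mmvecL = seedL, which packs the entire seedL array into a vector register; let mmvecK = a, which packs the multiplier a of the LCG parameters into a vector register. Let [...], a vector multiplication using a SIMD instruction; let [...]; let [...]; then let [...], which computes the updated value used for the next random number. Let rand = mmvecR, which writes the values in the vector register back to the array rand[W]; the data stored in rand[W] are then the W random numbers computed in this iteration.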

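In this embodiment, the determination in step S60 is as follows: let n = n + 1, then compare n with [...]. If [...], not all random numbers have been generated, and the process returns to step S50 to continue generating the remaining random numbers; otherwise, the process ends.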

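Compared with the prior art, the invention has the following beneficial effects:

A single execution of the LCG method generates multiple pseudo-random numbers simultaneously. Once the initial value and the LCG method parameters are supplied, all random numbers are generated iteratively using the vectorization component and SIMD instructions. This replaces serial generation: each execution produces multiple random numbers in parallel, which greatly improves the generation speed.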

The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

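1. A parallel random number generation method for vectorization components, comprising the following steps:

step S10, deriving a jump formula from the linear congruence equation;

step S20, determining the LCG method parameters and initial value, the vectorization width, and the total number of pseudo-random numbers to be generated;

step S30, allocating array space according to the vectorization width;

step S40, generating the pseudo-random number subsequence seeds by iterating the jump formula;

step S50, generating random numbers by the LCG method;

and step S60, determining whether all random numbers have been generated; if not, jumping back to step S50, otherwise ending.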

2. The parallel random number generation method for vectorization components according to claim 1, wherein the linear congruence equation in step S10 is $x_{i+1} = (a x_i) \bmod M$, where $x_i$ is the initial value, $a$ is the multiplier, and $M$ is the modulus, and step S10 is implemented as follows: let $x_{i+2} = (a x_{i+1}) \bmod M$ and $x_{i+3} = (a x_{i+2}) \bmod M$, so that $x_{i+3} = a(a(a x_i \bmod M) \bmod M) \bmod M$; applying the identity $x (y \bmod M) \bmod M = xy \bmod M$ gives $x_{i+3} = a^3 x_i \bmod M$; generalizing the formula $x_{i+3} = a^3 x_i \bmod M$ to $m$ iterations yields the jump formula $x_{i+m} = a^m x_i \bmod M$.

3. The parallel random number generation method for vectorization components according to claim 2, wherein the LCG method parameters in step S20 are the multiplier $a$ and the modulus $M$; the initial value $x_i$ of the LCG method is set to $x_0$, the vectorization width is W, the total number of pseudo-random numbers to be generated is N, the precision of the pseudo-random numbers is J bits, and the number of pseudo-random numbers actually generated by each vector unit is W

4. The parallel random number generation method for vectorization components according to claim 3, wherein the array space in step S30 includes a high-precision array seedH and a low-precision array seedL, each allocated with the vectorization width as its length.

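5. The parallel random number generation method for vectorization components according to claim 4, wherein in step S40, W pseudo-random number subsequence seeds are generated by jumping m steps at a time according to the jump formula $x_{i+m} = a^m x_i \bmod M$, and are stored in the high-precision array seedH and the low-precision array seedL respectively, where m is the number of steps jumped forward [...], $a$ is the known multiplier, $x_i$ is the known initial value $x_0$, and $M$ is the known modulus; all data are stored split into a high [...]-bit part and a low [...]-bit part: an array seedH of length W is allocated to store the high [...] bits of the pseudo-random number subsequence seeds, and an array seedL of length W is allocated to store the low [...] bits of the pseudo-random number subsequence seeds; in addition, the operator [...] takes the high [...] bits of a number, and the operator [...] takes the low [...] bits of a number.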

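6. The parallel random number generation method for vectorization components according to claim 5, wherein step S40 is implemented as follows: let i = 1 and t = 1; [...] stores the low [...] bits of the multiplier a, [...] stores the high [...] bits of the multiplier a, [...] stores the high [...] bits of a pseudo-random number subsequence seed, and [...] stores the low [...] bits of a pseudo-random number subsequence seed; let t = t + 1; the first pseudo-random number subsequence seed is $x_0$, and it is stored in [...] or [...]; then let [...]; let i = i + 1, and if i is less than m, jump back to [...] in order to complete the computation of $a^m$; if i is greater than or equal to m, let tmp1 = a_l × seedL[t-1], which obtains the previous pseudo-random number subsequence seed seed[t-1]; then let [...]; let t = t + 1, and if t is less than W, jump back to tmp1 = a_l × seedL[t-1]; otherwise, all pseudo-random number subsequence seeds have been generated and the next step is performed.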

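7. The parallel random number generation method for vectorization components according to claim 6, wherein step S50 generates random numbers by the LCG method from the pseudo-random number subsequence seeds generated in step S40; the LCG method comprises the linear congruence equation $x_{i+1} = (a x_i) \bmod M$, in which the operators =, ×, +, [...] are all implemented with the corresponding SIMD instructions, and the data operated on must be packed into vector registers; mmvecH, mmvecL, mmvecK, mmvecA, mmvecB and mmvecR are temporary vector-register variables which, unlike ordinary variables, each hold a vector, so that assigning to such a variable is in effect packing data into a vector register; and an array rand[W] of length W is allocated to store the generated random numbers.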

8. The parallel random number generation method for vectorization components according to claim 7, wherein step S50 is implemented as follows: let n = 1; let mmvecH = seedH, which packs the entire seedH array into a vector register; let mmvecL = seedL, which packs the entire seedL array into a vector register; let mmvecK = a, which packs the multiplier a of the LCG parameters into a vector register; let [...], a vector multiplication using a SIMD instruction; let [...]; let [...]; then let [...], which computes the updated value used for the next random number; let rand = mmvecR, which writes the values in the vector register back to the array rand[W], so that the data stored in rand[W] are the W random numbers computed in this iteration.

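9. The parallel random number generation method for vectorization components according to claim 8, wherein the determination in step S60 is as follows: let n = n + 1, then compare n with [...]; if [...], not all random numbers have been generated, and the process returns to step S50 to continue generating the remaining random numbers; otherwise, the process ends.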