CN105843588A - MIC based random number generator segmented parallelizing method - Google Patents

Info

Publication number
CN105843588A
Authority
CN
China
Prior art keywords
num
thread
random number
threads
mic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610150661.1A
Other languages
Chinese (zh)
Inventor
宋博文
周晓辉
张保东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Langchao Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langchao Electronic Information Industry Co Ltd
Priority to CN201610150661.1A
Publication of CN105843588A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58: Random or pseudo-random number generators

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention discloses a MIC-based segmented parallelization method for random number generators. The periodic sequence is divided into segments that are generated in parallel, and the random numbers produced by the threads are spliced together to form the final sequence. Compared with a single CPU thread, the speedup on the MIC platform shows a significant advantage.

Description

A MIC-based segmented parallelization method for random number generators
Technical field
The present invention relates to a segmented parallelization method for general-purpose random number generators, and in particular to a method for outputting random number sequences on hardware based on the Intel MIC architecture.
Background
A random number generator is a device for producing random numbers. Such generators are generally divided into true random number generators and pseudo-random number generators. As scientific research and engineering simulation grow, so do the demands on the performance and speed of random number generators, and applying parallel computing techniques to random number generators can quickly improve generation efficiency. However, work on parallelizing random number generators has concentrated mainly on multi-core central processing unit (CPU) platforms, and a theoretical basis and performance analysis for parallelization on Intel's latest MIC (Many Integrated Core) platform are lacking.
The Intel MIC architecture has smaller cores and more hardware threads, as well as wider vector units, which makes it an ideal choice for improving overall performance and meeting the demands of highly parallel applications. It is based on the x86 architecture and supports parallel programming models such as OpenMP and Pthreads. A Xeon Phi accelerator card based on the MIC architecture consists of 57 to 61 physical cores, each with 4 hardware threads. The on-board memory is 6 GB to 8 GB, and the peak double-precision performance reaches 1 TFLOPS. It has an advantage over CPUs in parallel computing and is suited to highly parallel computing problems.
Summary of the invention
Problem to be solved by the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a MIC-based segmented parallelization method for random number generators.
Means for solving the problem
A MIC-based segmented parallelization method for random number generators, comprising the following steps:
Step A, acquisition and partitioning, comprising:
Step A1: obtain the period ρ of the original random number sequence;
Step A2: obtain the maximum supported number of threads N and the number of random numbers each thread can generate, L = ρ/N;
Step B, the main thread reads the parameters and computes the initial state seed[id] of each thread, comprising:
Step B1: the main thread reads the required quantity of random numbers random_num, the number of allocated threads threads_num, and the seed seed, where 0 < random_num < ρ and 0 < threads_num < N;
Step B2: allocate the memory region result;
Step B3: compute the number of random numbers each thread needs to generate, num = random_num/threads_num;
Step B4: set id to 1;
Step B5: compute seed[id];
Step B6: if id > threads_num, go to step C;
Step B7: increment id by 1, then return to step B5;
Step C, MIC thread computation, comprising:
Step C1: set the initial value of i to 0 and the initial value of U to 0;
Step C2: compute U with the serial random number algorithm;
Step C3: result[id*num+i] = U;
Step C4: if i > num, terminate this thread;
Step C5: increment i by 1, then return to step C3.
Effect of the invention
The present invention parallelizes generation by segmenting the periodic sequence, and finally splices the random numbers generated by each thread together to form the final sequence. Relative to a single CPU thread, the speedup on the MIC platform has a clear advantage.
Brief description of the drawings
Fig. 1 is a schematic diagram of the segmented parallelization of a random number generator;
Fig. 2 is a flowchart of the MIC-based parallelization of a random number generator;
Figs. 3a and 3b show the speedup trends of MRG32k3a after parallelization on CPU and MIC.
Detailed description
Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to embodiments. Numerous specific details are given in the following description to better explain the invention. Those skilled in the art will understand that the invention can be practiced without these details. In other instances, well-known methods, means, and materials are not described in detail, so as to highlight the gist of the invention.
As shown in Fig. 1, the invention parallelizes generation by segmenting the periodic sequence. First, the original random number sequence of period ρ is divided equally into N segments (N being the maximum supported number of threads), each containing ρ/N numbers. Each thread then generates random numbers recursively from the starting point of its own segment. Let random_num be the total quantity of random numbers to generate and threads_num the number of allocated threads; each thread then produces random_num/threads_num random numbers. Finally, the random numbers generated by each thread are spliced together to form the final sequence.
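The partitioning arithmetic in this paragraph can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `segment_plan`, the tuple layout, the use of integer division, and the check that a thread's share must fit inside one segment are not taken from the original.

```python
def segment_plan(rho, max_threads, random_num, threads_num):
    """Return one (first_serial_index, result_start, result_stop) tuple per thread.

    rho          -- period of the original sequence
    max_threads  -- N, the maximum supported number of threads
    random_num   -- total quantity of random numbers requested
    threads_num  -- number of threads actually allocated
    """
    if not 0 < threads_num < max_threads:
        raise ValueError("threads_num must satisfy 0 < threads_num < N")
    seg_len = rho // max_threads        # numbers in each segment (rho / N)
    num = random_num // threads_num     # numbers each thread must produce
    if num > seg_len:
        # Assumed safeguard: a thread that ran past its segment would
        # repeat numbers that belong to the next thread.
        raise ValueError("per-thread share exceeds the segment length")
    return [(tid * seg_len, tid * num, (tid + 1) * num)
            for tid in range(threads_num)]


# Example values chosen for illustration only.
plan = segment_plan(rho=2**191, max_threads=2**64,
                    random_num=1_000_000, threads_num=4)
for first, start, stop in plan:
    print(first, start, stop)
```

Each thread starts at a different position of the original sequence but writes into a contiguous slice of the output, which is what makes the final splice a plain concatenation.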
The method comprises the following steps:
Step A, acquisition and partitioning, comprising:
Step A1: obtain the period ρ of the original random number sequence;
Step A2: obtain the maximum supported number of threads N and the number of random numbers each thread can generate, L = ρ/N;
Step B, the main thread reads the parameters and computes the initial state seed[id] of each thread, comprising:
Step B1: the main thread reads the required quantity of random numbers random_num, the number of allocated threads threads_num, and the seed seed, where 0 < random_num < ρ and 0 < threads_num < N;
Step B2: allocate the memory region result;
Step B3: compute the number of random numbers each thread needs to generate, num = random_num/threads_num;
Step B4: set id to 1;
Step B5: compute seed[id];
Step B6: if id > threads_num, go to step C;
Step B7: increment id by 1, then return to step B5;
Step C, MIC thread computation, comprising:
Step C1: set the initial value of i to 0 and the initial value of U to 0;
Step C2: compute U with the serial random number algorithm;
Step C3: result[id*num+i] = U;
Step C4: if i > num, terminate this thread;
Step C5: increment i by 1, then return to step C3.
In one possible embodiment, the invention comprises the following steps:
1. Preparation
a) Study the serial algorithm of the random number generator and obtain its period ρ;
b) Determine the rule for partitioning the period: the supported number of threads N and the number of random numbers each thread can generate, L = ρ/N.
2. Main thread: read the relevant parameters and compute the initial state seed[id] of each thread
a) Read the required quantity of random numbers random_num, the number of allocated threads threads_num, and the seed seed, where 0 < random_num < ρ and 0 < threads_num < N;
b) Allocate the memory region result;
c) Compute the number of random numbers each thread needs to generate, num = random_num/threads_num;
d) id = 1;
e) Compute seed[id];
f) If id > threads_num, go to 3;
g) id++, go to e).
3. MIC threads: control the MIC device in a mode such as native or offload; the threads work concurrently, each performing the same computation below, as shown in Fig. 2:
a) i = 0, U = 0;
b) Compute U with the serial random number algorithm;
c) result[id*num+i] = U;
d) If i > num, terminate;
e) i++, go to c).
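The main-thread and worker-thread split described above can be sketched as follows. Several things here are assumptions made for illustration and are not in the original: the toy generator `toy_step` (a 16-state linear congruential generator standing in for the serial algorithm), the helper `advance`, 0-based thread ids, half-open loop bounds (`0 <= i < num`), and recomputing U on every iteration. Python threads only illustrate the structure; they do not reproduce MIC performance.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_step(x):
    """Hypothetical stand-in for the serial generator: LCG with period 16."""
    return (5 * x + 1) % 16


def advance(x, k):
    """Naive jump: apply the serial step k times."""
    for _ in range(k):
        x = toy_step(x)
    return x


def generate(seed, random_num, threads_num, seg_len):
    num = random_num // threads_num            # per-thread share (step 2c)
    result = [0] * (num * threads_num)         # shared output buffer (step 2b)
    # Main thread: initial state of every thread (steps 2d to 2g).
    seeds = [advance(seed, tid * seg_len) for tid in range(threads_num)]

    def worker(tid):                           # work done by each thread (step 3)
        x = seeds[tid]
        for i in range(num):
            x = toy_step(x)                    # compute U (step 3b)
            result[tid * num + i] = x          # store into this thread's slice (step 3c)

    with ThreadPoolExecutor(max_workers=threads_num) as pool:
        list(pool.map(worker, range(threads_num)))
    return result


out = generate(seed=3, random_num=8, threads_num=4, seg_len=4)
print(out)
```

Because every thread writes only to its own slice `result[tid*num : (tid+1)*num]`, no locking is needed around the shared buffer.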
In one possible embodiment, the steps of the invention are as follows.
Take as an example controlling the MIC device in native mode and implementing the segmented parallelization of the random number generator MRG32k3a in C++. The MRG32k3a recurrence is:
$$x_{1,n} = (a_{1,2}\,x_{1,n-2} + a_{1,3}\,x_{1,n-3}) \bmod m_1$$
$$x_{2,n} = (a_{2,1}\,x_{2,n-1} + a_{2,3}\,x_{2,n-3}) \bmod m_2$$
where n ≥ 3 and
$$a_{1,2} = 1403580, \quad a_{1,3} = -810728$$
$$a_{2,1} = 527612, \quad a_{2,3} = -1370589$$
$$m_1 = 2^{32} - 209, \quad m_2 = 2^{32} - 22853$$
A uniform random number $U_n$ in [0, 1) is produced by
$$z_n = (x_{1,n} + x_{2,n}) \bmod m_1$$
$$U_n = \begin{cases} z_n / m_1, & z_n > 0 \\ (m_1 - 1)/m_1, & z_n = 0 \end{cases}$$
Its period is $\rho \approx 2^{191}$. At most $N = 2^{64}$ threads are run, and each thread produces at most $L = 2^{127}$ random numbers.
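A minimal sketch of one serial step, following the recurrence and output formulas as given in this description. The six-element state layout and the function name `mrg32k3a_step` are assumptions chosen for illustration; Python's `%` operator returns a non-negative result for a positive modulus, so the negative coefficients need no special handling.

```python
M1 = 2**32 - 209
M2 = 2**32 - 22853
A12, A13 = 1403580, -810728
A21, A23 = 527612, -1370589


def mrg32k3a_step(state):
    """Advance one step.

    state is assumed to be
    (x1[n-3], x1[n-2], x1[n-1], x2[n-3], x2[n-2], x2[n-1]).
    Returns (new_state, u).
    """
    x1_3, x1_2, x1_1, x2_3, x2_2, x2_1 = state
    x1_n = (A12 * x1_2 + A13 * x1_3) % M1     # first component
    x2_n = (A21 * x2_1 + A23 * x2_3) % M2     # second component
    z = (x1_n + x2_n) % M1                    # combination as given in the text
    u = z / M1 if z > 0 else (M1 - 1) / M1    # map to a uniform value
    return (x1_2, x1_1, x1_n, x2_2, x2_1, x2_n), u


state = (12345,) * 6                          # example seed, chosen arbitrarily
for _ in range(3):
    state, u = mrg32k3a_step(state)
    print(u)
```

The six seed values correspond to the `seed[6]` input of the algorithm: three state words for each of the two component recurrences.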
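The MRG32k3a recurrence is converted into vector form:
$$X_{i,n+1} = A_i X_{i,n} \bmod m_i, \quad i = 1, 2$$
$$A_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ a_{1,3} & a_{1,2} & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ a_{2,3} & 0 & a_{2,1} \end{pmatrix}$$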
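As a quick check that the matrices reproduce the scalar recurrence, the sketch below multiplies an arbitrary state vector by each matrix and compares the last entry with the scalar formula. The ordering of the state vector (oldest value first) and the helper name `mat_vec_mod` are assumptions for illustration.

```python
M1 = 2**32 - 209
M2 = 2**32 - 22853
A1 = [[0, 1, 0], [0, 0, 1], [-810728, 1403580, 0]]
A2 = [[0, 1, 0], [0, 0, 1], [-1370589, 0, 527612]]


def mat_vec_mod(a, x, m):
    """3x3 matrix times 3-vector, reduced modulo m."""
    return [sum(a[r][c] * x[c] for c in range(3)) % m for r in range(3)]


# Arbitrary state, assumed ordered as (x[n-3], x[n-2], x[n-1]).
x = [11, 22, 33]
next1 = mat_vec_mod(A1, x, M1)
next2 = mat_vec_mod(A2, x, M2)

# The first two entries shift; the last entry is the new value of the recurrence.
scalar1 = (1403580 * x[1] - 810728 * x[0]) % M1
scalar2 = (527612 * x[2] - 1370589 * x[0]) % M2
print(next1[2] == scalar1, next2[2] == scalar2)
```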
Before each thread generates random numbers, the initial value of each thread must first be computed. If the initial value of thread 0 is $s_0$, the initial value of thread 1 is $s_1$, computed as follows.
By the identity
$$(xyz) \bmod p = (((xy) \bmod p)\, z) \bmod p$$
it follows that
$$s_1 = (A^L s_0) \bmod m = (((A^L) \bmod m) \cdot s_0) \bmod m$$
where $A^L \bmod m$ can be computed by divide and conquer, with the recurrence
$$A^L \bmod m = ((A^{L/2} \bmod m) \cdot (A^{L/2} \bmod m)) \bmod m$$
The initial state $s_{id}$ of every thread can be computed in the same way, where 0 ≤ id < threads_num.
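The jump-ahead computation can be sketched as below for the first component (the second is analogous with A_2 and m_2). The halving recurrence in the text covers even exponents such as L = 2^127; the sketch uses iterative binary exponentiation, a swapped-in but equivalent technique that also handles odd exponents. All function names are assumptions for illustration.

```python
M1 = 2**32 - 209
A1 = [[0, 1, 0], [0, 0, 1], [-810728, 1403580, 0]]


def mat_mul_mod(a, b, m):
    """3x3 matrix product, reduced modulo m."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) % m for c in range(3)]
            for r in range(3)]


def mat_vec_mod(a, x, m):
    """3x3 matrix times 3-vector, reduced modulo m."""
    return [sum(a[r][c] * x[c] for c in range(3)) % m for r in range(3)]


def mat_pow_mod(a, e, m):
    """Compute A^e mod m by repeated squaring."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    base = [[v % m for v in row] for row in a]
    while e > 0:
        if e & 1:
            result = mat_mul_mod(result, base, m)
        base = mat_mul_mod(base, base, m)
        e >>= 1
    return result


def thread_seeds(s0, threads_num, seg_len, a, m):
    """s_id = (A^L * s_(id-1)) mod m for id = 1 .. threads_num - 1."""
    jump = mat_pow_mod(a, seg_len, m)
    seeds = [list(s0)]
    for _ in range(1, threads_num):
        seeds.append(mat_vec_mod(jump, seeds[-1], m))
    return seeds


seeds = thread_seeds([12345, 12345, 12345], threads_num=4,
                     seg_len=2**127, a=A1, m=M1)
for s in seeds:
    print(s)
```

Computing the jump matrix takes on the order of log2(L) matrix multiplications, so even a jump of 2^127 steps costs only a few hundred 3x3 products.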
In summary, the segmented parallel MRG32k3a algorithm is as follows:
Algorithm: segmented parallel MRG32k3a
Input: seed seed[6], number of threads threads_num,
workload random_num
Output: random_num random numbers U
Begin
(1) Main thread
(2) MIC device, native mode
Based on the above embodiment, timing tests were run. On the CPU platform, MRG32k3a generated 1,000,000, 10,000,000, 100,000,000, 1,000,000,000, and 10,000,000,000 random numbers with 1, 2, 4, 8, and 16 threads. On the MIC platform, it generated the same quantities with 1, 2, 4, 8, 16, 32, 56, 112, 168, and 224 threads. Relative to a single CPU thread, the best speedup on the MIC platform was 17.73.
The test results are shown in Figs. 3a and 3b.
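The speedup quoted here is presumably the usual ratio of run times. A sketch of that definition, where the symbols $T_{\mathrm{CPU}}(1)$ (run time of one CPU thread) and $T_{\mathrm{MIC}}(p)$ (run time of p MIC threads on the same workload) are assumptions not used in the original:

$$S(p) = \frac{T_{\mathrm{CPU}}(1)}{T_{\mathrm{MIC}}(p)}$$

Under this reading, the reported figure is the largest value of $S(p)$ observed over the tested thread counts and workloads, that is, $\max S(p) = 17.73$.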
The hardware environment was as follows:
The present invention parallelizes generation by segmenting the periodic sequence, and splices the random numbers generated by each thread together to form the final sequence. Relative to a single CPU thread, the speedup on the MIC platform has a clear advantage.
Although the invention has been described with reference to the above embodiments, it should be understood that the invention is not limited to the disclosed embodiments. The scope of the appended claims should be given the broadest interpretation so as to encompass all modifications and equivalent structures and functions.

Claims (1)

1. A MIC-based segmented parallelization method for random number generators, characterized by comprising the following steps:
Step A, acquisition and partitioning, comprising:
Step A1: obtain the period ρ of the original random number sequence;
Step A2: obtain the maximum supported number of threads N and the number of random numbers each thread can generate, L = ρ/N;
Step B, the main thread reads the parameters and computes the initial state seed[id] of each thread, comprising:
Step B1: the main thread reads the required quantity of random numbers random_num, the number of allocated threads threads_num, and the seed seed, where 0 < random_num < ρ and 0 < threads_num < N;
Step B2: allocate the memory region result;
Step B3: compute the number of random numbers each thread needs to generate, num = random_num/threads_num;
Step B4: set id to 1;
Step B5: compute seed[id];
Step B6: if id > threads_num, go to step C;
Step B7: increment id by 1, then return to step B5;
Step C, MIC thread computation, comprising:
Step C1: set the initial value of i to 0 and the initial value of U to 0;
Step C2: compute U with the serial random number algorithm;
Step C3: result[id*num+i] = U;
Step C4: if i > num, terminate this thread;
Step C5: increment i by 1, then return to step C3.
CN201610150661.1A 2016-03-16 2016-03-16 MIC based random number generator segmented parallelizing method Pending CN105843588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610150661.1A CN105843588A (en) 2016-03-16 2016-03-16 MIC based random number generator segmented parallelizing method

Publications (1)

Publication Number Publication Date
CN105843588A true CN105843588A (en) 2016-08-10

Family

ID=56588013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610150661.1A Pending CN105843588A (en) 2016-03-16 2016-03-16 MIC based random number generator segmented parallelizing method

Country Status (1)

Country Link
CN (1) CN105843588A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120105462A1 (en) * 2010-10-28 2012-05-03 Mizuho-Dl Financial Technology Co., Ltd. Parallelization of random number generation processing by employing gpu
CN103475469A (en) * 2013-09-10 2013-12-25 中国科学院数据与通信保护研究教育中心 Method and device for achieving SM2 algorithm with combination of CPU and GPU

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
宋博文 等: "基于MIC的MRG32k3a并行化设计与实现", 《计算机应用软件》 *
张保东 等: "基于超多核心平台的Knuth39 并行化实现及性能分析", 《计算机应用》 *
李智杰 等: "基于MIC的CLCG4并行化设计与实现", 《电子科技》 *
蔡晓龙: "基于MIC的RAN2并行化设计与实现", 《赤峰学院学报》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648543A (en) * 2016-12-29 2017-05-10 北京握奇智能科技有限公司 Random number generation method and device
CN106648543B (en) * 2016-12-29 2019-09-27 北京握奇智能科技有限公司 A kind of random digit generation method and device
CN108984152A (en) * 2018-08-21 2018-12-11 北京睦合达信息技术股份有限公司 A kind of data processing method, system and computer readable storage medium
CN108984152B (en) * 2018-08-21 2021-01-29 北京睦合达信息技术股份有限公司 Data processing method, system and computer readable storage medium
CN109521997A (en) * 2018-11-16 2019-03-26 中国人民解放军战略支援部队信息工程大学 The random digit generation method and device executed for shared storage multi-threaded parallel

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160810