CN110727413A - Monte Carlo parallel algorithm based on GPU (Graphics Processing Unit) programming model

Publication number
CN110727413A
Authority
CN
China
Prior art keywords
random number
monte carlo
gpu
algorithm
parallel
Legal status
Pending
Application number
CN201910848788.4A
Other languages
Chinese (zh)
Inventor
宗慧
华任锋
赵建洋
洪龙
Current Assignee
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date
2019-09-09
Filing date
2019-09-09
Publication date
2020-01-24
Application filed by Huaiyin Institute of Technology
Priority to CN201910848788.4A
Publication of CN110727413A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58 Random or pseudo-random number generators
    • G06F7/588 Random number generators, i.e. based on natural stochastic processes

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a Monte Carlo parallel algorithm based on a GPU programming model, comprising the following steps: selecting a random number generator; adopting the simplest integral formula; parallelizing the integration code, quantifying the very large number of computations, tuning the parallel threads to the GPU model, and solving for the value of pi with the Monte Carlo integration algorithm; and solving for the value of e with the Monte Carlo integration algorithm. The invention aims to provide a Monte Carlo parallel algorithm based on the CUDA programming model that makes full use of the parallel computing capability of the GPU and reduces numerical computation time. A high-quality Mersenne Twister random number algorithm is selected, pi and e are taken as examples for the Monte Carlo integration algorithm, and after point-throwing operations on the order of a trillion the computed value of the constant pi has 8 significant digits and that of e has 7 significant digits.

Description

Monte Carlo parallel algorithm based on GPU (graphics processing Unit) programming model
Technical Field
The invention relates to the technical field of supervision algorithms and text classification, in particular to a Monte Carlo parallel algorithm based on a GPU programming model.
Background
In 1945, S. M. Ulam and J. von Neumann proposed a new probability-based approach to problem solving, which was named after Monte Carlo, the famous gambling city in Monaco, to reflect its randomness. Its computing principle rests on probability and statistics: random points are thrown and sampled, and the samples are then processed to obtain an estimate. The development of electronic digital computers provided an effective tool for such large-scale random experiments, so the Monte Carlo method is now widely applied in nuclear science, artificial intelligence, medicine, management science and other fields. Recently reported AI algorithms such as AlphaGo Zero and AlphaZero search successfully by means of a Monte Carlo method, and the Monte Carlo method plays an irreplaceable role in numerical calculations in nuclear physics.
With the continuous advance of science and technology, parallel computing has been applied in many fields, and multi-core technology in particular has brought parallel computing to a wide audience. Meanwhile, GPU computing performance has improved rapidly: NVIDIA introduced computing graphics cards dedicated to improving the speed and quality of computer graphics processing, and parallel algorithms designed for the GPU have become a research hotspot in recent years. The computing capacity of the GPU greatly exceeds that of the CPU because the GPU has more arithmetic logic units (ALUs). The CPU is designed for multi-task management and scheduling, so its interior consists mainly of control logic and cache, with ALUs occupying only a small part; the GPU is dedicated to data processing, so its interior consists mainly of ALUs, as shown in fig. 1. At present, CPU chip technology is advancing more slowly than Moore's law predicts and Intel is moving toward low power consumption, whereas GPU technology is advancing far faster than Moore's law, so the GPU has very good computing prospects. Furthermore, the GPU has remarkable computing capability and speed and is well suited to the single instruction, multiple data (SIMD) computation of random sampling, so the Monte Carlo method can solve a problem much faster on the GPU than on the CPU.
Disclosure of Invention
The invention aims to provide a Monte Carlo parallel algorithm based on the CUDA programming model that makes full use of the parallel computing capability of the GPU and reduces numerical computation time. A high-quality Mersenne Twister random number algorithm is selected, pi and e are taken as examples for the Monte Carlo integration algorithm, and after point-throwing operations on the order of a trillion the computed value of the constant pi has 8 significant digits and that of e has 7 significant digits.
The invention is realized by the following technical scheme:
A Monte Carlo parallel algorithm based on a GPU programming model comprises the following steps:
step 1, selecting a random number generator;
step 2, adopting the simplest integral formula: the Monte Carlo method quantifies the area of a region by throwing points with equal probability, so that the number of points falling within the region represents its size; the point count then serves as the intermediate quantity from which the target value is obtained through a formula; choosing the region with the smaller area as the statistical region reduces the computation time;
step 3, parallelizing the integration code, quantifying the very large number of computations, tuning the parallel threads to the GPU model, and solving for the value of pi with the Monte Carlo integration algorithm;
step 4, solving for the value of e with the Monte Carlo integration algorithm.
The invention further adopts the technical improvement scheme that:
step 1 adopts an MT19937 random number generator, and the process is as follows: obtaining a random number seed; seeding the MT19937 algorithm with it; setting the random number type and range; and generating the random numbers.
The invention further adopts the technical improvement scheme that:
in step 3, a CUDA-based Monte Carlo parallel integration algorithm for calculating pi is adopted: first, two arrays receive the random numbers generated by the MT19937 random number generator; then video memory is allocated for the arrays at the DEVICE end and the arrays storing the random numbers are copied to them; a kernel function is called to perform the calculation; finally, the numbers of qualifying points returned by the threads are summed and the result value is calculated.
The invention further adopts the technical improvement scheme that:
in step 3, the GPU threads are optimized and the calculation of e is realized with a CUDA-based Monte Carlo parallel integration algorithm for calculating e: under the first random number generation mode, the random numbers are generated at the HOST end and then transferred to the DEVICE end for the condition judgment; the random number generator is the MT19937 algorithm; video memory is then allocated for the arrays at the DEVICE end, the arrays storing the random numbers are copied to them, a CUDA kernel function is called to perform the calculation, and finally the numbers of qualifying points on the threads are summed and the result value is calculated.
The invention further adopts the technical improvement scheme that:
step 1 adopts an MTGP32 random number generator: the seed and the data set are passed to the kernel function from the host end, and the device end then uses the curand function to generate the random numbers. The process is as follows: at the host end, reformatting a predefined parameter set into kernel format and copying the kernel parameters into device memory; initializing the state of each thread block and storing the states in an array; at the device end, calling the curand function to generate random numbers from each thread's seed value; since the MTGP32 random number generator can only generate unsigned integer data, unsigned integer data are used in solving for pi and e.
The invention further adopts the technical improvement scheme that:
in step 3, a CUDA-based Monte Carlo parallel integration algorithm for calculating pi is adopted: first, the random number seed value, the array storing the point counts and the judgment condition value needed at the DEVICE end are declared at the HOST end for the MTGP32 random number generator; a kernel function is then called to perform the calculation, with each thread generating its random numbers in every calculation; finally, the numbers of qualifying points returned by the threads are summed and the result value is calculated.
The invention further adopts the technical improvement scheme that:
in step 3, the GPU threads are optimized and the calculation of e is realized with a CUDA-based Monte Carlo parallel integration algorithm for calculating e: under the second random number generation mode, the random numbers are generated directly at the DEVICE end, each one being generated just before the condition judgment and overwritten by a new random number after it; the random number generator is the MTGP32 algorithm; a CUDA kernel function is called from the HOST for the calculation, and finally the numbers of qualifying points returned by the threads are summed and the result value is calculated.
Compared with the prior art, the invention has the following obvious advantages:
First, the method improves both the efficiency and the precision of the Monte Carlo integration algorithm; because the computation time of Monte Carlo integration grows multiplicatively as the required precision increases, the Monte Carlo parallel algorithm based on the GPU programming model is designed to improve computing performance.
Second, a high-quality Mersenne Twister random number algorithm is selected, random numbers are generated at the host end and at the device end respectively according to the characteristics of GPU programming, and these random numbers are applied in the design and implementation of the parallel algorithm.
Third, the GPU-based Monte Carlo integration algorithm can use the GPU to carry out a large number of point-throwing operations quickly and so obtain more accurate values of pi and e; it completes a huge amount of computation quickly to obtain a higher-precision result, and provides a valuable reference for other similar scientific computing tasks.
Drawings
FIG. 1 shows the preliminary stage of algorithm design, in which a better-quality random number generator is selected by comparing the quality of different random number generators;
FIG. 2 shows the coding of the designed integral formula, in which the simplest formula code gives the fastest speed;
FIG. 3 shows the parallelization of the integration code and the quantification of the amount of calculation, with the threads tuned to the GPU, followed by the experimental calculation of the value of pi;
FIG. 4 shows the parallelization of the integration code and the quantification of the amount of calculation, with the threads tuned to the GPU, followed by the experimental calculation of the value of e.
Detailed Description
The technical scheme of the invention is further described with reference to figures 1-4.
The invention comprises the following steps:
step 1, selecting a random number generator;
step 2, adopting the simplest integral formula: the Monte Carlo method quantifies the area of a region by throwing points with equal probability, so that the number of points falling within the region represents its size; the point count then serves as the intermediate quantity from which the target value is obtained through a formula; choosing the region with the smaller area as the statistical region reduces the computation time;
step 3, parallelizing the integration code, quantifying the very large number of computations, tuning the parallel threads to the GPU model, and solving for the value of pi with the Monte Carlo integration algorithm;
step 4, solving for the value of e with the Monte Carlo integration algorithm.
Embodiment 1
Step 1: to obtain a good-quality result, a suitable random number generation method is determined first, and a large number of random points are thrown on that basis.
The MT19937 random number generator is suitable for generating random numbers at the host end. The C++ standard library includes an MT19937 implementation, so random numbers can be generated simply by calling its methods, without setting the generator's parameters one by one. The short code is as follows:
std::random_device seed;                  // obtain a random number seed
std::mt19937 gen(seed());                 // seed the MT19937 algorithm
std::uniform_int_distribution<int> rand;  // set the random number type and range
rand(gen);                                // generate a random number
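The four steps above can be put together as a small host-side helper. This is only a minimal sketch: the function name `make_coords`, the explicit seed parameter (used here so that runs are reproducible, in place of `std::random_device`) and the range [0, R] are assumptions made for illustration and do not come from the original listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns n integer coordinates drawn uniformly from [0, R].
// A fixed seed is an assumption for reproducibility; the text seeds from
// std::random_device instead.
std::vector<int> make_coords(std::size_t n, int R, std::uint32_t seed) {
    assert(R > 0);
    std::mt19937 gen(seed);                         // seed the MT19937 engine
    std::uniform_int_distribution<int> dist(0, R);  // random number type and range
    std::vector<int> out(n);
    for (auto& v : out) v = dist(gen);              // generate the random numbers
    return out;
}
```

Calling `make_coords(1000, 100, 42u)` twice gives the same 1000 values on a given standard library, which is convenient when comparing a serial run with a parallel one.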
Step 2: taking pi as an example, the Monte Carlo integration algorithm is used to obtain a relatively accurate value. A quarter circle of radius R is drawn inside a square of side length R, and points are thrown at random into the square. Because every throw is random, every point in the square is equally likely; assuming that, say, 100 million points cover the square, the ratio of the number of points inside the quarter circle to the total number of points approximates the ratio of the area of the quarter circle to the area of the square. Serial and parallel experiments were performed based on this method.
Serial implementation is carried out first at the host end. A question arises when writing the code: when counting the thrown points, should the points falling inside the shaded area (the quarter circle) be counted, or the points falling outside it? Since both approaches are feasible, both counting methods are coded:
When the generated random numbers are used to simulate a thrown point, two random numbers must be generated at a time and used as the x and y coordinates respectively. To decide whether a point falls in the shaded area, only the distance between the coordinate (x, y) and the origin (0, 0) needs to be calculated: if the distance from the randomly generated point to the origin is smaller than the radius R, the condition is met; otherwise it is not. The randomly generated points are restricted to the first quadrant, and neither the x nor the y coordinate may exceed the radius R. The serial implementation uses only the rand function, since it serves as the baseline for the code optimization below.
Counting the points inside the shaded area:
A for loop simulates the repeated manual throws; the rand function generates the x and y coordinates, the distance from each point to the origin is tested and the qualifying points are counted, and finally the number of qualifying points is divided by the total number of throws and multiplied by 4 to obtain the value of pi. The implementation code is as follows:
N represents the total number of points thrown, and M is the number of points falling within the shaded area.
Figure BDA0002196211830000061
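The listing above is available only as an image. As a rough illustration of the inside-counting loop just described, the following sketch estimates pi as 4 * M / N. It assumes R = 1, floating-point coordinates and a fixed-seed `std::mt19937` in place of the `rand` function; the function name `estimate_pi_inside` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Estimates pi by counting points that fall inside the quarter circle.
// n is the total number of points thrown (N); the result is 4 * M / N.
double estimate_pi_inside(std::uint64_t n, std::uint32_t seed) {
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> coord(0.0, 1.0);  // R = 1 (assumed)
    std::uint64_t m = 0;                                     // M: points inside
    for (std::uint64_t i = 0; i < n; ++i) {
        const double x = coord(gen);
        const double y = coord(gen);
        if (x * x + y * y <= 1.0) ++m;  // compare squared distance with R^2
    }
    return 4.0 * static_cast<double>(m) / static_cast<double>(n);
}
```

Comparing squared distances avoids a square root per point. With one million points the statistical error of the estimate is on the order of 0.002, so many more points are needed for the precision quoted in the abstract.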
The implementation code for counting the points outside the shaded area is as follows:
Figure BDA0002196211830000062
Figure BDA0002196211830000071
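For comparison, the complementary count can be sketched in the same way: if M' points fall outside the quarter circle, then pi is approximately 4 * (N - M') / N. As before, R = 1, the fixed-seed `std::mt19937` and the name `estimate_pi_outside` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Estimates pi by counting points that fall OUTSIDE the quarter circle.
// The outside region is the smaller one (about 21.5% of the square), so the
// counter is incremented less often than in the inside-counting variant.
double estimate_pi_outside(std::uint64_t n, std::uint32_t seed) {
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> coord(0.0, 1.0);  // R = 1 (assumed)
    std::uint64_t outside = 0;                               // M': points outside
    for (std::uint64_t i = 0; i < n; ++i) {
        const double x = coord(gen);
        const double y = coord(gen);
        if (x * x + y * y > 1.0) ++outside;
    }
    return 4.0 * static_cast<double>(n - outside) / static_cast<double>(n);
}
```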
For the CUDA-based Monte Carlo parallel integration algorithm for calculating pi, two arrays first receive the random numbers generated by the MT19937 random number generator; video memory is then allocated for the arrays at the DEVICE end and the arrays storing the random numbers are copied to them; a kernel function is called to perform the calculation; finally, the numbers of qualifying points returned by the threads are summed and the result value is calculated. The implementation code is as follows:
Figure BDA0002196211830000072
Figure BDA0002196211830000081
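The CUDA listing above is available only as an image. The structure it describes (host-generated arrays, one partial count per thread, a final sum on the host) can be illustrated without a GPU. The sketch below is a sequential C++ emulation, not CUDA code: the loop over `tid` stands in for the kernel launch, the strided indexing is an assumed work distribution, and the names `count_inside` and `parallel_style_pi` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Emulates one logical GPU thread: walks the arrays with a stride equal to the
// total thread count and returns how many of its points lie inside the circle.
std::uint64_t count_inside(const std::vector<float>& xs, const std::vector<float>& ys,
                           std::size_t tid, std::size_t num_threads) {
    std::uint64_t hits = 0;
    for (std::size_t i = tid; i < xs.size(); i += num_threads) {
        if (xs[i] * xs[i] + ys[i] * ys[i] <= 1.0f) ++hits;
    }
    return hits;
}

double parallel_style_pi(std::size_t n, std::size_t num_threads, std::uint32_t seed) {
    assert(n > 0 && num_threads > 0);
    // Host side: two arrays receive the MT19937 random numbers.
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> coord(0.0f, 1.0f);
    std::vector<float> xs(n), ys(n);
    for (std::size_t i = 0; i < n; ++i) { xs[i] = coord(gen); ys[i] = coord(gen); }

    // "Kernel launch": in CUDA these iterations would run concurrently.
    std::vector<std::uint64_t> per_thread(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        per_thread[t] = count_inside(xs, ys, t, num_threads);
    }

    // Host side: sum the per-thread counts and compute the result value.
    const std::uint64_t m =
        std::accumulate(per_thread.begin(), per_thread.end(), std::uint64_t{0});
    return 4.0 * static_cast<double>(m) / static_cast<double>(n);
}
```

Because every point is counted exactly once whatever the thread count, `parallel_style_pi(n, 1, s)` and `parallel_style_pi(n, 64, s)` return the same value, which is a useful sanity check when porting the loop to a real kernel.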
Step 3: taking e as an example, the Monte Carlo integration algorithm is used to obtain a relatively accurate value, and serial and parallel experiments are then performed.
Serial implementation is carried out first at the CPU end. When coding the pi example, the question arose of whether to count the points falling inside the shaded area or those falling outside it; since tests showed that counting the points falling outside the shaded area is faster, that counting method is chosen directly for the coding here:
The implementation uses an MT19937 random number generator, and the implementation code is as follows:
Figure BDA0002196211830000082
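The integral formula used for e appears only in the image above and is not stated in the text. The following sketch therefore assumes one possible formulation, which may differ from the one in the listing: points are thrown into the rectangle [1, 2] x [0, 1], those above the curve y = 1/x are counted, the area under the curve is taken as an estimate of ln 2, and e is recovered as 2^(1/ln 2). The name `estimate_e_outside` and the fixed seed are likewise assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Estimates e by hit-or-miss integration of 1/x on [1, 2] (assumed formulation).
// Points above the curve are the "outside" points; the fraction below the
// curve approximates ln 2, and e = 2^(1 / ln 2).
double estimate_e_outside(std::uint64_t n, std::uint32_t seed) {
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> ux(1.0, 2.0);
    std::uniform_real_distribution<double> uy(0.0, 1.0);
    std::uint64_t outside = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        const double x = ux(gen);
        const double y = uy(gen);
        if (x * y >= 1.0) ++outside;  // y >= 1/x, tested without a division
    }
    assert(outside < n);              // otherwise the area estimate would be zero
    const double ln2 = 1.0 - static_cast<double>(outside) / static_cast<double>(n);
    return std::pow(2.0, 1.0 / ln2);
}
```

About 31% of the points fall above the curve, so counting them touches the counter less often than counting the points below it, in line with the preference for the smaller region stated in step 2.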
Implementation 1 of the CUDA-based Monte Carlo parallel integration algorithm for calculating e uses the first random number generation mode: the random numbers are generated at the HOST end and then transferred to the DEVICE end for the condition judgment. The random number generator is the MT19937 algorithm; video memory is allocated for the arrays at the DEVICE end, the arrays storing the random numbers are copied to them, a CUDA kernel function is called to perform the calculation, and finally the numbers of qualifying points returned by the threads are summed and the result value is calculated. The code is as follows:
Figure BDA0002196211830000091
Embodiment 2
Step 1: to obtain a good-quality result, a suitable random number generation method is determined first, and a large number of random points are thrown on that basis.
The MTGP32 random number generator, whose full name is Mersenne Twister for Graphic Processors, is a random number generator designed specifically for graphics processors. MTGP32 is integrated in the cuRAND library, so the seed and the data set only need to be passed to the kernel function from the host end, after which the device end can use the curand function to generate random numbers. The short code is as follows:
At the host end, reformat a predefined parameter set into kernel format and copy the kernel parameters into device memory:
cudaMalloc((void**)&devKernelParams, sizeof(mtgp32_kernel_params));
Initialize the state of each thread block and store the states in an array:
curandMakeMTGP32KernelState(devMTGPStates, mtgp32dc_params_fast_11213, devKernelParams, BLOCK, seed());
At the device end, call the curand function to generate random numbers from each thread's seed value:
curand(&state[Id]);
Since the MTGP32 random number generator can only generate unsigned integer data (unsigned int: 0 to 4294967295), unsigned integer data are used in the computation when solving for pi and e.
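Working with raw 32-bit unsigned coordinates means the circle test must be written so that it cannot overflow. The sketch below is one way to do this, assuming R = 2^32 - 1: since x*x + y*y may exceed 64 bits, the test is rearranged as x*x <= R*R - y*y, in which every term fits in an unsigned 64-bit integer. `std::mt19937` stands in for MTGP32 here (both produce 32-bit unsigned outputs), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// True if (x, y), given as 32-bit unsigned coordinates, lies inside the quarter
// circle of radius R = 2^32 - 1 centred on the origin. R*R and y*y both fit in
// 64 bits and y*y <= R*R, so the subtraction cannot wrap around.
bool inside_quarter_circle(std::uint32_t x, std::uint32_t y) {
    const std::uint64_t R = 4294967295ULL;
    const std::uint64_t xx = static_cast<std::uint64_t>(x) * x;
    const std::uint64_t yy = static_cast<std::uint64_t>(y) * y;
    return xx <= R * R - yy;
}

// Estimates pi from raw 32-bit generator outputs, with no floating-point
// arithmetic inside the loop.
double estimate_pi_uint(std::uint64_t n, std::uint32_t seed) {
    assert(n > 0);
    std::mt19937 gen(seed);  // raw outputs are uniform on [0, 2^32 - 1]
    std::uint64_t m = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        const std::uint32_t x = static_cast<std::uint32_t>(gen());
        const std::uint32_t y = static_cast<std::uint32_t>(gen());
        if (inside_quarter_circle(x, y)) ++m;
    }
    return 4.0 * static_cast<double>(m) / static_cast<double>(n);
}
```

The point (4294967295, 0) lies exactly on the circle and is accepted, while (4294967295, 1) is rejected; these boundary cases are a quick check that the rearranged comparison behaves as intended.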
Step 2: taking pi as an example, the Monte Carlo integration algorithm is used to obtain a relatively accurate value. A quarter circle of radius R is drawn inside a square of side length R, and points are thrown at random into the square. Because every throw is random, every point in the square is equally likely; assuming that, say, 100 million points cover the square, the ratio of the number of points inside the quarter circle to the total number of points approximates the ratio of the area of the quarter circle to the area of the square. Serial and parallel experiments were performed based on this method.
Serial implementation is carried out first at the host end. A question arises when writing the code: when counting the thrown points, should the points falling inside the shaded area (the quarter circle) be counted, or the points falling outside it? Since both approaches are feasible, both counting methods are coded:
When the generated random numbers are used to simulate a thrown point, two random numbers must be generated at a time and used as the x and y coordinates respectively. To decide whether a point falls in the shaded area, only the distance between the coordinate (x, y) and the origin (0, 0) needs to be calculated: if the distance from the randomly generated point to the origin is smaller than the radius R, the condition is met; otherwise it is not. The randomly generated points are restricted to the first quadrant, and neither the x nor the y coordinate may exceed the radius R. The serial implementation uses only the rand function, since it serves as the baseline for the code optimization below.
Counting the points inside the shaded area:
A for loop simulates the repeated manual throws; the rand function generates the x and y coordinates, the distance from each point to the origin is tested and the qualifying points are counted, and finally the number of qualifying points is divided by the total number of throws and multiplied by 4 to obtain the value of pi. The implementation code is as follows:
N represents the total number of points thrown, and M is the number of points falling within the shaded area.
Figure BDA0002196211830000111
Figure BDA0002196211830000121
For the MTGP32 random number generator, the random number seed value, the array storing the point counts and the judgment condition value needed at the DEVICE end are first declared at the HOST end; a kernel function is then called to perform the calculation, with each thread generating its random numbers in every calculation; finally, the numbers of qualifying points returned by the threads are summed and the result value is calculated. The implementation code is as follows:
Figure BDA0002196211830000122
Figure BDA0002196211830000131
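The listing above is available only as an image. The key difference from Embodiment 1 is that no random arrays are stored: each thread draws its own numbers immediately before testing them. The sketch below emulates that structure sequentially in C++ and is not CUDA code. It assumes one `std::mt19937` per logical thread (MTGP32 in cuRAND keeps one state per thread block instead) and seeds each generator from a `std::seed_seq` built from a base seed and the thread index; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// One logical thread: owns its generator state, draws two numbers per trial
// just before the test, and keeps only a running count.
std::uint64_t thread_hits(std::uint32_t base_seed, std::uint32_t tid, std::uint64_t trials) {
    std::seed_seq seq{base_seed, tid};        // distinct stream per thread (assumed scheme)
    std::mt19937 gen(seq);
    const double scale = 1.0 / 4294967296.0;  // maps a 32-bit output into [0, 1)
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < trials; ++i) {
        const double x = gen() * scale;
        const double y = gen() * scale;
        if (x * x + y * y <= 1.0) ++hits;     // x and y are overwritten next trial
    }
    return hits;
}

// Host side: "launches" the threads, then sums the returned counts.
double on_the_fly_pi(std::uint32_t num_threads, std::uint64_t trials, std::uint32_t base_seed) {
    assert(num_threads > 0 && trials > 0);
    std::uint64_t m = 0;
    for (std::uint32_t t = 0; t < num_threads; ++t) {
        m += thread_hits(base_seed, t, trials);
    }
    const double total = static_cast<double>(num_threads) * static_cast<double>(trials);
    return 4.0 * static_cast<double>(m) / total;
}
```

Memory use is independent of the number of points, since each thread holds only its generator state and a counter.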
Step 3: taking e as an example, the Monte Carlo integration algorithm is used to obtain a relatively accurate value, and serial and parallel experiments are then performed.
Serial implementation is carried out first at the CPU end. When coding the pi example, the question arose of whether to count the points falling inside the shaded area or those falling outside it; since tests showed that counting the points falling outside the shaded area is faster, that counting method is chosen directly for the coding here:
The implementation uses an MT19937 random number generator, and the implementation code is as follows:
Figure BDA0002196211830000132
Figure BDA0002196211830000141
Implementation 2 of the CUDA-based Monte Carlo parallel integration algorithm for calculating e uses the second random number generation mode: the random numbers are generated directly at the DEVICE end, each one being generated just before the condition judgment and overwritten by a new random number after it, so that a large amount of video memory is not consumed. The random number generator is the MTGP32 algorithm; a CUDA kernel function is called from the HOST for the calculation, the numbers of qualifying points returned by the threads are summed, and the result value is calculated. The implementation code is as follows:
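As a rough illustration of this second mode for e, the sketch below again emulates the device threads sequentially in C++ rather than using CUDA. It assumes the hit-or-miss formulation on y = 1/x over [1, 2] (the formula actually used is not given in the text), one `std::mt19937` per logical thread in place of MTGP32, and hypothetical function names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// One logical thread: draws each point just before testing it and stores no array.
std::uint64_t count_above_curve(std::uint32_t base_seed, std::uint32_t tid,
                                std::uint64_t trials) {
    std::seed_seq seq{base_seed, tid};         // distinct stream per thread (assumed scheme)
    std::mt19937 gen(seq);
    const double scale = 1.0 / 4294967296.0;   // maps a 32-bit output into [0, 1)
    std::uint64_t above = 0;
    for (std::uint64_t i = 0; i < trials; ++i) {
        const double x = 1.0 + gen() * scale;  // x in [1, 2)
        const double y = gen() * scale;        // y in [0, 1)
        if (x * y >= 1.0) ++above;             // point lies on or above y = 1/x
    }
    return above;
}

// Host side: sums the per-thread counts and converts the area estimate to e.
double on_the_fly_e(std::uint32_t num_threads, std::uint64_t trials, std::uint32_t base_seed) {
    assert(num_threads > 0 && trials > 0);
    std::uint64_t above = 0;
    for (std::uint32_t t = 0; t < num_threads; ++t) {
        above += count_above_curve(base_seed, t, trials);
    }
    const double total = static_cast<double>(num_threads) * static_cast<double>(trials);
    const double ln2 = 1.0 - static_cast<double>(above) / total;  // area under 1/x
    return std::pow(2.0, 1.0 / ln2);                              // e = 2^(1 / ln 2)
}
```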
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes formed by any combination of the above technical features. It should be noted that those skilled in the art may make modifications and refinements without departing from the principle of the invention, and these are also considered to fall within the scope of protection of the invention.

Claims (7)

1. A Monte Carlo parallel algorithm based on a GPU programming model, characterized by comprising the following steps:
step 1, selecting a random number generator;
step 2, adopting the simplest integral formula: the Monte Carlo method quantifies the area of a region by throwing points with equal probability, so that the number of points falling within the region represents its size; the point count then serves as the intermediate quantity from which the target value is obtained through a formula; choosing the region with the smaller area as the statistical region reduces the computation time;
step 3, parallelizing the integration code, quantifying the very large number of computations, tuning the parallel threads to the GPU model, and solving for the value of pi with the Monte Carlo integration algorithm;
step 4, solving for the value of e with the Monte Carlo integration algorithm.
2. The Monte Carlo parallel algorithm based on a GPU programming model according to claim 1, characterized in that: step 1 adopts an MT19937 random number generator, and the process is as follows: obtaining a random number seed; seeding the MT19937 algorithm with it; setting the random number type and range; and generating the random numbers.
3. The Monte Carlo parallel algorithm based on a GPU programming model according to claim 2, characterized in that: in step 3, a CUDA-based Monte Carlo parallel integration algorithm for calculating pi is adopted: first, two arrays receive the random numbers generated by the MT19937 random number generator; then video memory is allocated for the arrays at the DEVICE end and the arrays storing the random numbers are copied to them; a kernel function is called to perform the calculation; finally, the numbers of qualifying points returned by the threads are summed and the result value is calculated.
4. The Monte Carlo parallel algorithm based on a GPU programming model according to claim 3, characterized in that: in step 3, the GPU threads are optimized and the calculation of e is realized with a CUDA-based Monte Carlo parallel integration algorithm for calculating e: the random numbers are generated at the HOST end and then transferred to the DEVICE end for the condition judgment; the random number generator is the MT19937 algorithm; video memory is then allocated for the arrays at the DEVICE end, the arrays storing the random numbers are copied to them, a CUDA kernel function is called to perform the calculation, and finally the numbers of qualifying points on the threads are summed and the result value is calculated.
5. The Monte Carlo parallel algorithm based on a GPU programming model according to claim 1, characterized in that: step 1 adopts an MTGP32 random number generator: the seed and the data set are passed to the kernel function from the host end, and the device end then uses the curand function to generate the random numbers; the process is as follows: at the host end, reformatting a predefined parameter set into kernel format and copying the kernel parameters into device memory; initializing the state of each thread block and storing the states in an array; at the device end, calling the curand function to generate random numbers from each thread's seed value; since the MTGP32 random number generator can only generate unsigned integer data, unsigned integer data are used in solving for pi and e.
6. The Monte Carlo parallel algorithm based on a GPU programming model according to claim 5, characterized in that: in step 3, the GPU threads are optimized and the calculation of pi is realized with a CUDA-based Monte Carlo parallel integration algorithm for calculating pi: first, the random number seed value, the array storing the point counts and the judgment condition value needed at the DEVICE end are declared at the HOST end for the MTGP32 random number generator; a kernel function is then called to perform the calculation, with each thread generating its random numbers in every calculation; finally, the numbers of qualifying points returned by the threads are summed and the result value is calculated.
7. The Monte Carlo parallel algorithm based on a GPU programming model according to claim 6, characterized in that: in step 3, the GPU threads are optimized and the calculation of e is realized with a CUDA-based Monte Carlo parallel integration algorithm for calculating e: the random numbers are generated directly at the DEVICE end, each one being generated just before the condition judgment and overwritten by a new random number after it; the random number generator is the MTGP32 algorithm; a CUDA kernel function is called from the HOST for the calculation, and finally the numbers of qualifying points on the threads are summed and the result value is calculated.
CN201910848788.4A 2019-09-09 2019-09-09 Monte Carlo parallel algorithm based on GPU (graphics processing Unit) programming model Pending CN110727413A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910848788.4A CN110727413A (en) 2019-09-09 2019-09-09 Monte Carlo parallel algorithm based on GPU (graphics processing Unit) programming model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910848788.4A CN110727413A (en) 2019-09-09 2019-09-09 Monte Carlo parallel algorithm based on GPU (graphics processing Unit) programming model

Publications (1)

Publication Number Publication Date
CN110727413A (en) 2020-01-24

Family

ID=69217964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910848788.4A Pending CN110727413A (en) 2019-09-09 2019-09-09 Monte Carlo parallel algorithm based on GPU (graphics processing Unit) programming model

Country Status (1)

Country Link
CN (1) CN110727413A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317655A (en) * 2014-10-11 2015-01-28 Huazhong University of Science and Technology (华中科技大学) Cluster GPU acceleration-based multi-source full path Monte-Carlo simulation method
WO2017134512A1 (en) * 2016-02-03 2017-08-10 Universitat Rovira I Virgili A computer implemented method of generation of statistically uncorrelated molecule's conformations and computer programs
CN107038212A (en) * 2017-02-27 2017-08-11 Sun Yat-sen University (中山大学) A kind of algorithm based on the Converse solved PageRank of monte carlo method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
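张钦 等 (Zhang Qin et al.): "运用OpenMP和CUDA优化蒙特卡洛算法" (Optimizing the Monte Carlo algorithm with OpenMP and CUDA), 《安阳师范学院学报》 (Journal of Anyang Normal University) *
毕艳辉 等 (Bi Yanhui et al.): "用随机试验法计算欧拉常数e" (Calculating Euler's constant e by random experiment), 《统计与管理》 (Statistics and Management) *
贾永红 等 (Jia Yonghong et al.): 《数字图像处理实习教程》 (Digital Image Processing Practice Tutorial), 30 November 2016, 武汉大学出版社 (Wuhan University Press) *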

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200124