CN102999316A - Parallel implementation method of orthogonal tracking algorithm in GPU (Graphics Processing Unit) - Google Patents


Info

Publication number
CN102999316A
Authority
CN
China
Prior art keywords
matrix
gpu
original signal
observing matrix
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104657992A
Other languages
Chinese (zh)
Inventor
张颢
陈帅
孟华东
王希勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University

Abstract

The invention discloses a parallel implementation method of the orthogonal matching pursuit (OMP) algorithm on a GPU (Graphics Processing Unit). The method comprises the following steps: S1. generating an observation matrix on the GPU; S2. iteratively estimating the original signal with the OMP algorithm, calculating the observation data corresponding to the estimated original signal by using the observation matrix, and comparing it with the real observation data to judge whether the iteration should terminate; S3. finding the column of the observation matrix that has the maximum correlation with the residual and adding it to a basis matrix, wherein the basis matrix is a part of the observation matrix; and S4. estimating the non-zero elements of the original signal on the basis matrix by the least squares method, updating the original signal, and returning to step S2. The method shortens the running time of the OMP algorithm, thereby improving data processing efficiency and reducing cost.

Description

Parallel implementation method of the orthogonal matching pursuit algorithm on a GPU
Technical field
The present invention relates to the field of signal processing technology, and specifically to a parallel implementation method of the OMP (Orthogonal Matching Pursuit) algorithm on a GPU.
Background
In recent years, compressed sensing (CS) theory has attracted wide attention. Provided that the signal is sparse, CS samples the data at a rate far below the Nyquist sampling rate and can still recover the original signal completely. Compressed sensing is described by the following mathematical expressions:
For an original signal $x \in \mathbb{R}^{N}$, an observation vector $y \in \mathbb{R}^{M}$ is obtained through an observation matrix $\Phi \in \mathbb{R}^{M \times N}$:
$$y = \Phi x \qquad (1)$$
where $M \ll N$ and the number of significant elements in $x$ is $S$, with $S \ll N$. CS theory studies the following problem: given the observation $y$, estimate a sparse solution of formula (1), that is, find an $\tilde{x}$ satisfying:
$$\min \|\tilde{x}\|_0 \quad \text{s.t.} \quad y = \Phi \tilde{x} \qquad (2)$$
where $\|\cdot\|_0$ denotes the $L_0$ norm, that is, the number of non-zero elements.
At present, a series of algorithms has been proposed for the optimization problem of formula (2), including approximate L1 optimization, greedy algorithms, the FOCUSS algorithm, and others. These algorithms can all recover sparse signals effectively in specific scenarios. However, they share a high computational complexity: when solving large-scale problems, a traditional serial CPU implementation runs for a long time and cannot recover the original sparse signal in real time. Fast computation is possible on mainframe computers or clusters, but the cost is too high to meet the needs of engineering applications.
In recent years, the graphics processing unit (GPU) has developed into a multi-core, multi-threaded, general-purpose platform for high-speed parallel computing, and it is highly cost-effective for compute-intensive problems. The present invention uses the GPU platform to improve the execution speed of the OMP algorithm.
The following documents introduce the main background art in this field:
1. Tropp J A, Gilbert A C. Signal recovery from random measurements via orthogonal matching pursuit[J]. IEEE Transactions on Information Theory, 2007, 53(12): 4655-4666.
This document proposes a greedy algorithm for solving the zero-norm minimization problem. Compared with convex optimization algorithms based on the one-norm approximation, it has lower computational complexity and higher resolution. Compared with the traditional matching pursuit algorithm, the orthogonal projection performed in each iteration increases the probability of successful recovery and the speed of convergence.
2. Sangkyun Lee S W. Implementing algorithms for signal and image reconstruction on graphical processing units. Computer Sciences Department, University of Wisconsin-Madison, Tech. Rep., November, 2008.
In this document, Sangkyun Lee et al. of the University of Wisconsin implemented the SpaRSA compressed sensing algorithm on the GPU platform. SpaRSA is a convex optimization algorithm with relatively high computational complexity, so even on the GPU platform it still needs a long computing time. SpaRSA also shares the common shortcoming of convex optimization algorithms, namely relatively high sidelobes.
3. Andrecut M. Fast GPU implementation of sparse signal recovery from random projections[J]. Engineering Letters, 2009, 17(3): 151-158.
In this document, Andrecut et al. of the University of Calgary implemented a GPU parallelization of the matching pursuit (MP) algorithm. The shortcoming of this method is that the MP algorithm itself converges slowly, and when the correlation is high, the probability of successful recovery is small.
Summary of the invention
(1) Technical problem to be solved
The present invention mainly solves the technical problem that, when existing algorithms are applied to large-scale data, the traditional serial CPU implementation has a long running time and a high cost.
(2) Technical solution
To solve the above problem, the invention provides a parallel implementation method of the orthogonal matching pursuit algorithm on a GPU, comprising the following steps:
S1. generating an observation matrix on the GPU;
S2. iteratively estimating the original signal with the orthogonal matching pursuit algorithm, calculating the observation data corresponding to the estimated original signal by using the observation matrix, and comparing it with the real observation data to judge whether to terminate the iteration;
S3. finding the column of the observation matrix that has the maximum correlation with the residual and adding it to a basis matrix, the basis matrix being a part of the observation matrix;
S4. estimating the non-zero elements of the original signal on the basis matrix by the least squares method, updating the original signal, and returning to step S2.
In step S1, the observation matrix is obtained by randomly extracting rows from the DCT (discrete cosine transform) matrix, and its elements are calculated according to the following formula:
$$\Phi(m,n) = \begin{cases} \dfrac{1}{\sqrt{N}}, & h(m) = 0,\ 0 \le n \le N-1 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1)\, h(m)}{2N}, & 1 \le h(m) \le N-1,\ 0 \le n \le N-1 \end{cases}$$
where Φ is the observation matrix, m is the row index, m ∈ (0, 1, 2, ..., M-1), n is the column index, N is the length of the signal to be recovered by the orthogonal matching pursuit algorithm, and h is a computer-generated pseudo-random number sequence with M elements, M being the number of observations in compressed sensing, with M<N.
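The formula above can be sketched in a few lines of serial Python. This is only an illustration of the element formula, not the GPU implementation; the function name `dct_observation_matrix`, the sizes, and the choice of distinct row indices in `h` are assumptions made here.

```python
import math
import random

def dct_observation_matrix(h, N):
    """Build the M x N matrix whose row m is row h[m] of the N x N DCT matrix."""
    phi = []
    for hm in h:
        if hm == 0:
            # First DCT row: constant 1/sqrt(N)
            row = [1.0 / math.sqrt(N)] * N
        else:
            scale = math.sqrt(2.0 / N)
            row = [scale * math.cos(math.pi * (2 * n + 1) * hm / (2 * N))
                   for n in range(N)]
        phi.append(row)
    return phi

N, M = 16, 6                      # illustrative sizes with M < N
rng = random.Random(0)            # fixed seed so the run is repeatable
h = rng.sample(range(N), M)       # assumption: the M row indices are distinct
phi = dct_observation_matrix(h, N)
```

Because the rows come from an orthonormal DCT matrix, any two distinct selected rows are orthogonal and each has unit norm, which gives a quick sanity check on the generated matrix.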
Further, the GPU distributes the task of generating these elements over 64 threads, where the i-th thread is responsible for generating (Φ(i, 0), Φ(i, 1), Φ(i, 2), ..., Φ(i, N-1)), and the threads complete the generation of the observation matrix Φ in parallel on multiple processors.
In step S2, the iteration terminates when the energy of the difference between the real observation data and the observation data calculated with the observation matrix falls below a specified threshold, expressed mathematically as:
$$\| y - \Phi \hat{x}_k \|_2 < \varepsilon \| y \|_2$$
where y is the real observation data, Φ is the observation matrix, $\hat{x}_k$ is the original signal estimated after the k-th iteration, ε is the relative error, which is related to the observation noise, and $\|\cdot\|_2$ denotes the two-norm of a vector.
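As a minimal serial sketch of this termination test, the following Python compares the residual norm with ε times the norm of y. The helper names (`l2_norm`, `matvec`, `should_stop`) and the toy data are hypothetical and chosen only for illustration.

```python
import math

def l2_norm(v):
    """Two-norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in v))

def matvec(phi, x):
    """Product of a row-major matrix (list of rows) with a vector."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]

def should_stop(phi, x_hat, y, eps):
    """True when ||y - Phi x_hat||_2 < eps * ||y||_2."""
    residual = [yi - pi for yi, pi in zip(y, matvec(phi, x_hat))]
    return l2_norm(residual) < eps * l2_norm(y)

# Toy data: a 2 x 3 matrix and an observation it can reproduce exactly
phi = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
y = [3.0, 4.0]
stop_with_exact_estimate = should_stop(phi, [3.0, 4.0, 0.0], y, eps=0.01)
stop_with_zero_estimate = should_stop(phi, [0.0, 0.0, 0.0], y, eps=0.01)
```

With the exact estimate the residual is zero and the test passes; with the all-zero estimate the residual equals y, so the loop would continue.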
Further, the multiplication of the observation matrix by the original signal vector is performed in parallel on the GPU platform: each stream processor computes the inner product of one row of the observation matrix with the original signal vector, and within a single stream processor multiple threads multiply parts of the elements of the original signal vector in parallel. The two-norm is likewise calculated by multiple threads in parallel.
In step S3, when the GPU calculates the correlation between each column of the observation matrix and the residual, each stream processor inside the GPU computes the correlation of one column with the residual; finally, the results of the stream processors are compared and the column with the maximum correlation is added to the basis matrix.
In step S4, the least squares estimation is completed by calling the cublasDger function of cublas.
(3) Beneficial effects
The method of the invention shortens the running time of the OMP algorithm, thereby improving data processing efficiency and reducing cost.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a flow chart of an embodiment of the invention;
Fig. 3 is a schematic diagram of the two-level parallel implementation of matrix-vector multiplication on the GPU;
Fig. 4 compares the computing time of the OMP algorithm on the GPU and on the CPU.
Detailed description
Specific implementations of the invention are described in further detail below with reference to the drawings and embodiments. The following embodiments illustrate the invention but do not limit its scope.
Fig. 1 is a flow chart of the method of the invention. The invention provides a parallel implementation method of the orthogonal matching pursuit algorithm on a GPU, comprising the following steps:
S1. generating an observation matrix on the GPU;
S2. iteratively estimating the original signal with the orthogonal matching pursuit algorithm, calculating the observation data corresponding to the estimated original signal by using the observation matrix, and comparing it with the real observation data to judge whether to terminate the iteration;
S3. finding the column of the observation matrix that has the maximum correlation with the residual and adding it to a basis matrix, the basis matrix being a part of the observation matrix;
S4. estimating the non-zero elements of the original signal on the basis matrix by the least squares method, updating the original signal, and returning to step S2.
In step S1, the observation matrix is obtained by randomly extracting rows from the DCT (discrete cosine transform) matrix, and its elements are calculated according to the following formula:
$$\Phi(m,n) = \begin{cases} \dfrac{1}{\sqrt{N}}, & h(m) = 0,\ 0 \le n \le N-1 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1)\, h(m)}{2N}, & 1 \le h(m) \le N-1,\ 0 \le n \le N-1 \end{cases}$$
where Φ is the observation matrix, m is the row index, m ∈ (0, 1, 2, ..., M-1), n is the column index, N is the length of the signal to be recovered by the orthogonal matching pursuit algorithm, and h is a computer-generated pseudo-random number sequence with M elements, M being the number of observations in compressed sensing, with M<N.
Further, the GPU distributes the task of generating these elements over 64 threads, where the i-th thread is responsible for generating (Φ(i, 0), Φ(i, 1), Φ(i, 2), ..., Φ(i, N-1)), and the threads complete the generation of the observation matrix Φ in parallel on multiple processors.
In step S2, the iteration terminates when the energy of the difference between the real observation data and the observation data calculated with the observation matrix falls below a specified threshold, expressed mathematically as:
$$\| y - \Phi \hat{x}_k \|_2 < \varepsilon \| y \|_2$$
where y is the real observation data, Φ is the observation matrix, $\hat{x}_k$ is the original signal estimated after the k-th iteration, ε is the relative error, which is related to the observation noise, and $\|\cdot\|_2$ denotes the two-norm of a vector.
Further, the multiplication of the observation matrix by the original signal vector is performed in parallel on the GPU platform: each stream processor computes the inner product of one row of the observation matrix with the original signal vector, and within a single stream processor multiple threads multiply parts of the elements of the original signal vector in parallel. The two-norm is likewise calculated by multiple threads in parallel.
In step S3, when the GPU calculates the correlation between each column of the observation matrix and the residual, each stream processor inside the GPU computes the correlation of one column with the residual; finally, the results of the stream processors are compared and the column with the maximum correlation is added to the basis matrix.
In step S4, the least squares estimation is completed by calling the cublasDger function of cublas.
Embodiment
Fig. 2 is a flow chart of an embodiment of the invention, which comprises the following steps:
Step S1: generate the observation matrix Φ on the GPU. The observation matrix used in the invention is a random row-extraction matrix of the DCT (Discrete Cosine Transform) matrix. The random row extraction is simulated by computer: the positions of the extracted rows are determined by generating a series of pseudo-random numbers. According to the characteristics of the DCT matrix, the computer generates a pseudo-random sequence of length M, h = (h_0, h_1, h_2, ..., h_{M-1}), h_i ∈ (0, 1, 2, ..., N-1), which determines the randomness of the random sampling. This yields the underdetermined observation matrix Φ, whose elements are calculated according to the following formula:
$$\Phi(m,n) = \begin{cases} \dfrac{1}{\sqrt{N}}, & h(m) = 0,\ 0 \le n \le N-1 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1)\, h(m)}{2N}, & 1 \le h(m) \le N-1,\ 0 \le n \le N-1 \end{cases}$$
where N is the length of the vector to be recovered by the OMP algorithm and M is the number of observations in the OMP algorithm, M<N. To make performance testing convenient, a sparse signal x is generated at random; the number of significant elements in x is S, which is defined as the sparsity in the compressed sensing problem, S ≪ N, and the amplitudes of the significant elements are generated at random. The observation data y is calculated from Φ and x and is used for recovery by the OMP algorithm.
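The test-data step just described (a random S-sparse x and its observation y) might look as follows in Python. The amplitude distribution, the seed, and the random stand-in matrix are assumptions made for this sketch; the embodiment itself uses the row-sampled DCT matrix.

```python
import random

def random_sparse_signal(N, S, rng):
    """Return a length-N list with exactly S non-zero entries at random positions."""
    x = [0.0] * N
    for idx in rng.sample(range(N), S):
        # Random sign times an amplitude in [0.5, 1.5), so entries are never zero
        x[idx] = rng.choice([-1.0, 1.0]) * (0.5 + rng.random())
    return x

def observe(phi, x):
    """Compute y = Phi x for a row-major matrix phi."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]

rng = random.Random(42)
N, M, S = 12, 5, 2                # illustrative sizes with S < M < N
# Stand-in observation matrix with Gaussian entries (not the DCT-based one)
phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
x = random_sparse_signal(N, S, rng)
y = observe(phi, x)
```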
In the GPU implementation, storage for the M × N float-type elements of Φ is first allocated on the GPU, and the generation task is distributed over 64 threads. Thread i is responsible for generating (Φ(i, 0), Φ(i, 1), Φ(i, 2), ..., Φ(i, N-1)), and likewise for each further row assigned to it, and the threads complete the generation of the observation matrix in a highly parallel manner on multiple processors. Meanwhile, the pseudo-random sequence h is accessed repeatedly and only ever read, so, using the features of Nvidia's GPU parallel programming model, h can be stored in constant memory. The GPU's internal optimization of constant-memory access effectively reduces the I/O access latency and thus the overall running time.
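One way to picture the distribution of rows over 64 threads is a strided mapping in which thread i takes rows i, i + 64, i + 128, and so on. The text does not spell out the mapping for M > 64, so the stride used below is an assumption, as are the names `NUM_THREADS` and `rows_for_thread`.

```python
NUM_THREADS = 64  # thread count stated in the text

def rows_for_thread(i, M, num_threads=NUM_THREADS):
    """Rows of the observation matrix handled by thread i (assumed strided mapping)."""
    return list(range(i, M, num_threads))

M = 150  # illustrative number of observations
assignment = {i: rows_for_thread(i, M) for i in range(NUM_THREADS)}
```

Under this mapping every row is generated by exactly one thread, and no thread handles more than one row beyond any other.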
Step S2: check whether to end the iteration. Data are transferred from the CPU to the GPU and initialized. Before the OMP algorithm steps are executed, storage space must first be allocated on the GPU for the observation data and the intermediate variables, and the observation data must be transferred to the GPU. In the specific implementation, memory for the variables is allocated through cublasAlloc, an interface of the cublas vector computing library, and the observation data are transferred from CPU memory to GPU memory through cublasSetVector.
The OMP algorithm estimates the original signal iteratively. The iteration terminates when the energy of the difference between the real observation and the observation calculated from the estimated signal falls below a certain threshold, expressed mathematically as:
$$\| y - \Phi \hat{x}_k \|_2 < \varepsilon \| y \|_2$$
where y is the real observation, $\hat{x}_k$ is the recovery result after the k-th iteration, ε is the relative error, and $\|\cdot\|_2$ denotes the two-norm of a vector.
The multiplication of a matrix by a vector is highly parallel on the GPU platform: each stream processor computes the inner product of one row of the matrix with the vector, and within a single stream processor multiple threads multiply parts of the elements of the vector in parallel. The specific operations are shown in Fig. 3. The two-norm is also calculated by multiple threads in parallel: each thread computes the squares of part of the vector and sums them, and finally the partial sums are added together.
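The partial-sum scheme for the two-norm can be imitated serially: each simulated thread accumulates the squares of its share of the vector, and the partial sums are then combined. The strided split of elements among threads and the function name `parallel_style_norm` are assumptions made for this sketch.

```python
import math

def parallel_style_norm(v, num_threads):
    """Two-norm computed from per-thread partial sums of squares."""
    partial = [0.0] * num_threads
    for j in range(num_threads):               # simulated thread j
        for k in range(j, len(v), num_threads):
            partial[j] += v[k] * v[k]          # square and accumulate its share
    return math.sqrt(sum(partial))             # final combination of partial sums

norm_a = parallel_style_norm([3.0, 4.0], num_threads=4)
norm_b = parallel_style_norm([1.0, 2.0, 2.0], num_threads=2)
```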
According to the two-level parallel granularity of the CUDA (Compute Unified Device Architecture) model, the parallel implementation of matrix-vector multiplication is divided into coarse-grained parallelism and fine-grained parallelism. In v = Φ^T r, the vector multiplications between each column of the matrix and r are coarse-grained parallel and are carried out by thread blocks: thread block i is responsible for computing v_i = <φ_i, r>, where <·,·> denotes the inner product of two vectors. Within the computation of v_i = <φ_i, r>, the element-by-element multiplications are carried out by multiple threads in parallel; that is, thread j of thread block i is responsible for computing:
Figure BDA00002418200500073
where t_i is the intermediate result of the calculation and T is the number of threads in each thread block. Calling __syncthreads() guarantees that all threads in the same thread block have finished, after which
Figure BDA00002418200500074
is calculated, which completes one matrix-vector multiplication. Because the threads in the same thread block can communicate through shared memory, storing the resources v_i and t_i accessed by the same thread block in shared memory effectively reduces the access latency.
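The two levels of parallelism can be mimicked with two nested loops: the outer loop plays the role of the thread blocks (one per column of Φ) and the inner loop the threads of a block, each producing a partial sum that is combined after the point where a barrier would sit. The per-thread formulas are images that are not reproduced in the text, so the strided split below and the name `correlate_two_level` are assumptions.

```python
def correlate_two_level(phi, r, threads_per_block):
    """Compute v = Phi^T r, with phi given as a list of M rows of length N."""
    M, N = len(phi), len(phi[0])
    v = [0.0] * N
    for i in range(N):                          # coarse level: "thread block" i
        t = [0.0] * threads_per_block           # per-block scratch (shared memory on a GPU)
        for j in range(threads_per_block):      # fine level: "thread" j
            for k in range(j, M, threads_per_block):
                t[j] += phi[k][i] * r[k]        # partial inner product
        # On a GPU a barrier (__syncthreads) would be needed before this sum
        v[i] = sum(t)
    return v

phi = [[1.0, 2.0],
       [3.0, 4.0],
       [5.0, 6.0]]
r = [1.0, 0.0, -1.0]
v = correlate_two_level(phi, r, threads_per_block=2)
```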
Step S3: compute in parallel the column of the observation matrix that has the maximum correlation with the residual, and extend the basis matrix. The OMP algorithm inherits the characteristics of greedy algorithms: in each iteration, it selects the column of the observation matrix that has the maximum correlation with the residual and adds it to the basis matrix. When the GPU calculates the correlation between each column of the matrix and the residual, each stream processor inside the GPU computes the correlation of one column with the residual; finally, the results of the stream processors are compared and the column with the maximum correlation is added to the support set. At the same time, the index of the column with the maximum correlation is recorded; the vector formed by the indices from the first k steps is denoted v.
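A serial sketch of the selection step: compute the correlation of every column with the residual and keep the index of the largest one. Taking the absolute value of the inner product is the usual OMP convention and is an assumption here, as is the function name `select_column`.

```python
def select_column(phi, residual):
    """Index of the column of phi most correlated (in absolute value) with residual."""
    M, N = len(phi), len(phi[0])
    best_index, best_value = -1, -1.0
    for n in range(N):
        corr = abs(sum(phi[m][n] * residual[m] for m in range(M)))
        if corr > best_value:                  # keep the first column reaching the maximum
            best_index, best_value = n, corr
    return best_index

phi = [[1.0, 0.0, 0.5],
       [0.0, 1.0, 0.5]]
residual = [0.2, -3.0]
chosen = select_column(phi, residual)
```

Here the three correlations are 0.2, 3.0, and 1.4 in absolute value, so the second column (index 1) is selected.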
Step S4: estimate the non-zero elements of the original signal by the least squares method on the basis matrix of step k. The least squares estimation is completed by calling the cublasDger function of cublas, which yields the current estimated signal; the method then returns to step S2.
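Steps S2 to S4 can be put together in a compact serial reference loop. The sketch below does not use cuBLAS: as a substitution made purely for illustration, it solves the least squares problem through the normal equations with plain Gaussian elimination. The names `solve_linear` and `omp` and the toy data are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z

def omp(phi, y, eps=1e-6):
    """Serial OMP: returns the estimate x_hat and the list of selected columns."""
    M, N = len(phi), len(phi[0])
    support, x_hat, residual = [], [0.0] * N, list(y)
    y_norm = math.sqrt(sum(v * v for v in y))
    if y_norm == 0.0:
        return x_hat, support
    for _ in range(M):  # at most M columns can be selected
        # S2: stop when ||y - Phi x_hat||_2 < eps * ||y||_2
        if math.sqrt(sum(v * v for v in residual)) < eps * y_norm:
            break
        # S3: column with the largest absolute correlation with the residual
        corr = [abs(sum(phi[m][n] * residual[m] for m in range(M))) for n in range(N)]
        best = max((n for n in range(N) if n not in support), key=lambda n: corr[n])
        support.append(best)
        # S4: least squares on the selected columns via the normal equations
        k = len(support)
        B = [[phi[m][n] for n in support] for m in range(M)]
        G = [[sum(B[m][a] * B[m][c] for m in range(M)) for c in range(k)] for a in range(k)]
        rhs = [sum(B[m][a] * y[m] for m in range(M)) for a in range(k)]
        coef = solve_linear(G, rhs)
        x_hat = [0.0] * N
        for n, c in zip(support, coef):
            x_hat[n] = c
        residual = [y[m] - sum(B[m][a] * coef[a] for a in range(k)) for m in range(M)]
    return x_hat, support

phi = [[1.0, 0.0, 0.0, 0.5],
       [0.0, 1.0, 0.0, 0.5],
       [0.0, 0.0, 1.0, 0.5]]
y = [2.0, 0.0, -1.0]          # generated by x = (2, 0, -1, 0)
x_hat, support = omp(phi, y)
```

On this toy problem the loop picks column 0, then column 2, and the residual vanishes after the second least squares fit. The normal equations are convenient for a short example but can be poorly conditioned for larger problems.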
The method of the invention shortens the running time of the OMP algorithm, thereby improving data processing efficiency and reducing cost. Fig. 4 compares the computing time of the OMP algorithm on the GPU and on the CPU.
The above are only preferred embodiments of the invention. It should be pointed out that those of ordinary skill in the art can make improvements and substitutions without departing from the technical principle of the invention, and these improvements and substitutions shall also be regarded as falling within the protection scope of the invention.

Claims (7)

1. A parallel implementation method of the orthogonal matching pursuit algorithm on a GPU, characterized by comprising the following steps:
S1. generating an observation matrix on the GPU;
S2. iteratively estimating the original signal with the orthogonal matching pursuit algorithm, calculating the observation data corresponding to the estimated original signal by using the observation matrix, and comparing it with the real observation data to judge whether to terminate the iteration;
S3. finding the column of the observation matrix that has the maximum correlation with the residual and adding it to a basis matrix, the basis matrix being a part of the observation matrix;
S4. estimating the non-zero elements of the original signal on the basis matrix by the least squares method, updating the original signal, and returning to step S2.
2. The method of claim 1, characterized in that, in step S1, the observation matrix is obtained by randomly extracting rows from the DCT (discrete cosine transform) matrix, and its elements are calculated according to the following formula:
$$\Phi(m,n) = \begin{cases} \dfrac{1}{\sqrt{N}}, & h(m) = 0,\ 0 \le n \le N-1 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1)\, h(m)}{2N}, & 1 \le h(m) \le N-1,\ 0 \le n \le N-1 \end{cases}$$
where Φ is the observation matrix, m is the row index, m ∈ (0, 1, 2, ..., M-1), n is the column index, N is the length of the signal to be recovered by the orthogonal matching pursuit algorithm, and h is a computer-generated pseudo-random number sequence with M elements, M being the number of observations in compressed sensing, with M<N.
3. The method of claim 2, characterized in that the GPU distributes the task of generating these elements over 64 threads, where the i-th thread is responsible for generating (Φ(i, 0), Φ(i, 1), Φ(i, 2), ..., Φ(i, N-1)), and the threads complete the generation of the observation matrix Φ in parallel on multiple processors.
4. The method of claim 1, characterized in that, in step S2, the iteration terminates when the energy of the difference between the real observation data and the observation data calculated with the observation matrix falls below a specified threshold, expressed mathematically as:
$$\| y - \Phi \hat{x}_k \|_2 < \varepsilon \| y \|_2$$
where y is the real observation data, Φ is the observation matrix, $\hat{x}_k$ is the original signal estimated after the k-th iteration, ε is the relative error, which is related to the observation noise, and $\|\cdot\|_2$ denotes the two-norm of a vector.
5. The method of claim 4, characterized in that the multiplication of the observation matrix by the original signal vector is performed in parallel on the GPU platform: each stream processor computes the inner product of one row of the observation matrix with the original signal vector, within a single stream processor multiple threads multiply parts of the elements of the original signal vector in parallel, and the two-norm is calculated by multiple threads in parallel.
6. The method of claim 1, characterized in that, in step S3, when the GPU calculates the correlation between each column of the observation matrix and the residual, each stream processor inside the GPU computes the correlation of one column with the residual; finally, the results of the stream processors are compared and the column with the maximum correlation is added to the basis matrix.
7. The method of claim 1, characterized in that, in step S4, the least squares estimation is completed by calling the cublasDger function of cublas.
CN2012104657992A 2012-11-16 2012-11-16 Parallel implementation method of orthogonal tracking algorithm in GPU (Graphics Processing Unit) Pending CN102999316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104657992A CN102999316A (en) 2012-11-16 2012-11-16 Parallel implementation method of orthogonal tracking algorithm in GPU (Graphics Processing Unit)

Publications (1)

Publication Number Publication Date
CN102999316A true CN102999316A (en) 2013-03-27

Family

ID=47927927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104657992A Pending CN102999316A (en) 2012-11-16 2012-11-16 Parallel implementation method of orthogonal tracking algorithm in GPU (Graphics Processing Unit)

Country Status (1)

Country Link
CN (1) CN102999316A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6618800B1 (en) * 2000-01-18 2003-09-09 Systemonic Ag Procedure and processor arrangement for parallel data processing
CN102750262A (en) * 2012-06-26 2012-10-24 清华大学 Method for realizing sparse signal recovery on CPU (Central Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SANGKYUN LEE等: "Implementing algorithms for signal and imagereconstruction on graphical processing units", 《COMPUTER SINECESDEPARTMENT》 *
陈帅等: "SAR图像压缩采样恢复的GPU并行实现", 《电子与信息学报》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104182209A (en) * 2014-08-27 2014-12-03 中国科学院软件研究所 PETSc-based GCRO-DR algorithm parallel processing method
CN104182209B (en) * 2014-08-27 2017-06-16 中国科学院软件研究所 A kind of GCRO DR algorithm method for parallel processing based on PETSc
CN117434511A (en) * 2023-12-13 2024-01-23 广东大湾区空天信息研究院 Multi-target angle disambiguation method based on millimeter wave radar and related equipment
CN117434511B (en) * 2023-12-13 2024-03-01 广东大湾区空天信息研究院 Multi-target angle disambiguation method based on millimeter wave radar and related equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C05 Deemed withdrawal (patent law before 1993)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130327