CN102750262A - Method for realizing sparse signal recovery on GPU (Graphics Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm

Info

Publication number
CN102750262A
Authority
CN
China
Prior art keywords
matrix
gpu
observation
algorithm
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012102162247A
Other languages
Chinese (zh)
Inventor
张颢
陈帅
孟华东
王希勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN2012102162247A priority Critical patent/CN102750262A/en
Publication of CN102750262A publication Critical patent/CN102750262A/en
Pending legal-status Critical Current

Abstract

The invention discloses a method for realizing sparse signal recovery on a GPU (Graphics Processing Unit) based on the OMP (Orthogonal Matching Pursuit) algorithm. The method comprises the following steps: generating an observation matrix on the GPU; selecting the column of the observation matrix with the greatest correlation to the residual and adding it to a basis matrix, wherein the residual is the difference between the actual observation and the observation produced by the estimated signal, and the basis matrix is formed by the column vectors of the observation matrix that correspond to the index values of the nonzero elements; estimating the nonzero elements of the original signal on the basis matrix of the k-th step by the method of least squares; and continuing to select, on the GPU, the column of the observation matrix with the greatest correlation to the residual to extend the basis matrix, the iteration ending when the difference between the true observation and the estimated observation falls below a specified threshold. The method has the following advantages: the parallel GPU implementation combines the low computational complexity and fast convergence of the OMP algorithm with the GPU's marked acceleration of vector computation, and effectively improves the running speed of the sparse recovery algorithm.

Description

Method for realizing sparse signal recovery on a GPU based on the OMP algorithm
Technical field
The invention belongs to the field of signal processing technology, and in particular relates to a method for realizing sparse signal recovery on a GPU based on the OMP algorithm.
Background
In recent years, compressed sensing (CS) theory has attracted wide attention. It states that, provided a signal is sparse, the signal can be sampled at a frequency far below the Nyquist rate and still be recovered exactly. Compressed sensing is expressed mathematically as follows:
For an original signal $x \in \mathbb{R}^{N}$, an observation matrix $\Phi \in \mathbb{R}^{M \times N}$ yields the observation vector $y \in \mathbb{R}^{M}$:
$y = \Phi x$ (1)
where $M \ll N$, and the number of significant elements in $x$ is $S$, with $S \ll N$. CS theory studies the following problem: given the observation $y$, estimate a sparse solution $x$ that satisfies formula (1), that is, find an $\tilde{x}$ that satisfies:
$\min \|\tilde{x}\|_0 \quad \text{s.t.} \quad y = \Phi \tilde{x}$ (2)
where $\|\cdot\|_0$ denotes the $L_0$ norm, i.e. the number of nonzero elements.
At present, a series of algorithms have been proposed for the optimization problem of formula (2), including approximate L1 optimization, greedy algorithms, and the FOCUSS algorithm; these algorithms can all recover sparse signals effectively in specific scenarios. Their common drawback, however, is high computational complexity: on large-scale data, a conventional serial CPU implementation runs for a long time and cannot recover the original sparse signal in real time, while fast computation on a mainframe or a cluster is costly and cannot meet the needs of practical applications.
In recent years, the graphics processing unit (GPU) has developed into a general-purpose, highly parallel, multi-core, multi-threaded platform that is very cost-effective for computation-intensive problems. The present invention uses the GPU platform to improve the execution speed of the OMP algorithm.
The following articles and patent documents cover the main background art in this field. To show how the technology has evolved, they are listed in chronological order, with the main contribution and shortcomings of each.
1. Tropp J A, Gilbert A C. Signal recovery from random measurements via orthogonal matching pursuit[J]. IEEE Transactions on Information Theory, 2007, 53(12): 4655-4666.
This document proposes a greedy algorithm for solving the zero-norm minimization problem. Compared with convex optimization algorithms based on the L1-norm approximation, it has lower computational complexity and higher resolution. Compared with the traditional matching pursuit algorithm, the orthogonal projection performed in each iteration increases both the probability of successful recovery and the convergence speed.
2. Sangkyun Lee S W. Implementing algorithms for signal and image reconstruction on graphical processing units. Computer Sciences Department, University of Wisconsin-Madison, Tech. Rep., November, 2008.
In this document, Sangkyun Lee et al. of the University of Wisconsin implemented the SpaRSA compressed sensing algorithm on the GPU platform. SpaRSA is a convex optimization algorithm with relatively high computational complexity, so it still requires a long computation time even on a GPU. It also shares the drawback of convex optimization algorithms in general, namely high sidelobes.
3. Andrecut M. Fast GPU implementation of sparse signal recovery from random projections[J]. Engineering Letters, 2009, 17(3): 151-158.
In this document, Andrecut et al. of the University of Calgary parallelized the matching pursuit (MP) algorithm on the GPU. The shortcoming of this method is that the MP algorithm itself converges slowly, and when the correlation between basis vectors is high, the probability of successful recovery is low.
Summary of the invention
To overcome the deficiencies of the prior art described above, the object of the present invention is to provide a method for realizing sparse signal recovery on a GPU based on the OMP algorithm, in which the OMP algorithm is implemented in parallel on the GPU in order to recover sparse signals.
To achieve this object, the technical scheme adopted by the present invention is as follows:
A method for realizing sparse signal recovery on a GPU based on the OMP algorithm, comprising the following steps:
Step 1: generate the observation matrix Φ on the GPU, with the elements of the matrix computed according to the following formula:
$\Phi(m,n) = \begin{cases} \sqrt{\dfrac{1}{N}}, & h(m) = 0,\ 0 \le n \le N-1 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1) h(m)}{2N}, & 1 \le h(m) \le N-1,\ 0 \le n \le N-1 \end{cases}$
where $h = (h_0, h_1, h_2, \ldots, h_{M-1})$, $h_i \in \{0, 1, 2, \ldots, N-1\}$, is a pseudo-random number sequence generated by computer, N is the length of the signal to be recovered by the OMP algorithm, and M is the number of observations in compressed sensing, with $M < N$;
Step 2: on the GPU, select the column of the observation matrix Φ that has the greatest correlation with the residual and add it to the basis matrix, where the residual is defined as the difference between the actual observation and the observation produced by the estimated signal, and the basis matrix is defined as the matrix formed by the column vectors of Φ that correspond to the index values of the nonzero elements [formula image not reproduced];
When the GPU computes the correlation between each column of the matrix and the residual, $v = \Phi^{T} r$, where $r$ is the residual. Each stream processor inside the GPU computes the correlation between one column and the residual, that is, $v_i = \langle \varphi_i, r \rangle$. Finally, the results of the stream processors are compared and the column with the greatest correlation is added to the support set; at the same time, the index value of that column is recorded, and the index values from the first k steps form the vector v;
Each stream processor in the GPU is responsible for the inner product of one vector $\varphi_i$ with the vector $r$; within each stream processor, $\varphi_i$ and $r$ are divided into corresponding segments, and multiple threads multiply the segments in parallel;
Step 3: use the method of least squares to estimate the nonzero elements of the original signal on the basis matrix of the k-th step; the least-squares estimate is computed by QR decomposition;
Step 4: return to Step 2; when the difference between the true observation and the estimated observation falls below the specified threshold, that is,
$\|y - \Phi \hat{x}_k\|_2 < \varepsilon \|y\|_2$
the iteration ends, where $y$ is the true observation, $\hat{x}_k$ is the recovery result after the k-th iteration, $\varepsilon$ is the relative error, which depends on the observation noise, and $\|a\|_2$ denotes the two-norm of the vector $a$.
The observation matrix Φ in Step 1 is a DCT matrix with randomly selected rows; the random row selection is simulated by computer, and the positions of the selected rows are determined by generating a series of pseudo-random numbers.
Storage space of size M × N, of float type, is allocated on the GPU for the observation matrix Φ.
The observation matrix Φ in Step 1 is generated in parallel on the GPU. Specifically, the generation task is distributed over 64 threads, thread i is responsible for generating rows of the form $(\Phi(i,0), \Phi(i,1), \Phi(i,2), \ldots, \Phi(i,N-1))$, and the threads run in parallel on multiple processors to complete the generation of Φ.
The parallel implementation of the multiplication is divided into coarse-grained and fine-grained parallelism. In the matrix-vector multiplication $\Phi^{T} r$, the vector multiplications between each row of $\Phi^{T}$ and $r$ are coarse-grained parallel and are carried out by thread blocks: thread block i is responsible for computing $v_i = \langle \varphi_i, r \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors. Within the computation of $v_i = \langle \varphi_i, r \rangle$, the element-by-element multiplications are carried out by multiple threads in parallel; this is fine-grained parallelism. Specifically, thread j of thread block i computes a partial result $T_j$ over its share of the elements [formula image not reproduced], where $T_j$ is an intermediate result and T is the number of threads in each thread block. Calling the __syncthreads function of the GPU parallel development SDK ensures that all threads in the same thread block have finished, after which the partial results are summed,
$v_i = \sum_{j=0}^{T-1} T_j$
which completes $\Phi^{T} r$.
The two-norm is computed by multiple threads in parallel; for the implementation, refer to the fine-grained parallel implementation of $\Phi^{T} r$.
Compared with the prior art, the advantage of the present invention is that the parallel GPU implementation combines the low computational complexity and fast convergence of the OMP algorithm with the GPU's marked acceleration of vector computation, and effectively improves the running speed of the sparse recovery algorithm.
Description of drawings
Fig. 1 is a flow chart of the parallel OMP algorithm.
Fig. 2 shows the coarse-grained parallelism of matrix-vector multiplication on the GPU.
Fig. 3 shows the fine-grained parallelism of matrix-vector multiplication on the GPU.
Fig. 4 compares the computation time of the OMP algorithm on the GPU and on the CPU.
Embodiment
The present invention is explained in further detail below with reference to the drawings and an embodiment.
In the parallel OMP flow chart of Fig. 1, storage space is first allocated on the GPU and initialized. The iteration is then carried out, and consists of the following four steps:
Step 1: compute the residual energy and check whether to terminate the iteration.
Step 2: compute in parallel the correlation between the observation matrix and the vector, then select in parallel the index of the column with the greatest correlation.
Step 3: add the column vector corresponding to the column index produced in the previous step to the basis matrix.
Step 4: using the extended basis matrix, estimate the recovery result by the method of least squares. Return to Step 1.
When the iteration terminates, the recovery result of the last iteration is the output of the algorithm.
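The four-step loop of Fig. 1 can be sketched serially as follows. This is a minimal CPU-side illustration in plain Python, not the GPU implementation of the patent. The function names (`dot`, `solve_least_squares`, `omp`), the use of the absolute value of the correlation, and the small test matrix are assumptions made for the example, and the least-squares step here uses the normal equations with Gaussian elimination rather than the QR decomposition named in the text, purely to keep the sketch short.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def solve_least_squares(cols, y):
    """Least squares on the selected columns via normal equations (assumed here for brevity)."""
    k = len(cols)
    # Augmented system [B^T B | B^T y]
    G = [[dot(cols[i], cols[j]) for j in range(k)] + [dot(cols[i], y)] for i in range(k)]
    for p in range(k):                              # forward elimination with partial pivoting
        piv = max(range(p, k), key=lambda row: abs(G[row][p]))
        G[p], G[piv] = G[piv], G[p]
        for row in range(p + 1, k):
            f = G[row][p] / G[p][p]
            for c in range(p, k + 1):
                G[row][c] -= f * G[p][c]
    coef = [0.0] * k
    for p in reversed(range(k)):                    # back substitution
        s = G[p][k] - sum(G[p][c] * coef[c] for c in range(p + 1, k))
        coef[p] = s / G[p][p]
    return coef

def omp(Phi, y, eps=1e-6):
    M, N = len(Phi), len(Phi[0])
    columns = [[Phi[m][n] for m in range(M)] for n in range(N)]
    support, coef = [], []
    residual = list(y)
    y_norm = math.sqrt(dot(y, y))
    for _ in range(M):
        # Step 1: check the residual energy against the threshold
        if math.sqrt(dot(residual, residual)) < eps * y_norm:
            break
        # Step 2: correlate every column with the residual and pick the largest
        corr = [abs(dot(col, residual)) for col in columns]
        best = max((n for n in range(N) if n not in support), key=lambda n: corr[n])
        # Step 3: extend the basis matrix with the selected column
        support.append(best)
        basis = [columns[n] for n in support]
        # Step 4: least-squares estimate on the extended basis, then update the residual
        coef = solve_least_squares(basis, y)
        estimate = [sum(c * col[m] for c, col in zip(coef, basis)) for m in range(M)]
        residual = [yi - ei for yi, ei in zip(y, estimate)]
    x_hat = [0.0] * N
    for n, c in zip(support, coef):
        x_hat[n] = c
    return x_hat, support

# Hypothetical 3 x 4 observation matrix and a 2-sparse signal x = (3, 0, -1, 0)
Phi = [[1.0, 0.0, 0.0, 0.5],
       [0.0, 1.0, 0.0, 0.5],
       [0.0, 0.0, 1.0, 0.5]]
y = [3.0, 0.0, -1.0]
x_hat, support = omp(Phi, y)
```

In this toy case the loop selects columns 0 and 2 and stops on the third pass because the residual has vanished.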
Specifically, the method of the present invention comprises the following steps:
Step 1: generate the observation matrix Φ on the GPU. A pseudo-random sequence of length M, $h = (h_0, h_1, h_2, \ldots, h_{M-1})$, $h_i \in \{0, 1, 2, \ldots, N-1\}$, is generated by computer simulation; it determines the randomness of the random sampling and thereby produces the underdetermined observation matrix Φ. The elements of Φ are computed according to the following formula:
$\Phi(m,n) = \begin{cases} \sqrt{\dfrac{1}{N}}, & h(m) = 0,\ 0 \le n \le N-1 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1) h(m)}{2N}, & 1 \le h(m) \le N-1,\ 0 \le n \le N-1 \end{cases}$
N is the length of the vector to be recovered by the OMP algorithm and M is the number of observations in the OMP algorithm, with $M < N$. To make performance testing convenient, a sparse signal x is generated at random, in which the number of significant elements is S; S is defined as the sparsity in the compressed sensing problem, with $S \ll N$, and the amplitudes of the significant elements are generated at random. The observation data y, used for recovery by the OMP algorithm, is computed from Φ and x.
In the GPU implementation, storage space for the M × N float elements of Φ is first allocated on the GPU, and the generation task is distributed over 64 threads. Thread i is responsible for generating rows of the form $(\Phi(i,0), \Phi(i,1), \Phi(i,2), \ldots, \Phi(i,N-1))$, and the threads run in parallel on multiple processors to complete the generation of the observation matrix. In addition, the pseudo-random sequence h is accessed repeatedly and only ever read, so, following the Nvidia GPU parallel programming model, h can be stored in constant memory. The GPU optimizes read access to constant memory, which effectively reduces access latency and therefore the overall running time.
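As a rough serial illustration of Step 1, the sketch below builds Φ from the selected DCT rows, draws a random S-sparse signal, and computes y = Φx. It is written in plain Python rather than for the GPU; the helper names, the fixed seed, the choice of distinct row indices, and the amplitude range are all assumptions made for the example and are not taken from the patent.

```python
import math
import random

def dct_observation_matrix(h, N):
    """Rows of the N-point DCT matrix selected by the index sequence h."""
    Phi = []
    for hm in h:
        if hm == 0:
            row = [math.sqrt(1.0 / N)] * N
        else:
            scale = math.sqrt(2.0 / N)
            row = [scale * math.cos(math.pi * (2 * n + 1) * hm / (2.0 * N))
                   for n in range(N)]
        Phi.append(row)
    return Phi

def sparse_signal(N, S, rng):
    """Random signal with exactly S nonzero entries (amplitude range is an assumption)."""
    x = [0.0] * N
    for idx in rng.sample(range(N), S):
        x[idx] = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.5)
    return x

def observe(Phi, x):
    """Observation vector y = Phi x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in Phi]

rng = random.Random(0)            # fixed seed so the example is repeatable
N, M, S = 64, 16, 3               # illustrative sizes with S < M < N
h = rng.sample(range(N), M)       # distinct row indices (an assumption)
Phi = dct_observation_matrix(h, N)
x = sparse_signal(N, S, rng)
y = observe(Phi, x)
```

With this scaling every generated row has unit two-norm, which is a quick sanity check on the formula.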
Step 2: transfer the data from the CPU to the GPU and initialize it. Before the OMP steps are executed, storage space must be allocated on the GPU for the observation data and the intermediate variables, and the observation data must be transferred to the GPU. In the implementation, memory for the variables is allocated through the cublasAlloc interface of the cublas vector computation library, and the observation data is transferred from CPU memory to GPU memory with cublasSetVector.
Step 3: check whether to end the iteration. The OMP algorithm estimates the original signal iteratively. The iteration terminates when the energy of the difference between the true observation and the observation computed from the estimated signal falls below a threshold, expressed mathematically as:
$\|y - \Phi \hat{x}_k\|_2 < \varepsilon \|y\|_2$
where $y$ is the true observation, $\hat{x}_k$ is the recovery result after the k-th iteration, $\varepsilon$ is the relative error, and $\|\cdot\|_2$ denotes the two-norm of a vector.
Matrix-vector multiplication is highly parallelized on the GPU platform: each stream processor is responsible for the inner product of one row of the matrix with the vector, and within a single stream processor multiple threads multiply parts of the vector's elements in parallel. See Fig. 2 for the specific operations. The two-norm is likewise computed by multiple threads in parallel: each thread squares the elements of its part of the vector and sums them, and finally the partial sums are added together.
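The chunked two-norm and the termination test can be illustrated with the short sketch below. The threads are only simulated by a serial loop; the contiguous chunking, the default of four simulated threads, and the function names are assumptions made for the example.

```python
import math

def partial_squares(vec, num_threads):
    """Each simulated thread sums the squares of one contiguous chunk of vec."""
    chunk = (len(vec) + num_threads - 1) // num_threads
    return [sum(v * v for v in vec[t * chunk:(t + 1) * chunk])
            for t in range(num_threads)]

def norm2(vec, num_threads=4):
    """Two-norm obtained by adding the per-thread partial sums."""
    return math.sqrt(sum(partial_squares(vec, num_threads)))

def should_stop(y, y_est, eps):
    """Termination test: ||y - y_est||_2 < eps * ||y||_2."""
    residual = [a - b for a, b in zip(y, y_est)]
    return norm2(residual) < eps * norm2(y)

y = [3.0, 4.0]
stop_close = should_stop(y, [3.0, 4.001], eps=0.01)   # residual is tiny
stop_far = should_stop(y, [0.0, 0.0], eps=0.01)       # residual equals y
```

Here `stop_close` is true and `stop_far` is false, since the second estimate leaves the whole observation unexplained.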
Step 4: compute in parallel the column of the observation matrix with the greatest correlation to the residual, and extend the basis matrix. The OMP algorithm inherits the character of greedy algorithms: in each iteration, the column of the observation matrix with the greatest correlation to the residual is selected and added to the basis matrix. When the GPU computes the correlation between each column of the matrix and the residual, each stream processor inside the GPU computes the correlation between one column and the residual; finally the results of the stream processors are compared, and the column with the greatest correlation is added to the support set. At the same time, the index value of the column with the greatest correlation is recorded; the vector formed by the index values of the first k steps is v.
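The column selection can be illustrated as a correlation pass followed by a pairwise (tree) comparison, which is one common way to find a maximum in parallel. The sketch below runs serially; the tree reduction, the use of the absolute value, and the names `correlations` and `argmax_abs_tree` are assumptions for illustration and are not stated in the patent.

```python
def correlations(Phi, r):
    """v = Phi^T r: one inner product per column, one column per simulated stream processor."""
    M, N = len(Phi), len(Phi[0])
    return [sum(Phi[m][n] * r[m] for m in range(M)) for n in range(N)]

def argmax_abs_tree(values):
    """Index of the entry with the largest magnitude, found by pairwise comparison rounds."""
    candidates = list(range(len(values)))
    while len(candidates) > 1:
        survivors = []
        for i in range(0, len(candidates) - 1, 2):
            a, b = candidates[i], candidates[i + 1]
            survivors.append(a if abs(values[a]) >= abs(values[b]) else b)
        if len(candidates) % 2:          # odd one out advances unchanged
            survivors.append(candidates[-1])
        candidates = survivors
    return candidates[0]

# Hypothetical 3 x 4 observation matrix and residual
Phi = [[1.0, 0.0, 0.0, 0.5],
       [0.0, 1.0, 0.0, 0.5],
       [0.0, 0.0, 1.0, 0.5]]
r = [0.0, -2.0, 0.0]
v = correlations(Phi, r)
best_column = argmax_abs_tree(v)
```

For this residual the correlations are (0, -2, 0, -1), so column 1 is selected.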
Step 5: use the method of least squares to estimate the nonzero elements of the original signal on the basis matrix of the k-th step.
The least-squares estimate is computed by calling the cublasDger function of cublas, which gives the current estimated signal. Return to Step 3.
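For reference, a least-squares solve by QR decomposition, as named in the summary, can be sketched serially as below. Modified Gram-Schmidt is used here only because it is compact; the patent does not say which QR variant is used, and the function name and test data are assumptions for the example.

```python
import math

def qr_least_squares(cols, y):
    """Solve min ||y - B c||_2, where cols holds the columns of B, using B = Q R."""
    k = len(cols)
    Q = []                                   # orthonormal columns of Q
    R = [[0.0] * k for _ in range(k)]        # upper-triangular factor
    for j in range(k):
        v = list(cols[j])
        for i in range(j):                   # remove components along earlier q_i
            R[i][j] = sum(q * vi for q, vi in zip(Q[i], v))
            v = [vi - R[i][j] * q for vi, q in zip(v, Q[i])]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        Q.append([vi / R[j][j] for vi in v])
    qty = [sum(q * yi for q, yi in zip(Q[i], y)) for i in range(k)]   # Q^T y
    c = [0.0] * k
    for i in reversed(range(k)):             # back substitution on R c = Q^T y
        s = qty[i] - sum(R[i][j] * c[j] for j in range(i + 1, k))
        c[i] = s / R[i][i]
    return c

# Two basis columns in R^3; the third component of y lies outside their span
basis_columns = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
y = [3.0, 2.0, 5.0]
coefficients = qr_least_squares(basis_columns, y)
```

The solver assumes the selected columns are linearly independent; the part of y outside their span is simply left in the residual.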
Fig. 2 shows the coarse-grained parallel implementation of matrix-vector multiplication on the GPU: the vector multiplications between each row of $\Phi^{T}$ and $r$ are coarse-grained parallel and are carried out by thread blocks, with thread block i responsible for computing $v_i = \langle \varphi_i, r \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors.
Fig. 3 shows the fine-grained parallel implementation of matrix-vector multiplication on the GPU: within the computation of $v_i = \langle \varphi_i, r \rangle$, the element-by-element multiplications are carried out by multiple threads in parallel. Specifically, thread j of thread block i computes a partial result $T_j$ over its share of the elements [formula image not reproduced], where $T_j$ is an intermediate result and T is the number of threads in each thread block. Calling the __syncthreads function of the GPU parallel development SDK ensures that all threads in the same thread block have finished, after which $v_i = \sum_{j=0}^{T-1} T_j$ is computed, which completes $\Phi^{T} r$.
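The two levels of parallelism can be mimicked serially as in the sketch below: the outer list comprehension plays the role of the thread blocks, and the loop over j plays the role of the threads inside one block. The strided assignment of elements to threads is an assumption (the partial-sum formula is not available in this text), as are the function names and the test data.

```python
def block_inner_product(phi_i, r, T):
    """Simulate one thread block: thread j accumulates the partial sum T_j."""
    partial = [0.0] * T                      # stands in for per-block shared memory
    for j in range(T):                       # on a GPU these T threads run concurrently
        for n in range(j, len(r), T):        # assumed strided share of the elements
            partial[j] += phi_i[n] * r[n]
    # A __syncthreads() barrier would sit here, before the partial sums are combined.
    return sum(partial)

def matvec_transposed(columns, r, T=4):
    """Coarse-grained level: one simulated thread block per column of Phi."""
    return [block_inner_product(col, r, T) for col in columns]

# Hypothetical data: two columns of Phi and a residual of ones
columns = [[1.0, 2.0, 3.0, 4.0, 5.0],
           [1.0, 0.0, 0.0, 0.0, 1.0]]
r = [1.0, 1.0, 1.0, 1.0, 1.0]
v = matvec_transposed(columns, r)
```

The barrier matters on real hardware because the final sum must not read a partial result that another thread is still writing.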
As shown in Fig. 4, which compares the computation time of the OMP algorithm on the GPU and on the CPU, when the data scale is very small the GPU takes longer overall, because its start-up is time-consuming and a small data scale cannot bring out its parallel advantage. As the data scale increases, the parallel advantage of the GPU gradually shows, and the gap in running time between the GPU implementation and the conventional CPU implementation grows exponentially.

Claims (6)

1. A method for realizing sparse signal recovery on a GPU based on the OMP algorithm, characterized by comprising the following steps:
Step 1: generate the observation matrix Φ on the GPU, with the elements of the matrix computed according to the following formula:
$\Phi(m,n) = \begin{cases} \sqrt{\dfrac{1}{N}}, & h(m) = 0,\ 0 \le n \le N-1 \\ \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1) h(m)}{2N}, & 1 \le h(m) \le N-1,\ 0 \le n \le N-1 \end{cases}$
where $h = (h_0, h_1, h_2, \ldots, h_{M-1})$, $h_i \in \{0, 1, 2, \ldots, N-1\}$, is a pseudo-random number sequence generated by computer, N is the length of the signal to be recovered by the OMP algorithm, and M is the number of observations in compressed sensing, with $M < N$;
Step 2: on the GPU, select the column of the observation matrix Φ that has the greatest correlation with the residual and add it to the basis matrix, where the residual is defined as the difference between the actual observation and the observation produced by the estimated signal, and the basis matrix is defined as the matrix formed by the column vectors of Φ that correspond to the index values of the nonzero elements [formula image not reproduced];
when the GPU computes the correlation between each column of the matrix and the residual, $v = \Phi^{T} r$, where $r$ is the residual; each stream processor inside the GPU computes the correlation between one column and the residual, that is, $v_i = \langle \varphi_i, r \rangle$; finally, the results of the stream processors are compared and the column with the greatest correlation is added to the support set; at the same time, the index value of that column is recorded, and the index values from the first k steps form the vector v;
each stream processor in the GPU is responsible for the inner product of one vector $\varphi_i$ with the vector $r$; within each stream processor, $\varphi_i$ and $r$ are divided into corresponding segments, and multiple threads multiply the segments in parallel;
Step 3: use the method of least squares to estimate the nonzero elements of the original signal on the basis matrix of the k-th step; the least-squares estimate is computed by QR decomposition;
Step 4: return to Step 2; when the difference between the true observation and the estimated observation falls below the specified threshold, that is,
$\|y - \Phi \hat{x}_k\|_2 < \varepsilon \|y\|_2$
the iteration ends, where $y$ is the true observation, $\hat{x}_k$ is the recovery result after the k-th iteration, $\varepsilon$ is the relative error, which depends on the observation noise, and $\|a\|_2$ denotes the two-norm of the vector $a$.
2. The method for realizing sparse signal recovery on a GPU according to claim 1, characterized in that the observation matrix Φ in Step 1 is a DCT matrix with randomly selected rows, wherein the random row selection is simulated by computer and the positions of the selected rows are determined by generating a series of pseudo-random numbers.
3. The method for realizing sparse signal recovery on a GPU according to claim 1, characterized in that storage space of size M × N, of float type, is allocated on the GPU for the observation matrix Φ.
4. The method for realizing sparse signal recovery on a GPU according to claim 1, characterized in that the observation matrix Φ in Step 1 is generated in parallel on the GPU; specifically, the generation task is distributed over 64 threads, thread i is responsible for generating rows of the form $(\Phi(i,0), \Phi(i,1), \Phi(i,2), \ldots, \Phi(i,N-1))$, and the threads run in parallel on multiple processors to complete the generation of Φ.
5. The method for realizing sparse signal recovery on a GPU according to claim 1, characterized in that the parallel implementation of the multiplication is divided into coarse-grained and fine-grained parallelism; in the matrix-vector multiplication $\Phi^{T} r$, the vector multiplications between each row of $\Phi^{T}$ and $r$ are coarse-grained parallel and are carried out by thread blocks, with thread block i responsible for computing $v_i = \langle \varphi_i, r \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors; within the computation of $v_i = \langle \varphi_i, r \rangle$, the element-by-element multiplications are carried out by multiple threads in parallel, which is fine-grained parallelism; specifically, thread j of thread block i computes a partial result $T_j$ over its share of the elements [formula image not reproduced], where $T_j$ is an intermediate result and T is the number of threads in each thread block; calling the __syncthreads function of the GPU parallel development SDK ensures that all threads in the same thread block have finished, after which $v_i = \sum_{j=0}^{T-1} T_j$ is computed, which completes $\Phi^{T} r$.
6. The method for realizing sparse signal recovery on a GPU according to claim 1, characterized in that the two-norm is computed by multiple threads in parallel.
CN2012102162247A 2012-06-26 2012-06-26 Method for realizing sparse signal recovery on GPU (Graphics Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm Pending CN102750262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102162247A CN102750262A (en) 2012-06-26 2012-06-26 Method for realizing sparse signal recovery on GPU (Graphics Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012102162247A CN102750262A (en) 2012-06-26 2012-06-26 Method for realizing sparse signal recovery on GPU (Graphics Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm

Publications (1)

Publication Number Publication Date
CN102750262A true CN102750262A (en) 2012-10-24

Family

ID=47030458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012102162247A Pending CN102750262A (en) 2012-06-26 2012-06-26 Method for realizing sparse signal recovery on GPU (Graphics Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm

Country Status (1)

Country Link
CN (1) CN102750262A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999316A (en) * 2012-11-16 2013-03-27 清华大学 Parallel implementation method of orthogonal tracking algorithm in GPU (Graphics Processing Unit)
CN103532567A (en) * 2013-11-01 2014-01-22 哈尔滨工业大学 Signal reconstruction method of OMP (orthogonal matching pursuit) based on rapid inner product calculation under distributed type CS (compressed sensing) framework
CN104318522A (en) * 2014-10-08 2015-01-28 苏州新视线文化科技发展有限公司 Graphics processing unit-based sparse representation fast calculation method
CN105474195A (en) * 2013-10-21 2016-04-06 华为技术有限公司 Method for recovering a sparse communication signal from a receive signal
CN105488767A (en) * 2015-11-30 2016-04-13 盐城工学院 Rapid reconstructing method of compressed sensing image based on least square optimization
CN106204501A (en) * 2016-07-29 2016-12-07 上海科技大学 A kind of compressed sensing restoration methods
CN107592115A (en) * 2017-09-12 2018-01-16 西北工业大学 A kind of sparse signal restoration methods based on non-homogeneous norm constraint
CN108573262A (en) * 2018-05-08 2018-09-25 南京大学 A kind of higher-dimension sparse vector reconstructing method based on IGR_OMP
CN108664448A (en) * 2018-05-08 2018-10-16 南京大学 A kind of higher-dimension sparse vector reconstructing method based on IQR_OMP
CN112967167A (en) * 2019-12-12 2021-06-15 中国科学院深圳先进技术研究院 GPU-based image rapid reconstruction method, computer-readable medium and computing device
US11494463B2 (en) 2020-04-14 2022-11-08 Microsoft Technology Licensing, Llc Set operations using multi-core processing unit
CN115408653A (en) * 2022-11-01 2022-11-29 泰山学院 Highly-extensible parallel processing method and system for IDRstab algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164405A1 (en) * 2007-12-21 2009-06-25 Honda Motor Co., Ltd. Online Sparse Matrix Gaussian Process Regression And Visual Applications
CN101640541A (en) * 2009-09-04 2010-02-03 西安电子科技大学 Reconstruction method of sparse signal
CN101908890A (en) * 2010-07-30 2010-12-08 哈尔滨工业大学 Blind reconstructing method of block sparse signal with unknown block size

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈帅 等: "SAR图像压缩采样恢复的GPU并行实现", 《电子与信息学报》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999316A (en) * 2012-11-16 2013-03-27 清华大学 Parallel implementation method of orthogonal tracking algorithm in GPU (Graphics Processing Unit)
CN105474195A (en) * 2013-10-21 2016-04-06 华为技术有限公司 Method for recovering a sparse communication signal from a receive signal
CN105474195B (en) * 2013-10-21 2018-11-20 华为技术有限公司 Method for restoring sparse signal of communication from reception signal
CN103532567A (en) * 2013-11-01 2014-01-22 哈尔滨工业大学 Signal reconstruction method of OMP (orthogonal matching pursuit) based on rapid inner product calculation under distributed type CS (compressed sensing) framework
CN103532567B (en) * 2013-11-01 2016-12-07 哈尔滨工业大学 Signal reconfiguring method based on the orthogonal matching pursuit algorithm quickly calculating inner product under distributed compression perception framework
CN104318522A (en) * 2014-10-08 2015-01-28 苏州新视线文化科技发展有限公司 Graphics processing unit-based sparse representation fast calculation method
CN105488767A (en) * 2015-11-30 2016-04-13 盐城工学院 Rapid reconstructing method of compressed sensing image based on least square optimization
CN105488767B (en) * 2015-11-30 2018-08-07 盐城工学院 A kind of compressed sensing image fast reconstructing method based on Least-squares minimization
CN106204501B (en) * 2016-07-29 2019-03-19 上海科技大学 A kind of compressed sensing restoration methods
CN106204501A (en) * 2016-07-29 2016-12-07 上海科技大学 A kind of compressed sensing restoration methods
CN107592115A (en) * 2017-09-12 2018-01-16 西北工业大学 A kind of sparse signal restoration methods based on non-homogeneous norm constraint
CN107592115B (en) * 2017-09-12 2020-09-08 西北工业大学 Sparse signal recovery method based on non-uniform norm constraint
CN108664448A (en) * 2018-05-08 2018-10-16 南京大学 A kind of higher-dimension sparse vector reconstructing method based on IQR_OMP
CN108573262A (en) * 2018-05-08 2018-09-25 南京大学 A kind of higher-dimension sparse vector reconstructing method based on IGR_OMP
CN108664448B (en) * 2018-05-08 2021-06-01 南京大学 High-dimensional sparse vector reconstruction method based on IQR _ OMP
CN108573262B (en) * 2018-05-08 2021-06-25 南京大学 IGR-OMP-based high-dimensional sparse vector reconstruction method
CN112967167A (en) * 2019-12-12 2021-06-15 中国科学院深圳先进技术研究院 GPU-based image rapid reconstruction method, computer-readable medium and computing device
CN112967167B (en) * 2019-12-12 2023-04-28 中国科学院深圳先进技术研究院 GPU-based image quick reconstruction method, computer-readable medium and computing device
US11494463B2 (en) 2020-04-14 2022-11-08 Microsoft Technology Licensing, Llc Set operations using multi-core processing unit
CN115408653A (en) * 2022-11-01 2022-11-29 泰山学院 Highly-extensible parallel processing method and system for IDRstab algorithm

Similar Documents

Publication Publication Date Title
CN102750262A (en) Method for realizing sparse signal recovery on GPU (Graphics Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm
Kepner et al. Graphs, matrices, and the GraphBLAS: Seven good reasons
Plimpton et al. Mapreduce in MPI for large-scale graph algorithms
Koanantakool et al. Communication-avoiding parallel sparse-dense matrix-matrix multiplication
Fukaya et al. CholeskyQR2: a simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system
Sun et al. Optimizing SpMV for diagonal sparse matrices on GPU
Wang et al. An FPGA implementation of the Hestenes-Jacobi algorithm for singular value decomposition
CN103345580A (en) Parallel CFD method based on lattice Boltzmann method
Margaris et al. Parallel implementations of the jacobi linear algebraic systems solve
Shi et al. Efficient sparse-dense matrix-matrix multiplication on GPUs using the customized sparse storage format
Cariow et al. Algorithm for multiplying two octonions
Wakam et al. Parallelism and robustness in GMRES with the Newton basis and the deflated restarting
Mansour et al. A fast randomized Kaczmarz algorithm for sparse solutions of consistent linear systems
Weiße Divide and conquer the Hilbert space of translation-symmetric spin systems
CN102999316A (en) Parallel implementation method of orthogonal tracking algorithm in GPU (Graphics Processing Unit)
Fukazawa et al. Performance measurement of magnetohydrodynamic code for space plasma on the various scalar-type supercomputer systems
CN115034360A (en) Processing method and processing device for three-dimensional convolution neural network convolution layer
Li et al. Paralleled fast search and find of density peaks clustering algorithm on gpus with CUDA
Miki et al. Highly scalable implementation of an N-body code on a GPU cluster
Bischof QR factorization algorithms for coarse-grained distributed systems
Zhang et al. Accelerating lattice QCD on sunway many-core processor
Oancea et al. Developing a high performance software library with MPI and CUDA for matrix computations
Hülsemann et al. Hierarchical hybrid grids as basis for parallel numerical solution of PDE
Leal Souza et al. A novel competitive quantum-behaviour evolutionary multi-swarm optimizer algorithm based on cuda architecture applied to constrained engineering design
Kuznetsov An approach of the QR factorization for tall-and-skinny matrices on multicore platforms

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121024