CN104318522A - Graphics processing unit-based sparse representation fast calculation method - Google Patents

Graphics processing unit-based sparse representation fast calculation method Download PDF

Info

Publication number
CN104318522A
CN104318522A CN201410524734.XA CN201410524734A
Authority
CN
China
Prior art keywords
dictionary
processing unit
atom
array
calculation method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410524734.XA
Other languages
Chinese (zh)
Inventor
田岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd
Original Assignee
SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd filed Critical SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd
Priority to CN201410524734.XA priority Critical patent/CN104318522A/en
Publication of CN104318522A publication Critical patent/CN104318522A/en
Pending legal-status Critical Current

Links

Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to the technical field of super-resolution reconstruction of video images, and particularly to a graphics processing unit-based sparse representation fast calculation method. The method comprises the steps of: classifying the atoms in a dictionary matrix to form a dictionary tree, and adjusting the dictionary tree to obtain a contiguous dictionary array; sending the dictionary array and the signal matrix to be processed to the graphics processing unit; splitting the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, creating one thread for each feature vector; and, in each thread, traversing the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread. Because multiple feature vectors are processed simultaneously by the parallel units, the parallel computing capability of the hardware is used effectively, the speed of image feature extraction is improved, and the conversion of standard-definition images into high-definition images is accelerated.

Description

A graphics processing unit-based sparse representation fast calculation method
Technical field
The present invention relates to the technical field of super-resolution reconstruction of video images, and in particular to a graphics processing unit-based sparse representation fast calculation method.
Background art
Image super-resolution techniques can be divided into single-frame and multi-frame methods: the former needs only a single image of the scene, while the latter needs a video sequence of the same scene.
Super-resolution reconstruction is the process of obtaining higher-resolution video images by exploiting the complementary information within and between frames of low-resolution video. In recent years a great deal of research on super-resolution reconstruction has been produced at home and abroad; from a methodological point of view it can be divided into interpolation-based, reconstruction-based, and learning-based methods. Whichever method is used, a large number of matrix operations are involved, so parallel processing must be used to reach real-time performance.
In recent decades the development of the CPU has reduced the cost of computers, and the accompanying performance gains have driven the growth of the whole computer industry. Over the last decade, however, power consumption and heat dissipation have limited further increases in single-CPU performance, so parallel computing on multi-core CPUs has attracted attention; even so, the parallel computing capability of multi-core CPUs still struggles to meet today's demands for big data, high-definition content, and real-time data processing.
Summary of the invention
The object of the present invention is to propose a graphics processing unit-based sparse representation fast calculation method that makes effective use of the parallel computing capability of the hardware and increases the speed of image feature extraction.
To achieve this object, the present invention adopts the following technical solution:
A graphics processing unit-based sparse representation fast calculation method, comprising:
Step 110: classify the atoms in a dictionary matrix to construct a dictionary tree, and adjust the dictionary tree to obtain a contiguous dictionary array;
Step 120: send the dictionary array and the signal matrix to be processed to the graphics processing unit;
Step 130: split the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, create one thread for each feature vector;
Step 140: in each thread, traverse the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread.
Wherein, step 140 comprises:
Step 141: in each thread, assign the value of the feature vector to the initial residual;
Step 142: traverse the dictionary array, select from it the atom that best matches the initial residual, and build a sparse approximation;
Step 143: compute the residual between the sparse approximation and the feature vector, and judge whether the sparse approximation satisfies a preset condition; if so, the sparse representation of the feature vector consists of the linear combination of the atoms selected from the dictionary array and the residual; otherwise, assign this residual to the initial residual and return to step 142.
Wherein, the preset condition is:
the number of iterations of the sparse approximation reaches a preset count; or
the residual error is smaller than a preset threshold.
Wherein, after step 142 and before step 143, the method further comprises:
Step 1421: orthogonalize the best-matching atom.
Wherein, the parallel units correspond to the columns of the signal matrix.
Wherein, the step of classifying the atoms in the dictionary matrix and constructing the dictionary tree is specifically:
classify the atoms in the dictionary matrix with the K-means clustering algorithm to obtain several class nodes, and organize these class nodes into a dictionary tree.
Wherein, the data of a class node comprise: the class-center atom, the number of atoms contained in the class node, the number of atom columns, and pointer data into the dictionary matrix.
Wherein, the class nodes comprise intermediate nodes and leaf nodes.
Wherein, the dictionary array further comprises: a first array for storing the class-center atom data of the intermediate nodes, a second array for storing the left-child index values of the intermediate nodes, a third array for storing the index values of the leaf nodes, and a fourth array for storing the positions in the dictionary matrix of the atoms in the leaf nodes.
Wherein, the data in the signal matrix are low-resolution feature signals, and the atoms in the dictionary matrix are high-resolution feature signals.
The beneficial effects of the present invention are as follows. A graphics processing unit-based sparse representation fast calculation method comprises: classifying the atoms in a dictionary matrix to construct a dictionary tree, and adjusting the dictionary tree to obtain a contiguous dictionary array; sending the dictionary array and the signal matrix to be processed to the graphics processing unit; splitting the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, creating one thread for each feature vector; and, in each thread, traversing the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread. Because the present invention processes multiple feature vectors simultaneously through the parallel units, it makes effective use of the parallel computing capability of the hardware and increases the speed of image feature extraction, thereby accelerating the conversion of standard-definition images into high-definition images.
Brief description of the drawings
Fig. 1 is a flowchart of the graphics processing unit-based sparse representation fast calculation method provided by the specific embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is further explained below with reference to Fig. 1 and the embodiment.
Fig. 1 is a flowchart of the graphics processing unit-based sparse representation fast calculation method provided by the specific embodiment of the present invention.
A graphics processing unit-based sparse representation fast calculation method comprises:
Step 110: classify the atoms in a dictionary matrix to construct a dictionary tree, and adjust the dictionary tree to obtain a contiguous dictionary array;
Step 120: send the dictionary array and the signal matrix to be processed to the Graphics Processing Unit (GPU);
Step 130: split the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, create one thread for each feature vector;
Step 140: in each thread, traverse the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread.
In the present embodiment, multiple feature vectors are processed simultaneously through the parallel units, which makes effective use of the parallel computing capability of the hardware and increases the speed of image feature extraction, thereby accelerating the conversion of standard-definition images into high-definition images.
In the present embodiment, the data structure of the dictionary tree is relatively complex and is not well suited to fast processing on the GPU, which handles only fairly simple data types stored in contiguous memory; the dictionary tree therefore has to be adjusted into a contiguous dictionary array.
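The following host-side sketch illustrates one possible way to carry out the transfer of steps 110-120 under this constraint: the four flattened arrays of the dictionary (described further below) and the m×n signal matrix are copied into contiguous device memory with the standard CUDA runtime calls. The function name, pointer names, and size parameters are illustrative assumptions, not the patented implementation.

```cuda
#include <cuda_runtime.h>

// Copy the contiguous dictionary array (four flattened arrays, see below) and
// the m x n signal matrix into device memory; all names are illustrative.
void upload_to_gpu(const float* h_C, int sizeC, const int* h_CI, int sizeCI,
                   const int* h_LI, int sizeLI, const int* h_LOI, int sizeLOI,
                   const float* h_X, int m, int n,
                   float** d_C, int** d_CI, int** d_LI, int** d_LOI, float** d_X)
{
    cudaMalloc((void**)d_C,   sizeC   * sizeof(float));           // class-center atoms
    cudaMalloc((void**)d_CI,  sizeCI  * sizeof(int));             // left-child indices
    cudaMalloc((void**)d_LI,  sizeLI  * sizeof(int));             // leaf indices
    cudaMalloc((void**)d_LOI, sizeLOI * sizeof(int));             // leaf atom positions
    cudaMalloc((void**)d_X,   (size_t)m * n * sizeof(float));     // signal matrix, column-major
    cudaMemcpy(*d_C,   h_C,   sizeC   * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(*d_CI,  h_CI,  sizeCI  * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_LI,  h_LI,  sizeLI  * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_LOI, h_LOI, sizeLOI * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_X,   h_X,   (size_t)m * n * sizeof(float), cudaMemcpyHostToDevice);
}
```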
In the present embodiment, step 140 comprises:
Step 141: in each thread, assign the value of the feature vector to the initial residual;
Step 142: traverse the dictionary array, select from it the atom that best matches the initial residual, and build a sparse approximation;
Step 143: compute the residual between the sparse approximation and the feature vector, and judge whether the sparse approximation satisfies a preset condition; if so, the sparse representation of the feature vector consists of the linear combination of the atoms selected from the dictionary array and the residual; otherwise, assign this residual to the initial residual and return to step 142.
In the present embodiment, the preset condition is:
the number of iterations of the sparse approximation reaches a preset count; or
the residual error is smaller than a preset threshold.
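The device-code sketch below illustrates one possible realization of steps 141-143 together with the two stopping conditions; it is a minimal matching-pursuit loop under stated assumptions, not the patented implementation. For clarity the best-atom search is a linear scan over all dictionary atoms (the dictionary-tree traversal described further below would replace that inner loop), atoms are assumed to be unit-norm and stored column-major, and the kernel name, buffer size limit, and parameter names are assumptions.

```cuda
// Minimal matching-pursuit sketch of steps 141-143 (one thread per feature vector).
// Assumptions: column-major storage, unit-norm atoms, m <= 64, illustrative names.
__global__ void sparse_code_kernel(const float* D, int m, int K,     // dictionary, m x K
                                   const float* X, int n,            // signal matrix, m x n
                                   int* idx_out, float* coef_out,    // n x S outputs
                                   int S, float eps)                 // preset count / threshold
{
    int tid = threadIdx.y * blockDim.x + threadIdx.x;                // flatten the 2-D block
    int col = blockIdx.x * (blockDim.x * blockDim.y) + tid;          // one thread per column
    if (col >= n) return;

    for (int s = 0; s < S; ++s) { idx_out[col * S + s] = -1; coef_out[col * S + s] = 0.0f; }

    float r[64];                                                     // step 141: residual <- feature vector
    for (int j = 0; j < m; ++j) r[j] = X[col * m + j];

    for (int it = 0; it < S; ++it) {                                 // first stopping condition: preset count
        int best = 0; float bestDot = 0.0f;                          // step 142: best-matching atom
        for (int k = 0; k < K; ++k) {
            float dot = 0.0f;
            for (int j = 0; j < m; ++j) dot += r[j] * D[k * m + j];
            if (fabsf(dot) > fabsf(bestDot)) { bestDot = dot; best = k; }
        }
        idx_out[col * S + it]  = best;
        coef_out[col * S + it] = bestDot;

        float norm2 = 0.0f;                                          // step 143: update residual, test threshold
        for (int j = 0; j < m; ++j) {
            r[j] -= bestDot * D[best * m + j];
            norm2 += r[j] * r[j];
        }
        if (norm2 < eps * eps) break;                                // second stopping condition
    }
}
```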
In the present embodiment, after step 142 and before step 143, the method further comprises:
Step 1421: orthogonalize the best-matching atom, so that the algorithm converges faster.
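As a hedged sketch of step 1421, the newly selected atom can be orthogonalized against the atoms already chosen in the current thread by classical Gram-Schmidt; the patent text does not spell out the procedure, so this is one common way to realize it, and the function and variable names are illustrative.

```cuda
// Gram-Schmidt sketch for step 1421: remove from the new atom its projections
// onto the previously selected (assumed orthonormal) atoms, then renormalize.
__device__ void orthogonalize_atom(float* a_new, const float* A_sel, int m, int n_sel)
{
    for (int s = 0; s < n_sel; ++s) {
        float dot = 0.0f;
        for (int j = 0; j < m; ++j) dot += a_new[j] * A_sel[s * m + j];
        for (int j = 0; j < m; ++j) a_new[j] -= dot * A_sel[s * m + j];
    }
    float nrm = 0.0f;                                 // renormalize the orthogonalized atom
    for (int j = 0; j < m; ++j) nrm += a_new[j] * a_new[j];
    nrm = sqrtf(nrm);
    if (nrm > 1e-12f)
        for (int j = 0; j < m; ++j) a_new[j] /= nrm;
}
```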
In the present embodiment, the parallel units correspond to the columns of the signal matrix.
In the present embodiment, the number of threads contained in each two-dimensional thread block of the graphics processing unit hardware is denoted [Tx, Ty], and the size of the signal matrix is denoted m×n; the number of thread blocks that each parallel unit needs to create is then ⌈n/(Tx×Ty)⌉, where ⌈·⌉ denotes rounding up, so that each feature vector (column) of the signal matrix corresponds to one thread.
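A launch-configuration sketch consistent with this block count is shown below; the block shape, variable names, and the kernel from the earlier sketch are assumptions.

```cuda
// Host-side launch sketch: a two-dimensional block of Tx*Ty threads and enough
// blocks to give each of the n feature vectors (columns) its own thread.
const int Tx = 16, Ty = 16;                                    // illustrative block shape
dim3 block(Tx, Ty);
int threadsPerBlock = Tx * Ty;
int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // = ceil(n / (Tx*Ty))
sparse_code_kernel<<<numBlocks, block>>>(d_D, m, K, d_X, n, d_idx, d_coef, S, eps);
cudaDeviceSynchronize();                                       // wait for all threads to finish
```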
In the present embodiment, the step of classifying the atoms in the dictionary matrix and constructing the dictionary tree is specifically:
classify the atoms in the dictionary matrix with the K-means clustering algorithm to obtain several class nodes, and organize these class nodes into a dictionary tree.
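The sketch below shows one level of a plain K-means split of the dictionary atoms on the host, as a hedged illustration of step 110; recursing on each resulting class would yield the intermediate and leaf nodes of the dictionary tree. The choice K = 4 matches the four-child layout used in the traversal described further below, and the function name, seeding, and iteration count are assumptions.

```cpp
#include <vector>

// One K-means level over the dictionary atoms (m x numAtoms, column-major);
// returns the class label of every atom. Names and defaults are illustrative.
std::vector<int> kmeans_split(const std::vector<float>& atoms, int m, int numAtoms,
                              int K = 4, int iters = 20)
{
    std::vector<float> centers(K * m);
    for (int k = 0; k < K; ++k)                         // seed centers with K atoms
        for (int j = 0; j < m; ++j) centers[k * m + j] = atoms[(k % numAtoms) * m + j];

    std::vector<int> label(numAtoms, 0);
    for (int it = 0; it < iters; ++it) {
        // assignment step: nearest center in Euclidean distance
        for (int a = 0; a < numAtoms; ++a) {
            float best = 1e30f; int bestK = 0;
            for (int k = 0; k < K; ++k) {
                float d = 0.0f;
                for (int j = 0; j < m; ++j) {
                    float diff = atoms[a * m + j] - centers[k * m + j];
                    d += diff * diff;
                }
                if (d < best) { best = d; bestK = k; }
            }
            label[a] = bestK;
        }
        // update step: each center becomes the mean of its assigned atoms
        std::vector<float> sum(K * m, 0.0f);
        std::vector<int>   cnt(K, 0);
        for (int a = 0; a < numAtoms; ++a) {
            ++cnt[label[a]];
            for (int j = 0; j < m; ++j) sum[label[a] * m + j] += atoms[a * m + j];
        }
        for (int k = 0; k < K; ++k)
            if (cnt[k] > 0)
                for (int j = 0; j < m; ++j) centers[k * m + j] = sum[k * m + j] / cnt[k];
    }
    return label;   // recursing on each class produces the dictionary-tree nodes
}
```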
In the present embodiment, the data of a class node comprise: the class-center atom, the number of atoms contained in the class node, the number of atom columns, and pointer data into the dictionary matrix.
In the present embodiment, the class nodes comprise intermediate nodes and leaf nodes.
In the present embodiment, the dictionary array further comprises: a first array for storing the class-center atom data of the intermediate nodes, a second array for storing the left-child index values of the intermediate nodes, a third array for storing the index values of the leaf nodes, and a fourth array for storing the positions in the dictionary matrix of the atoms in the leaf nodes.
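A small host-side sketch of how these four arrays could be held is given below; only the roles of the arrays come from the text above, while the struct and field names, and the exact layout of the fourth array, are assumptions chosen to match the traversal described next.

```cpp
#include <vector>

// Flattened dictionary tree: simple contiguous arrays the GPU can consume.
struct FlatDictionaryTree {
    std::vector<float> C;    // 1st array: class-center atoms of the intermediate nodes (m floats per node)
    std::vector<int>   CI;   // 2nd array: left-child index of each node (0 marks a leaf)
    std::vector<int>   LI;   // 3rd array: leaf-node index of each node
    std::vector<int>   LOI;  // 4th array: per leaf, atom count followed by the atoms' columns in the dictionary matrix
};
```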
In the present embodiment, suppose the upper limit on the number of class-center atoms per class is T. The dictionary array C is traversed to find the atom closest to the feature vector y, i.e. to the initial residual r; suppose this atom is i. If the left-child index value CI[i] of this atom is non-zero, atom i is an intermediate node with children, and the search continues among the atoms with indices CI[i] to CI[i]+3; if CI[i] is zero, atom i is a leaf node and the search ends. At that point the index value of this leaf node is LI[i]; let it be numbered p. The number of atom columns of this leaf in the dictionary matrix is then LOI[(T+1)*p], and the corresponding dictionary indices are LOI[(T+1)*p+1] through LOI[(T+1)*p+T]. Finally, the inner products of the feature vector y with these atoms are computed, the atom with the largest inner product is selected, and a sparse approximation is built.
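The device-code sketch below follows the traversal just described: descend by nearest class center through the CI links, then score the atoms of the reached leaf through LOI. It is a hedged illustration only; the four-way branching, the starting index 0, and all names are assumptions, and the LOI layout follows the (T+1)-stride indexing given above.

```cuda
// Tree-guided atom search sketch: C = class-center atoms (m floats per node),
// CI = left-child indices (0 for a leaf), LI = leaf numbers, LOI = per-leaf
// atom count and dictionary columns, D = dictionary matrix (m x K, column-major).
__device__ int find_best_atom(const float* C, const int* CI, const int* LI,
                              const int* LOI, const float* D,
                              int m, int T, const float* r)
{
    int i = 0;                                              // assumed: root's children start at index 0
    for (;;) {
        int nearest = i; float nearestDist = 1e30f;
        for (int c = i; c < i + 4; ++c) {                   // nearest of the four sibling centers
            float d = 0.0f;
            for (int j = 0; j < m; ++j) {
                float diff = r[j] - C[c * m + j];
                d += diff * diff;
            }
            if (d < nearestDist) { nearestDist = d; nearest = c; }
        }
        if (CI[nearest] == 0) { i = nearest; break; }       // CI == 0: leaf node, stop descending
        i = CI[nearest];                                    // otherwise continue with its children
    }

    int p = LI[i];                                          // leaf number
    int nAtoms = LOI[(T + 1) * p];                          // atoms stored in this leaf (at most T)
    int bestCol = -1; float bestAbs = -1.0f;
    for (int a = 1; a <= nAtoms; ++a) {
        int col = LOI[(T + 1) * p + a];                     // column of the atom in D
        float dot = 0.0f;
        for (int j = 0; j < m; ++j) dot += r[j] * D[col * m + j];
        if (fabsf(dot) > bestAbs) { bestAbs = fabsf(dot); bestCol = col; }
    }
    return bestCol;                                         // atom with the largest inner product
}
```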
In the present embodiment, the data in the signal matrix are low-resolution feature signals and the atoms in the dictionary matrix are high-resolution feature signals. The present invention finds the corresponding sparse-representation atoms and their weighting coefficients in the low-resolution feature dictionary and then outputs the weighted combination of the corresponding high-resolution dictionary atoms. The traditional approach computes the sparse-representation atoms and weighting coefficients for each column of the matrix in turn; with GPU parallel processing, the host assigns one thread to each column, each thread computes the sparse-representation atoms and weighting coefficients for its column, and many threads run simultaneously, which greatly improves computational efficiency.
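As a hedged host-side sketch of how the output could be assembled under this scheme, the atom indices and coefficients found for one column in the low-resolution dictionary are applied to the columns of a high-resolution dictionary Dh; the function signature and all names are assumptions.

```cpp
// Combine the sparse code of one column with the high-resolution dictionary
// Dh (mh x K, column-major) to produce the high-resolution output column.
void reconstruct_column(const float* Dh, int mh,
                        const int* idx, const float* coef, int S,
                        float* y_high)                      // output, length mh
{
    for (int j = 0; j < mh; ++j) y_high[j] = 0.0f;
    for (int s = 0; s < S; ++s) {
        if (idx[s] < 0) continue;                           // unused coefficient slot
        for (int j = 0; j < mh; ++j)
            y_high[j] += coef[s] * Dh[idx[s] * mh + j];     // weighted high-resolution atom
    }
}
```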
In the present embodiment, a PC equipped with a Pentium(R) Dual-Core E6500 CPU @ 2.93 GHz and an NVIDIA GeForce GT640 graphics card (1 GB of video memory) was used as the test platform; the test results are shown in the tables below:
Table 1
Sparsity CPU implementation GPU implementation Speedup
8 7672.421ms 1674.776ms 4.58
12 11914.141ms 1749.547ms 6.81
16 17131.137ms 1759.980ms 9.73
In Table 1, the number of feature vectors given a sparse representation is 515404 and the dimension of each feature vector is 26. As Table 1 shows, the sparse representation calculation method of the present invention is less time-consuming than the prior art, with an average speedup of more than 5.
Table 2
Sparsity CPU implementation GPU implementation Speedup
8 1576.592ms 353.123ms 4.46
12 2619.522ms 361.147ms 7.25
16 3763.591ms 376.842ms 9.99
In Table 2, the number of feature vectors given a sparse representation is 110564 and the dimension of each feature vector is 26. As Table 2 shows, the sparse representation calculation method of the present invention is less time-consuming than the prior art, with an average speedup of more than 5.
The foregoing is only a specific embodiment of the present invention; the description merely explains the principle of the invention and should not be construed as limiting its scope in any way. Based on the explanations herein, those skilled in the art can arrive at other specific implementations of the present invention without creative effort, and all such implementations fall within the scope of protection of the present invention.

Claims (10)

1. A graphics processing unit-based sparse representation fast calculation method, characterized by comprising:
Step 110: classify the atoms in a dictionary matrix to construct a dictionary tree, and adjust the dictionary tree to obtain a contiguous dictionary array;
Step 120: send the dictionary array and the signal matrix to be processed to the graphics processing unit;
Step 130: split the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, create one thread for each feature vector;
Step 140: in each thread, traverse the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread.
2. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that step 140 comprises:
Step 141: in each thread, assign the value of the feature vector to the initial residual;
Step 142: traverse the dictionary array, select from it the atom that best matches the initial residual, and build a sparse approximation;
Step 143: compute the residual between the sparse approximation and the feature vector, and judge whether the sparse approximation satisfies a preset condition; if so, the sparse representation of the feature vector consists of the linear combination of the atoms selected from the dictionary array and the residual; otherwise, assign this residual to the initial residual and return to step 142.
3. The graphics processing unit-based sparse representation fast calculation method according to claim 2, characterized in that the preset condition is:
the number of iterations of the sparse approximation reaches a preset count; or
the residual error is smaller than a preset threshold.
4. The graphics processing unit-based sparse representation fast calculation method according to claim 2, characterized in that, after step 142 and before step 143, the method further comprises:
Step 1421: orthogonalize the best-matching atom.
5. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that the parallel units correspond to the columns of the signal matrix.
6. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that the step of classifying the atoms in the dictionary matrix and constructing the dictionary tree is specifically:
classify the atoms in the dictionary matrix with the K-means clustering algorithm to obtain several class nodes, and organize these class nodes into a dictionary tree.
7. The graphics processing unit-based sparse representation fast calculation method according to claim 6, characterized in that the data of a class node comprise: the class-center atom, the number of atoms contained in the class node, the number of atom columns, and pointer data into the dictionary matrix.
8. The graphics processing unit-based sparse representation fast calculation method according to claim 7, characterized in that the class nodes comprise intermediate nodes and leaf nodes.
9. The graphics processing unit-based sparse representation fast calculation method according to claim 8, characterized in that the dictionary array further comprises: a first array for storing the class-center atom data of the intermediate nodes, a second array for storing the left-child index values of the intermediate nodes, a third array for storing the index values of the leaf nodes, and a fourth array for storing the positions in the dictionary matrix of the atoms in the leaf nodes.
10. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that the data in the signal matrix are low-resolution feature signals and the atoms in the dictionary matrix are high-resolution feature signals.
CN201410524734.XA 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method Pending CN104318522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410524734.XA CN104318522A (en) 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410524734.XA CN104318522A (en) 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method

Publications (1)

Publication Number Publication Date
CN104318522A true CN104318522A (en) 2015-01-28

Family

ID=52373748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410524734.XA Pending CN104318522A (en) 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method

Country Status (1)

Country Link
CN (1) CN104318522A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078226A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Sparse Matrix-Vector Multiplication on Graphics Processor Units
CN102750262A (en) * 2012-06-26 2012-10-24 清华大学 Method for realizing sparse signal recovery on CPU (Central Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm
CN103853835A (en) * 2014-03-14 2014-06-11 西安电子科技大学 GPU (graphic processing unit) acceleration-based network community detection method
CN104063714A (en) * 2014-07-20 2014-09-24 詹曙 Fast human face recognition algorithm used for video monitoring and based on CUDA parallel computing and sparse representing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078226A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Sparse Matrix-Vector Multiplication on Graphics Processor Units
CN102750262A (en) * 2012-06-26 2012-10-24 清华大学 Method for realizing sparse signal recovery on CPU (Central Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm
CN103853835A (en) * 2014-03-14 2014-06-11 西安电子科技大学 GPU (graphic processing unit) acceleration-based network community detection method
CN104063714A (en) * 2014-07-20 2014-09-24 詹曙 Fast human face recognition algorithm used for video monitoring and based on CUDA parallel computing and sparse representing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴炜: "Learning-Based Image Enhancement Technology" (《基于学习的图像增强技术》), 28 February 2013 *
汪洋: "Research on Morphological Component Analysis Algorithms with Classified Dictionaries and Their Application to Image Inpainting" (《若干分类字典下形态分量分析算法与图像修补应用研究》), China Master's Theses Full-text Database, Information Science and Technology Series *
陈湘骥 et al.: "Real-Time Video Super-Resolution Reconstruction Based on GPU Acceleration" (《基于GPU加速的实时视频超分辨率重建》), Journal of Computer Applications (《计算机应用》) *

Similar Documents

Publication Publication Date Title
CN113822284B (en) RGBD image semantic segmentation method based on boundary attention
CN101841730A (en) Real-time stereoscopic vision implementation method based on FPGA
WO2019085709A1 (en) Pooling method and system applied to convolutional neural network
CN108205703B (en) Multi-input multi-output matrix average value pooling vectorization implementation method
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN111814626A (en) Dynamic gesture recognition method and system based on self-attention mechanism
CN103369326B (en) Be suitable to the transform coder of high-performance video coding standard HEVC
CN111210016A (en) Pruning a neural network containing element-level operations
CN110598673A (en) Remote sensing image road extraction method based on residual error network
US20220129523A1 (en) Method, circuit, and soc for performing matrix multiplication operation
CN104850533A (en) Constrained nonnegative matrix decomposing method and solving method
US11874898B2 (en) Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal
CN109416743A (en) A kind of Three dimensional convolution device artificially acted for identification
Gao et al. Multi-branch aware module with channel shuffle pixel-wise attention for lightweight image super-resolution
CN104318522A (en) Graphics processing unit-based sparse representation fast calculation method
Ying et al. Multi-directional broad learning system for the unsupervised stereo matching method
Xu et al. Design and implementation of an efficient CNN accelerator for low-cost FPGAs
CN104992425A (en) DEM super-resolution method based on GPU acceleration
CN104683817B (en) Parallel transformation and inverse transform method based on AVS
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
CN106454382A (en) Quantum image preparation method
Yanbiao et al. Flower recognition based on an improved convolutional neural network mobilenetv3
Zhang et al. Yolov3-tiny Object Detection SoC Based on FPGA Platform
Yu et al. A Low-Latency Framework With Algorithm-Hardware Co-Optimization for 3-D Point Cloud

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150128

RJ01 Rejection of invention patent application after publication