CN104318522A - Graphics processing unit-based sparse representation fast calculation method - Google Patents

Graphics processing unit-based sparse representation fast calculation method Download PDF

Info

Publication number
CN104318522A
CN104318522A CN201410524734.XA CN201410524734A
Authority
CN
China
Prior art keywords
dictionary
processing unit
atom
array
calculation method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410524734.XA
Other languages
Chinese (zh)
Inventor
田岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd
Original Assignee
SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd filed Critical SUZHOU NEW VISION CULTURE TECHNOLOGY DEVELOPMENT Co Ltd
Priority to CN201410524734.XA priority Critical patent/CN104318522A/en
Publication of CN104318522A publication Critical patent/CN104318522A/en
Pending legal-status Critical Current

Links

Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to the technical field of super-resolution reconstruction of video images, and particularly to a graphics processing unit-based sparse representation fast calculation method. The method comprises the steps of: classifying the atoms in a dictionary matrix to form a dictionary tree, and adjusting the dictionary tree to obtain a contiguous dictionary array; sending the dictionary array and the signal matrix to be processed to the graphics processing unit; splitting the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, creating one thread for each feature vector; and, in each thread, traversing the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread. Because multiple feature vectors are processed simultaneously by the parallel units, the parallel computing capability of the hardware is used effectively, the speed of image feature extraction is improved, and the conversion of standard-definition images into high-definition images is accelerated.

Description

A graphics processing unit-based sparse representation fast calculation method
Technical field
The present invention relates to the technical field of super-resolution reconstruction of video images, and in particular to a graphics processing unit-based sparse representation fast calculation method.
Background art
Image super-resolution techniques can be divided into single-frame and multi-frame methods: the former needs only a single image of the scene, while the latter needs a video sequence of the same scene.
Super-resolution reconstruction is the process of obtaining higher-resolution video images by exploiting the complementary information within and between frames of low-resolution video. In recent years a great deal of research on super-resolution reconstruction has been produced at home and abroad; from a methodological point of view it can be divided into interpolation-based, reconstruction-based, and learning-based methods. Whichever method is used, a large number of matrix operations are involved, so parallel processing must be used to reach real-time performance.
In recent decades the development of the CPU has reduced the cost of computers, and the accompanying performance gains have driven the growth of the whole computer industry. Over the last decade, however, power consumption and heat dissipation have limited further increases in single-CPU performance, so parallel computing on multi-core CPUs has attracted attention; even so, the parallel computing capability of multi-core CPUs still struggles to meet today's demands for big data, high-definition content, and real-time data processing.
Summary of the invention
The object of the present invention is to propose a graphics processing unit-based sparse representation fast calculation method that makes effective use of the parallel computing capability of the hardware and increases the speed of image feature extraction.
To achieve this object, the present invention adopts the following technical solution:
A graphics processing unit-based sparse representation fast calculation method, comprising:
Step 110: classify the atoms in a dictionary matrix to construct a dictionary tree, and adjust the dictionary tree to obtain a contiguous dictionary array;
Step 120: send the dictionary array and the signal matrix to be processed to the graphics processing unit;
Step 130: split the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, create one thread for each feature vector;
Step 140: in each thread, traverse the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread.
Wherein, step 140 comprises:
Step 141: in each thread, assign the value of the feature vector to the initial residual;
Step 142: traverse the dictionary array, select from it the atom that best matches the initial residual, and build a sparse approximation;
Step 143: compute the residual between the sparse approximation and the feature vector, and judge whether the sparse approximation satisfies a preset condition; if so, the sparse representation of the feature vector consists of the linear combination of the atoms selected from the dictionary array and the residual; otherwise, assign this residual to the initial residual and return to step 142.
Wherein, the preset condition is:
the number of iterations of the sparse approximation reaches a preset count; or
the residual error is smaller than a preset threshold.
Wherein, after step 142 and before step 143, the method further comprises:
Step 1421: orthogonalize the best-matching atom.
Wherein, the parallel units correspond to the columns of the signal matrix.
Wherein, the step of classifying the atoms in the dictionary matrix and constructing the dictionary tree is specifically:
classify the atoms in the dictionary matrix with the K-means clustering algorithm to obtain several class nodes, and organize these class nodes into a dictionary tree.
Wherein, the data of a class node comprise: the class-center atom, the number of atoms contained in the class node, the number of atom columns, and pointer data into the dictionary matrix.
Wherein, the class nodes comprise intermediate nodes and leaf nodes.
Wherein, the dictionary array further comprises: a first array for storing the class-center atom data of the intermediate nodes, a second array for storing the left-child index values of the intermediate nodes, a third array for storing the index values of the leaf nodes, and a fourth array for storing the positions in the dictionary matrix of the atoms in the leaf nodes.
Wherein, the data in the signal matrix are low-resolution feature signals, and the atoms in the dictionary matrix are high-resolution feature signals.
The beneficial effects of the present invention are as follows. A graphics processing unit-based sparse representation fast calculation method comprises: classifying the atoms in a dictionary matrix to construct a dictionary tree, and adjusting the dictionary tree to obtain a contiguous dictionary array; sending the dictionary array and the signal matrix to be processed to the graphics processing unit; splitting the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, creating one thread for each feature vector; and, in each thread, traversing the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread. Because the present invention processes multiple feature vectors simultaneously through the parallel units, it makes effective use of the parallel computing capability of the hardware and increases the speed of image feature extraction, thereby accelerating the conversion of standard-definition images into high-definition images.
Brief description of the drawings
Fig. 1 is a flowchart of the graphics processing unit-based sparse representation fast calculation method provided by the specific embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is further explained below with reference to Fig. 1 and the embodiment.
Fig. 1 is a flowchart of the graphics processing unit-based sparse representation fast calculation method provided by the specific embodiment of the present invention.
A graphics processing unit-based sparse representation fast calculation method comprises:
Step 110: classify the atoms in a dictionary matrix to construct a dictionary tree, and adjust the dictionary tree to obtain a contiguous dictionary array;
Step 120: send the dictionary array and the signal matrix to be processed to the Graphics Processing Unit (GPU);
Step 130: split the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, create one thread for each feature vector;
Step 140: in each thread, traverse the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread.
In the present embodiment, multiple feature vectors are processed simultaneously through the parallel units, which makes effective use of the parallel computing capability of the hardware and increases the speed of image feature extraction, thereby accelerating the conversion of standard-definition images into high-definition images.
In the present embodiment, the data structure of the dictionary tree is relatively complex and is not well suited to fast processing on the GPU, which handles only fairly simple data types stored in contiguous memory; the dictionary tree therefore has to be adjusted into a contiguous dictionary array.
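The following host-side sketch illustrates one possible way to carry out the transfer of steps 110-120 under this constraint: the four flattened arrays of the dictionary (described further below) and the m×n signal matrix are copied into contiguous device memory with the standard CUDA runtime calls. The function name, pointer names, and size parameters are illustrative assumptions, not the patented implementation.

```cuda
#include <cuda_runtime.h>

// Copy the contiguous dictionary array (four flattened arrays, see below) and
// the m x n signal matrix into device memory; all names are illustrative.
void upload_to_gpu(const float* h_C, int sizeC, const int* h_CI, int sizeCI,
                   const int* h_LI, int sizeLI, const int* h_LOI, int sizeLOI,
                   const float* h_X, int m, int n,
                   float** d_C, int** d_CI, int** d_LI, int** d_LOI, float** d_X)
{
    cudaMalloc((void**)d_C,   sizeC   * sizeof(float));           // class-center atoms
    cudaMalloc((void**)d_CI,  sizeCI  * sizeof(int));             // left-child indices
    cudaMalloc((void**)d_LI,  sizeLI  * sizeof(int));             // leaf indices
    cudaMalloc((void**)d_LOI, sizeLOI * sizeof(int));             // leaf atom positions
    cudaMalloc((void**)d_X,   (size_t)m * n * sizeof(float));     // signal matrix, column-major
    cudaMemcpy(*d_C,   h_C,   sizeC   * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(*d_CI,  h_CI,  sizeCI  * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_LI,  h_LI,  sizeLI  * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_LOI, h_LOI, sizeLOI * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_X,   h_X,   (size_t)m * n * sizeof(float), cudaMemcpyHostToDevice);
}
```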
In the present embodiment, step 140 comprises:
Step 141: in each thread, assign the value of the feature vector to the initial residual;
Step 142: traverse the dictionary array, select from it the atom that best matches the initial residual, and build a sparse approximation;
Step 143: compute the residual between the sparse approximation and the feature vector, and judge whether the sparse approximation satisfies a preset condition; if so, the sparse representation of the feature vector consists of the linear combination of the atoms selected from the dictionary array and the residual; otherwise, assign this residual to the initial residual and return to step 142.
In the present embodiment, the preset condition is:
the number of iterations of the sparse approximation reaches a preset count; or
the residual error is smaller than a preset threshold.
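The device-code sketch below illustrates one possible realization of steps 141-143 together with the two stopping conditions; it is a minimal matching-pursuit loop under stated assumptions, not the patented implementation. For clarity the best-atom search is a linear scan over all dictionary atoms (the dictionary-tree traversal described further below would replace that inner loop), atoms are assumed to be unit-norm and stored column-major, and the kernel name, buffer size limit, and parameter names are assumptions.

```cuda
// Minimal matching-pursuit sketch of steps 141-143 (one thread per feature vector).
// Assumptions: column-major storage, unit-norm atoms, m <= 64, illustrative names.
__global__ void sparse_code_kernel(const float* D, int m, int K,     // dictionary, m x K
                                   const float* X, int n,            // signal matrix, m x n
                                   int* idx_out, float* coef_out,    // n x S outputs
                                   int S, float eps)                 // preset count / threshold
{
    int tid = threadIdx.y * blockDim.x + threadIdx.x;                // flatten the 2-D block
    int col = blockIdx.x * (blockDim.x * blockDim.y) + tid;          // one thread per column
    if (col >= n) return;

    for (int s = 0; s < S; ++s) { idx_out[col * S + s] = -1; coef_out[col * S + s] = 0.0f; }

    float r[64];                                                     // step 141: residual <- feature vector
    for (int j = 0; j < m; ++j) r[j] = X[col * m + j];

    for (int it = 0; it < S; ++it) {                                 // first stopping condition: preset count
        int best = 0; float bestDot = 0.0f;                          // step 142: best-matching atom
        for (int k = 0; k < K; ++k) {
            float dot = 0.0f;
            for (int j = 0; j < m; ++j) dot += r[j] * D[k * m + j];
            if (fabsf(dot) > fabsf(bestDot)) { bestDot = dot; best = k; }
        }
        idx_out[col * S + it]  = best;
        coef_out[col * S + it] = bestDot;

        float norm2 = 0.0f;                                          // step 143: update residual, test threshold
        for (int j = 0; j < m; ++j) {
            r[j] -= bestDot * D[best * m + j];
            norm2 += r[j] * r[j];
        }
        if (norm2 < eps * eps) break;                                // second stopping condition
    }
}
```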
In the present embodiment, after step 142 and before step 143, the method further comprises:
Step 1421: orthogonalize the best-matching atom, so that the algorithm converges faster.
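As a hedged sketch of step 1421, the newly selected atom can be orthogonalized against the atoms already chosen in the current thread by classical Gram-Schmidt; the patent text does not spell out the procedure, so this is one common way to realize it, and the function and variable names are illustrative.

```cuda
// Gram-Schmidt sketch for step 1421: remove from the new atom its projections
// onto the previously selected (assumed orthonormal) atoms, then renormalize.
__device__ void orthogonalize_atom(float* a_new, const float* A_sel, int m, int n_sel)
{
    for (int s = 0; s < n_sel; ++s) {
        float dot = 0.0f;
        for (int j = 0; j < m; ++j) dot += a_new[j] * A_sel[s * m + j];
        for (int j = 0; j < m; ++j) a_new[j] -= dot * A_sel[s * m + j];
    }
    float nrm = 0.0f;                                 // renormalize the orthogonalized atom
    for (int j = 0; j < m; ++j) nrm += a_new[j] * a_new[j];
    nrm = sqrtf(nrm);
    if (nrm > 1e-12f)
        for (int j = 0; j < m; ++j) a_new[j] /= nrm;
}
```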
In the present embodiment, the parallel units correspond to the columns of the signal matrix.
In the present embodiment, the number of threads contained in each two-dimensional thread block of the graphics processing unit hardware is denoted [Tx, Ty], and the size of the signal matrix is denoted m×n; the number of thread blocks that each parallel unit needs to create is then ⌈n/(Tx×Ty)⌉, where ⌈·⌉ denotes rounding up, so that each feature vector (column) of the signal matrix corresponds to one thread.
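A launch-configuration sketch consistent with this block count is shown below; the block shape, variable names, and the kernel from the earlier sketch are assumptions.

```cuda
// Host-side launch sketch: a two-dimensional block of Tx*Ty threads and enough
// blocks to give each of the n feature vectors (columns) its own thread.
const int Tx = 16, Ty = 16;                                    // illustrative block shape
dim3 block(Tx, Ty);
int threadsPerBlock = Tx * Ty;
int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // = ceil(n / (Tx*Ty))
sparse_code_kernel<<<numBlocks, block>>>(d_D, m, K, d_X, n, d_idx, d_coef, S, eps);
cudaDeviceSynchronize();                                       // wait for all threads to finish
```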
In the present embodiment, the step of classifying the atoms in the dictionary matrix and constructing the dictionary tree is specifically:
classify the atoms in the dictionary matrix with the K-means clustering algorithm to obtain several class nodes, and organize these class nodes into a dictionary tree.
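The sketch below shows one level of a plain K-means split of the dictionary atoms on the host, as a hedged illustration of step 110; recursing on each resulting class would yield the intermediate and leaf nodes of the dictionary tree. The choice K = 4 matches the four-child layout used in the traversal described further below, and the function name, seeding, and iteration count are assumptions.

```cpp
#include <vector>

// One K-means level over the dictionary atoms (m x numAtoms, column-major);
// returns the class label of every atom. Names and defaults are illustrative.
std::vector<int> kmeans_split(const std::vector<float>& atoms, int m, int numAtoms,
                              int K = 4, int iters = 20)
{
    std::vector<float> centers(K * m);
    for (int k = 0; k < K; ++k)                         // seed centers with K atoms
        for (int j = 0; j < m; ++j) centers[k * m + j] = atoms[(k % numAtoms) * m + j];

    std::vector<int> label(numAtoms, 0);
    for (int it = 0; it < iters; ++it) {
        // assignment step: nearest center in Euclidean distance
        for (int a = 0; a < numAtoms; ++a) {
            float best = 1e30f; int bestK = 0;
            for (int k = 0; k < K; ++k) {
                float d = 0.0f;
                for (int j = 0; j < m; ++j) {
                    float diff = atoms[a * m + j] - centers[k * m + j];
                    d += diff * diff;
                }
                if (d < best) { best = d; bestK = k; }
            }
            label[a] = bestK;
        }
        // update step: each center becomes the mean of its assigned atoms
        std::vector<float> sum(K * m, 0.0f);
        std::vector<int>   cnt(K, 0);
        for (int a = 0; a < numAtoms; ++a) {
            ++cnt[label[a]];
            for (int j = 0; j < m; ++j) sum[label[a] * m + j] += atoms[a * m + j];
        }
        for (int k = 0; k < K; ++k)
            if (cnt[k] > 0)
                for (int j = 0; j < m; ++j) centers[k * m + j] = sum[k * m + j] / cnt[k];
    }
    return label;   // recursing on each class produces the dictionary-tree nodes
}
```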
In the present embodiment, the data of a class node comprise: the class-center atom, the number of atoms contained in the class node, the number of atom columns, and pointer data into the dictionary matrix.
In the present embodiment, the class nodes comprise intermediate nodes and leaf nodes.
In the present embodiment, the dictionary array further comprises: a first array for storing the class-center atom data of the intermediate nodes, a second array for storing the left-child index values of the intermediate nodes, a third array for storing the index values of the leaf nodes, and a fourth array for storing the positions in the dictionary matrix of the atoms in the leaf nodes.
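A small host-side sketch of how these four arrays could be held is given below; only the roles of the arrays come from the text above, while the struct and field names, and the exact layout of the fourth array, are assumptions chosen to match the traversal described next.

```cpp
#include <vector>

// Flattened dictionary tree: simple contiguous arrays the GPU can consume.
struct FlatDictionaryTree {
    std::vector<float> C;    // 1st array: class-center atoms of the intermediate nodes (m floats per node)
    std::vector<int>   CI;   // 2nd array: left-child index of each node (0 marks a leaf)
    std::vector<int>   LI;   // 3rd array: leaf-node index of each node
    std::vector<int>   LOI;  // 4th array: per leaf, atom count followed by the atoms' columns in the dictionary matrix
};
```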
In the present embodiment, suppose the upper limit on the number of class-center atoms per class is T. The dictionary array C is traversed to find the atom closest to the feature vector y, i.e. to the initial residual r; suppose this atom is i. If the left-child index value CI[i] of this atom is non-zero, atom i is an intermediate node with children, and the search continues among the atoms with indices CI[i] to CI[i]+3; if CI[i] is zero, atom i is a leaf node and the search ends. At that point the index value of this leaf node is LI[i]; let it be numbered p. The number of atom columns of this leaf in the dictionary matrix is then LOI[(T+1)*p], and the corresponding dictionary indices are LOI[(T+1)*p+1] through LOI[(T+1)*p+T]. Finally, the inner products of the feature vector y with these atoms are computed, the atom with the largest inner product is selected, and a sparse approximation is built.
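The device-code sketch below follows the traversal just described: descend by nearest class center through the CI links, then score the atoms of the reached leaf through LOI. It is a hedged illustration only; the four-way branching, the starting index 0, and all names are assumptions, and the LOI layout follows the (T+1)-stride indexing given above.

```cuda
// Tree-guided atom search sketch: C = class-center atoms (m floats per node),
// CI = left-child indices (0 for a leaf), LI = leaf numbers, LOI = per-leaf
// atom count and dictionary columns, D = dictionary matrix (m x K, column-major).
__device__ int find_best_atom(const float* C, const int* CI, const int* LI,
                              const int* LOI, const float* D,
                              int m, int T, const float* r)
{
    int i = 0;                                              // assumed: root's children start at index 0
    for (;;) {
        int nearest = i; float nearestDist = 1e30f;
        for (int c = i; c < i + 4; ++c) {                   // nearest of the four sibling centers
            float d = 0.0f;
            for (int j = 0; j < m; ++j) {
                float diff = r[j] - C[c * m + j];
                d += diff * diff;
            }
            if (d < nearestDist) { nearestDist = d; nearest = c; }
        }
        if (CI[nearest] == 0) { i = nearest; break; }       // CI == 0: leaf node, stop descending
        i = CI[nearest];                                    // otherwise continue with its children
    }

    int p = LI[i];                                          // leaf number
    int nAtoms = LOI[(T + 1) * p];                          // atoms stored in this leaf (at most T)
    int bestCol = -1; float bestAbs = -1.0f;
    for (int a = 1; a <= nAtoms; ++a) {
        int col = LOI[(T + 1) * p + a];                     // column of the atom in D
        float dot = 0.0f;
        for (int j = 0; j < m; ++j) dot += r[j] * D[col * m + j];
        if (fabsf(dot) > bestAbs) { bestAbs = fabsf(dot); bestCol = col; }
    }
    return bestCol;                                         // atom with the largest inner product
}
```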
In the present embodiment, the data in the signal matrix are low-resolution feature signals and the atoms in the dictionary matrix are high-resolution feature signals. The present invention finds the corresponding sparse-representation atoms and their weighting coefficients in the low-resolution feature dictionary and then outputs the weighted combination of the corresponding high-resolution dictionary atoms. The traditional approach computes the sparse-representation atoms and weighting coefficients for each column of the matrix in turn; with GPU parallel processing, the host assigns one thread to each column, each thread computes the sparse-representation atoms and weighting coefficients for its column, and many threads run simultaneously, which greatly improves computational efficiency.
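As a hedged host-side sketch of how the output could be assembled under this scheme, the atom indices and coefficients found for one column in the low-resolution dictionary are applied to the columns of a high-resolution dictionary Dh; the function signature and all names are assumptions.

```cpp
// Combine the sparse code of one column with the high-resolution dictionary
// Dh (mh x K, column-major) to produce the high-resolution output column.
void reconstruct_column(const float* Dh, int mh,
                        const int* idx, const float* coef, int S,
                        float* y_high)                      // output, length mh
{
    for (int j = 0; j < mh; ++j) y_high[j] = 0.0f;
    for (int s = 0; s < S; ++s) {
        if (idx[s] < 0) continue;                           // unused coefficient slot
        for (int j = 0; j < mh; ++j)
            y_high[j] += coef[s] * Dh[idx[s] * mh + j];     // weighted high-resolution atom
    }
}
```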
In the present embodiment, a PC equipped with a Pentium(R) Dual-Core E6500 CPU @ 2.93 GHz and an NVIDIA GeForce GT640 graphics card (1 GB of video memory) was used as the test platform; the test results are shown in the tables below:
Table 1
Sparsity CPU implementation GPU implementation Speedup
8 7672.421ms 1674.776ms 4.58
12 11914.141ms 1749.547ms 6.81
16 17131.137ms 1759.980ms 9.73
In Table 1, the number of feature vectors given a sparse representation is 515404 and the dimension of each feature vector is 26. As Table 1 shows, the sparse representation calculation method of the present invention is less time-consuming than the prior art, with an average speedup of more than 5.
Table 2
Sparsity CPU implementation GPU implementation Speedup
8 1576.592ms 353.123ms 4.46
12 2619.522ms 361.147ms 7.25
16 3763.591ms 376.842ms 9.99
In Table 2, the number of feature vectors given a sparse representation is 110564 and the dimension of each feature vector is 26. As Table 2 shows, the sparse representation calculation method of the present invention is less time-consuming than the prior art, with an average speedup of more than 5.
The foregoing is only a specific embodiment of the present invention; the description merely explains the principle of the invention and should not be construed as limiting its scope in any way. Based on the explanations herein, those skilled in the art can arrive at other specific implementations of the present invention without creative effort, and all such implementations fall within the scope of protection of the present invention.

Claims (10)

1. A graphics processing unit-based sparse representation fast calculation method, characterized by comprising:
Step 110: classify the atoms in a dictionary matrix to construct a dictionary tree, and adjust the dictionary tree to obtain a contiguous dictionary array;
Step 120: send the dictionary array and the signal matrix to be processed to the graphics processing unit;
Step 130: split the signal matrix into several parallel units, each parallel unit comprising several feature vectors; in the graphics processing unit, create one thread for each feature vector;
Step 140: in each thread, traverse the dictionary array to obtain the sparse representation of the feature vector corresponding to that thread.
2. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that step 140 comprises:
Step 141: in each thread, assign the value of the feature vector to the initial residual;
Step 142: traverse the dictionary array, select from it the atom that best matches the initial residual, and build a sparse approximation;
Step 143: compute the residual between the sparse approximation and the feature vector, and judge whether the sparse approximation satisfies a preset condition; if so, the sparse representation of the feature vector consists of the linear combination of the atoms selected from the dictionary array and the residual; otherwise, assign this residual to the initial residual and return to step 142.
3. The graphics processing unit-based sparse representation fast calculation method according to claim 2, characterized in that the preset condition is:
the number of iterations of the sparse approximation reaches a preset count; or
the residual error is smaller than a preset threshold.
4. The graphics processing unit-based sparse representation fast calculation method according to claim 2, characterized in that, after step 142 and before step 143, the method further comprises:
Step 1421: orthogonalize the best-matching atom.
5. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that the parallel units correspond to the columns of the signal matrix.
6. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that the step of classifying the atoms in the dictionary matrix and constructing the dictionary tree is specifically:
classify the atoms in the dictionary matrix with the K-means clustering algorithm to obtain several class nodes, and organize these class nodes into a dictionary tree.
7. The graphics processing unit-based sparse representation fast calculation method according to claim 6, characterized in that the data of a class node comprise: the class-center atom, the number of atoms contained in the class node, the number of atom columns, and pointer data into the dictionary matrix.
8. The graphics processing unit-based sparse representation fast calculation method according to claim 7, characterized in that the class nodes comprise intermediate nodes and leaf nodes.
9. The graphics processing unit-based sparse representation fast calculation method according to claim 8, characterized in that the dictionary array further comprises: a first array for storing the class-center atom data of the intermediate nodes, a second array for storing the left-child index values of the intermediate nodes, a third array for storing the index values of the leaf nodes, and a fourth array for storing the positions in the dictionary matrix of the atoms in the leaf nodes.
10. The graphics processing unit-based sparse representation fast calculation method according to claim 1, characterized in that the data in the signal matrix are low-resolution feature signals and the atoms in the dictionary matrix are high-resolution feature signals.
CN201410524734.XA 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method Pending CN104318522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410524734.XA CN104318522A (en) 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410524734.XA CN104318522A (en) 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method

Publications (1)

Publication Number Publication Date
CN104318522A true CN104318522A (en) 2015-01-28

Family

ID=52373748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410524734.XA Pending CN104318522A (en) 2014-10-08 2014-10-08 Graphics processing unit-based sparse representation fast calculation method

Country Status (1)

Country Link
CN (1) CN104318522A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078226A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Sparse Matrix-Vector Multiplication on Graphics Processor Units
CN102750262A (en) * 2012-06-26 2012-10-24 清华大学 Method for realizing sparse signal recovery on CPU (Central Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm
CN103853835A (en) * 2014-03-14 2014-06-11 西安电子科技大学 GPU (graphic processing unit) acceleration-based network community detection method
CN104063714A (en) * 2014-07-20 2014-09-24 詹曙 Fast human face recognition algorithm used for video monitoring and based on CUDA parallel computing and sparse representing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078226A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Sparse Matrix-Vector Multiplication on Graphics Processor Units
CN102750262A (en) * 2012-06-26 2012-10-24 清华大学 Method for realizing sparse signal recovery on CPU (Central Processing Unit) based on OMP (Orthogonal Matching Pursuit) algorithm
CN103853835A (en) * 2014-03-14 2014-06-11 西安电子科技大学 GPU (graphic processing unit) acceleration-based network community detection method
CN104063714A (en) * 2014-07-20 2014-09-24 詹曙 Fast human face recognition algorithm used for video monitoring and based on CUDA parallel computing and sparse representing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴炜: "Learning-Based Image Enhancement Technology" (《基于学习的图像增强技术》), 28 February 2013 *
汪洋: "Research on Morphological Component Analysis Algorithms with Classified Dictionaries and Their Application to Image Inpainting" (《若干分类字典下形态分量分析算法与图像修补应用研究》), China Master's Theses Full-text Database, Information Science and Technology Series *
陈湘骥 et al.: "Real-Time Video Super-Resolution Reconstruction Based on GPU Acceleration" (《基于GPU加速的实时视频超分辨率重建》), Journal of Computer Applications (《计算机应用》) *

Similar Documents

Publication Publication Date Title
CN113822284B (en) RGBD image semantic segmentation method based on boundary attention
CN101841730A (en) Real-time stereoscopic vision implementation method based on FPGA
WO2019085709A1 (en) Pooling method and system applied to convolutional neural network
CN108205703B (en) Multi-input multi-output matrix average value pooling vectorization implementation method
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN111814626A (en) Dynamic gesture recognition method and system based on self-attention mechanism
CN103369326B (en) Be suitable to the transform coder of high-performance video coding standard HEVC
CN111210016A (en) Pruning a neural network containing element-level operations
CN110598673A (en) Remote sensing image road extraction method based on residual error network
US20220129523A1 (en) Method, circuit, and soc for performing matrix multiplication operation
CN104850533A (en) Constrained nonnegative matrix decomposing method and solving method
US11874898B2 (en) Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal
CN109416743A (en) A kind of Three dimensional convolution device artificially acted for identification
Gao et al. Multi-branch aware module with channel shuffle pixel-wise attention for lightweight image super-resolution
CN104318522A (en) Graphics processing unit-based sparse representation fast calculation method
Ying et al. Multi-directional broad learning system for the unsupervised stereo matching method
Xu et al. Design and implementation of an efficient CNN accelerator for low-cost FPGAs
CN104992425A (en) DEM super-resolution method based on GPU acceleration
CN104683817B (en) Parallel transformation and inverse transform method based on AVS
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
CN106454382A (en) Quantum image preparation method
Yanbiao et al. Flower recognition based on an improved convolutional neural network mobilenetv3
Zhang et al. Yolov3-tiny Object Detection SoC Based on FPGA Platform
Yu et al. A Low-Latency Framework With Algorithm-Hardware Co-Optimization for 3-D Point Cloud

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150128

RJ01 Rejection of invention patent application after publication