CN104851081A

CN104851081A - GPU-based parallel Laplacian image sharpening method

Info

Publication number: CN104851081A
Application number: CN201510248951.5A
Authority: CN
Inventors: 马廷淮; 李璐; 郑钰辉; 田伟; 王兴; 苗春生
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2015-05-15
Filing date: 2015-05-15
Publication date: 2015-08-19

Abstract

The invention discloses a GPU-based parallel Laplacian image sharpening method. The method uses the powerful parallel computing capability of the GPU to convert serial processing of all pixels into parallel processing, which greatly reduces time complexity. On this basis, it designs a pixel access strategy that combines the Laplacian computation with the characteristics of shared memory, further improving the execution efficiency of the algorithm.

Description

A GPU-based parallel Laplacian image sharpening method

Technical field

The present invention relates to an image processing method, in particular to a GPU-based parallel Laplacian image sharpening method, and belongs to the field of image processing.

Background technology

With the emergence of new applications, data volumes are expanding sharply in every field. In digital image processing, image resolution keeps rising, the amount of data per image grows accordingly, and new image processing algorithms are continually proposed; together these factors make image processing computations increasingly complex. Against this background, research on accelerating the processing of large images is urgent. The GPU, a highly parallel stream processor, has strong floating-point capability. With the spread of general-purpose GPU computing and the release of CUDA, the GPU is no longer confined to traditional image rendering and plays an increasingly important role in general-purpose computing. Mainstream GPUs currently adopt unified shader units and, thanks to their large number of programmable stream processors, hold a large advantage over CPUs in single-precision floating-point computation.

The most important aspects of CUDA are its threading model and memory organization. Threads are the processing elements, and many threads compute simultaneously. They are organized as threads, thread blocks, and a thread grid: a thread block contains multiple threads, and a thread grid contains multiple thread blocks. Threads do not interfere with one another, and threads within the same thread block can communicate. CUDA provides several kinds of memory, such as global memory, shared memory, texture memory, and constant memory, each with different characteristics. Global memory has the largest capacity but is the slowest; shared memory is faster than global memory but smaller, and the threads in a block can access only the shared memory belonging to that block.

Image sharpening strengthens the details and contours of the scene in an image so that the image becomes clearer. The Laplacian operator is a second-order differential operator that characterizes image gray levels. It is the simplest isotropic differential operator, is rotation invariant, and is well suited to correcting image blur caused by diffuse reflection of light. Traditional Laplacian sharpening processes pixels one by one, yet the per-pixel computations do not depend on one another and have no required order. This property makes Laplacian sharpening particularly suitable for GPU parallelization, which greatly improves sharpening efficiency.
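For reference, the operator and the per-pixel update can be written out as follows. This is a sketch in notation introduced here (f for the input gray values, g for the sharpened output, p_1 for the intermediate sum); the discrete form is chosen to match the 8-neighbor template and the clamping to [0, 255] given later in this description.

```latex
\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}

p_1(x, y) = 8\, f(x, y) - \sum_{\substack{i, j \in \{-1, 0, 1\} \\ (i, j) \neq (0, 0)}} f(x + i,\, y + j)

g(x, y) = \min\bigl(255,\ \max\bigl(0,\ f(x, y) + p_1(x, y)\bigr)\bigr)
```

Here p_1 is a discrete approximation of the negated Laplacian up to a positive scale factor, so adding it to f amplifies local gray-level differences. Each g(x, y) depends only on input values, which is why the pixels can be processed in any order.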

Several researchers are already working on GPU-based acceleration of image processing. Li Yingmin performed a data-parallelism analysis of the time-consuming homogeneity filtering algorithm for 3D medical images, optimized memory access, and achieved parallel acceleration of the Canny operator; the data-parallelism analysis of the Canny operator selected a suitable parallel optimization strategy for each step, maximizing the overall speedup of the program. Feng Huang implemented the fast Fourier transform (FFT) on the GPU using a decimation-in-frequency algorithm and also implemented a parallel algorithm for spatial-domain convolution. All images used were grayscale. The work compared the FFT and convolution implementations in terms of performance and the suitability of the GPU for image processing; the results show that, for both convolution and FFT, the GPU versions far outperform the CPU versions. Zhang Wei et al. addressed blurred boundaries and poor image quality in low-definition photos or enlarged images, together with the practical demand for high-definition images, by proposing a two-layer CUDA-based parallel image sharpening method. The first layer uses parallel linear interpolation, repeatedly computing the non-boundary parts of the image and sharpening the edge regions; the second layer uses an improved gradient method to optimize the image further. The algorithm is reported to outperform currently popular algorithms in both efficiency and image quality, and the proposed method can be applied to post-processing of enlarged images and photos.

Summary of the invention

The technical problem to be solved by the present invention is to provide a GPU-based parallel Laplacian image sharpening method that addresses the deficiencies of the background art.

To solve the above technical problem, the present invention adopts the following technical solution:

A GPU-based parallel Laplacian image sharpening method, comprising the following steps:

Step 1), input the image to be processed;

Step 2), obtain the gray value of each pixel of the image to be processed;

Step 3), define the parameters on the graphics processing unit (GPU) side, and define the thread blocks and the thread grid;

Step 4), transfer the gray value data obtained in step 2) from the CPU side to the GPU side;

Step 5), define the Laplacian sharpening template;

Step 6), compute the index of each thread;

Step 7), each thread recomputes, according to the Laplacian sharpening formula, the gray value of the pixel it is responsible for and stores it in a new array;

Step 8), transfer the gray values recomputed in step 7) back to the CPU side and draw the sharpened image.

As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 2) the gray value of each pixel is obtained in row-major order and stored in a one-dimensional array.

As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 3) the thread blocks and the thread grid are both defined in one-dimensional form.

As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 4) the pixel gray values are transferred from the CPU side to the GPU side with the cudaMemcpy function.

As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 6) each thread is responsible for the gray value of one pixel at a time, and all threads compute in parallel.

As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 7) shared memory is used to read the pixel gray values, after which the sharpening computation is completed.

Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:

1. The invention first reads the original image, stores the gray value of each pixel in an array, and transfers that array to the GPU side, where the sharpening computation is performed. Because updating the gray value of one pixel requires reading the gray values of several surrounding pixels, the method makes full use of the faster shared memory according to the access order and pattern of the pixels;

2. The invention uses the powerful parallel computing capability of the GPU to convert serial processing of all pixels into parallel processing, which greatly reduces time complexity. On this basis, it designs a pixel access strategy that combines the Laplacian computation with the characteristics of shared memory, further improving the execution efficiency of the algorithm.

Brief description of the drawings

Fig. 1 is a flowchart of the GPU-based parallel Laplacian image sharpening algorithm;

Fig. 2 is a schematic diagram of two block division schemes for an image of size 8*8;

Fig. 3 is a schematic diagram of dividing contiguous pixels into one thread block;

Fig. 4 is a schematic diagram of dividing non-contiguous pixels into one thread block;

Fig. 5 is a schematic diagram of the block division result for an image with more pixels.

Embodiment

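The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

Step 1), input the image to be processed;

Step 2), obtain the gray value of each pixel of the image to be processed;

Step 3), define the parameters on the graphics processing unit (GPU) side, and define the thread blocks and the thread grid;

Step 4), transfer the gray value data obtained in step 2) from the CPU side to the GPU side;

Step 5), define the Laplacian sharpening template;

Step 6), compute the index of each thread;

Step 7), each thread recomputes, according to the Laplacian sharpening formula, the gray value of the pixel it is responsible for and stores it in a new array;

Step 8), transfer the gray values recomputed in step 7) back to the CPU side and draw the sharpened image.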

In step 3) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, the GPU-side parameters are defined and memory is allocated for them. As a rule of thumb, computation is more efficient when a block contains 256 or 512 threads. In the present invention, the thread blocks and the thread grid are both defined in one-dimensional form, and different thread block sizes can be chosen according to the image size.

In step 4) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, the function cudaMemcpy(A, B, C, cudaMemcpyHostToDevice) transfers the gray value array from the CPU side to the GPU side, where A is the GPU-side parameter, B is the CPU-side parameter, C is the size of the transferred array, and cudaMemcpyHostToDevice indicates the direction of transfer: Host refers to the CPU side and Device refers to the GPU side.

In step 7) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, the faster shared memory is used to increase computing speed. Each thread block defines its own shared memory array. Each thread, according to the thread block it belongs to, reads the gray value stored at its corresponding position in the global memory array into the shared memory array; each thread then reads the gray values it needs from the shared memory array to perform the sharpening computation;

In step 8) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, cudaMemcpy(A, B, C, cudaMemcpyDeviceToHost) is used, where A is now the CPU-side array that receives the resulting gray values and B is the GPU-side array that holds them. cudaMemcpyDeviceToHost indicates the direction of transfer, here from the GPU side to the CPU side, that is, the data are copied from B to A.

The GPU-based parallel Laplacian image sharpening algorithm of the present invention is described in further detail below with reference to the flowchart and an example implementation.

This example uses the GPU to improve the Laplacian sharpening algorithm and thereby the performance of the sharpening. As shown in Fig. 1, the method comprises the following steps:

Step 10, input the image to be processed; the input original grayscale image is in .jpg format.

Step 20, read the gray value of each pixel in row-major order and store it in a one-dimensional array p; if the image has N*M pixels, the size of array p is N*M.

Step 30, define the GPU-side parameters dev_p and dev_q, where dev_p corresponds to p and dev_q stores the gray values after sharpening, and allocate memory for them with the cudaMalloc() function. For example, memory for dev_p is allocated with cudaMalloc(&dev_p, N*M*sizeof(int)), where the second argument is the size of the allocation. As a rule of thumb, computation is most efficient when a block contains 256 or 512 threads. In the present invention, the thread blocks and the thread grid are both defined in one-dimensional form; taking 256 threads per thread block as an example, (N*M+255)/256 thread blocks are set up in total.
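The allocation and block count of step 30 can be sketched as below. This is a minimal illustration, not the patent's own code: the helper names blocksFor and allocateGrayBuffers are hypothetical, and the error handling is an addition. The expression (N*M+255)/256 is integer ceiling division, which guarantees at least one thread per pixel even when N*M is not a multiple of 256.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Ceiling division: blocks needed so that blocks * threadsPerBlock >= pixelCount.
// (Hypothetical helper; generalizes (N*M+255)/256 to any block size.)
inline int blocksFor(int pixelCount, int threadsPerBlock) {
    return (pixelCount + threadsPerBlock - 1) / threadsPerBlock;
}

// Allocates dev_p (input gray values) and dev_q (sharpened gray values)
// for an N*M image. On failure nothing stays allocated and both
// pointers are nullptr.
inline cudaError_t allocateGrayBuffers(int n, int m, int** dev_p, int** dev_q) {
    const size_t bytes = static_cast<size_t>(n) * m * sizeof(int);
    *dev_p = nullptr;
    *dev_q = nullptr;
    cudaError_t err = cudaMalloc(reinterpret_cast<void**>(dev_p), bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(reinterpret_cast<void**>(dev_q), bytes);
    if (err != cudaSuccess) {
        cudaFree(*dev_p);   // roll back the first allocation
        *dev_p = nullptr;
        *dev_q = nullptr;
    }
    return err;
}
```

For an 8*8 image, blocksFor(64, 256) gives a single block in which only the first 64 threads have a pixel to process, so a kernel launched this way needs a bounds check on its index.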

Step 40, use the function cudaMemcpy(dev_p, p, N*M*sizeof(int), cudaMemcpyHostToDevice) to transfer the gray value array from the CPU side to the GPU side, where dev_p is the GPU-side parameter, p is the CPU-side parameter, N*M*sizeof(int) is the size of the transferred array, and cudaMemcpyHostToDevice indicates the direction of transfer: Host refers to the CPU side, Device refers to the GPU side, and HostToDevice means that the data go from p to dev_p.
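A minimal sketch of the host-to-device transfer of step 40 follows. The wrapper names grayBytes and uploadGray are hypothetical; the only CUDA call used is cudaMemcpy with its standard argument order (destination, source, byte count, direction).

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Size in bytes of an N*M array of int gray values.
inline size_t grayBytes(int n, int m) {
    return static_cast<size_t>(n) * m * sizeof(int);
}

// Copies the host array p into the device array dev_p.
// The destination comes first, so the GPU-side pointer is argument one.
inline cudaError_t uploadGray(int* dev_p, const int* p, int n, int m) {
    return cudaMemcpy(dev_p, p, grayBytes(n, m), cudaMemcpyHostToDevice);
}
```

Returning the cudaError_t lets the caller compare it with cudaSuccess before launching any kernel on the uploaded data.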

Step 50, the Laplacian sharpening template used by the present invention is {-1, -1, -1, -1, 8, -1, -1, -1, -1}.
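To make the effect of this template concrete, the sketch below applies it to one 3*3 neighborhood stored in row-major order. The function name applyTemplate and the neighborhood layout (center at index 4) are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>

// Weighted sum of a 3x3 neighborhood (row-major, center at index 4)
// with the template {-1,-1,-1,-1,8,-1,-1,-1,-1}.
// Callable from both host and device code.
__host__ __device__ inline int applyTemplate(const int* nb) {
    const int tmpl[9] = {-1, -1, -1, -1, 8, -1, -1, -1, -1};
    int sum = 0;
    for (int i = 0; i < 9; ++i) {
        sum += tmpl[i] * nb[i];
    }
    return sum;
}
```

Because the template entries sum to zero, a uniform neighborhood yields 0, so flat regions are left unchanged when the result is added to the original gray value; only pixels that differ from their neighbors are amplified.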

Step 60, compute for each thread its sequence number within the whole thread grid:

tid = threadIdx.x + blockIdx.x * blockDim.x. Because the thread blocks and the thread grid are both one-dimensional, threadIdx.x is the index of the thread within its block, blockIdx.x is the index of the block, blockDim.x is the size of the block, and tid is the sequence number of the thread within the whole grid. tid makes it convenient to read the array in memory, because the pixel values corresponding to all threads are stored contiguously in the global array.
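The index computation can be sketched as a small kernel. The kernel copyGray is a hypothetical placeholder for the per-pixel work (it only copies one value per thread), and globalIndex is a helper introduced here so the arithmetic can also be checked on the host.

```cuda
#include <cuda_runtime.h>

// Same arithmetic as tid = threadIdx.x + blockIdx.x * blockDim.x,
// written as a plain function usable on host and device.
__host__ __device__ inline int globalIndex(int threadInBlock, int block, int blockSize) {
    return threadInBlock + block * blockSize;
}

// Placeholder per-pixel kernel: each thread handles one gray value.
// The guard skips surplus threads in the last block, which exist
// whenever count is not a multiple of the block size.
__global__ void copyGray(const int* dev_p, int* dev_q, int count) {
    int tid = globalIndex(threadIdx.x, blockIdx.x, blockDim.x);
    if (tid < count) {
        dev_q[tid] = dev_p[tid];
    }
}
```

With 256 threads per block, a launch of the form copyGray<<<(count + 255) / 256, 256>>>(dev_p, dev_q, count) assigns thread 3 of block 2 to element 515 of the array.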

Step 70, perform the sharpening computation:

Step 701, each thread block defines its shared memory array shared_p and reads into it the gray value stored in global memory dev_p for each of its pixels. The declaration is __shared__ int shared_p[blockDim.x], where the array size is the size of the block. Figs. 2 to 5 illustrate the block division method. Take the 8*8 image of Fig. 2 and assume a block size of 4*4=16. If pixels are assigned in row-major storage order, the 16 contiguous pixels in the shaded area of Fig. 1, corresponding to Fig. 3, are stored in shared_p. Updating a pixel value requires the gray values of the pixel itself and its 8 surrounding pixels, and only the values inside a block are shared, so in this case none of the 16 pixels can be updated. If instead, as in Fig. 4, the 16 non-contiguous pixels inside the dash-dot frame of Fig. 1 are assigned to one block, the gray values of the pixels numbered 9, 10, 17, and 18 can be updated. As the block size grows, the number of updatable pixels increases; as shown in Fig. 5, the black dots are all computable pixel values. The side length of the division of Fig. 4 is denoted blockside; in Fig. 4, for example, blockside is 4.
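The square division of Fig. 4 can be checked with a little index arithmetic. The sketch below assumes that a tile of side*side pixels is stored row by row in the shared array and that its top-left pixel sits at (originX, originY) in the image; the helper names are hypothetical and the description does not spell out this mapping.

```cuda
#include <cuda_runtime.h>

// Local index tx in a side*side tile -> row-major index in an image of
// `width` columns, for a tile whose top-left pixel is (originX, originY).
__host__ __device__ inline int tileToGlobal(int tx, int side, int originX,
                                            int originY, int width) {
    int localY = tx / side;
    int localX = tx % side;
    return (originY + localY) * width + (originX + localX);
}

// True when all 8 neighbors of local index tx lie inside the same tile.
__host__ __device__ inline bool isTileInterior(int tx, int side) {
    int localY = tx / side;
    int localX = tx % side;
    return localX >= 1 && localX <= side - 2 &&
           localY >= 1 && localY <= side - 2;
}

// Number of pixels one tile can update: (side - 2)^2 for side >= 2.
__host__ __device__ inline int updatablePerTile(int side) {
    return side < 2 ? 0 : (side - 2) * (side - 2);
}
```

For the 8*8 image with side 4 and the tile at the top-left corner, the interior local indices 5, 6, 9, and 10 map to pixels 9, 10, 17, and 18, the same four pixels named in the text. A 16*16 tile (256 threads) can update 196 pixels, which illustrates why larger blocks waste proportionally fewer threads on the tile border.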

Step 702, each thread then reads the gray values it needs from shared_p to perform the sharpening computation. To compute the new gray value, first multiply the pixel and its 8 neighboring pixels by the values at the corresponding template positions and sum the products, according to the formula below, to obtain the temporary variable p1:

p1 = - shared_p[tx-blockside-1] - shared_p[tx-blockside] - shared_p[tx-blockside+1]
     - shared_p[tx-1] + 8*shared_p[tx] - shared_p[tx+1]
     - shared_p[tx+blockside-1] - shared_p[tx+blockside] - shared_p[tx+blockside+1]

where tx = threadIdx.x. The updated gray value is then computed according to the formulas below. The first formula adds p1 to the original gray value, as is required when the center of the template is positive; the latter two formulas keep the gray value within the interval [0, 255].

dev_q[tid] = p1 + shared_p[tx];

dev_q[tid] = dev_q[tid] > 255 ? 255 : dev_q[tid];

dev_q[tid] = dev_q[tid] < 0 ? 0 : dev_q[tid];
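Steps 701 and 702 can be combined into one kernel, sketched below under several assumptions that are not stated in the description: tiles advance by blockside-2 pixels so that their interiors together cover every non-border pixel, image border pixels are copied unchanged, and each thread derives its global pixel index from the tile origin rather than from tid, since the Fig. 4 division assigns non-contiguous pixels to a block. CUDA C++ requires a compile-time constant size for a statically declared __shared__ array, so the sketch uses the dynamically sized form extern __shared__ int shared_p[] with the byte count passed at launch. The names clampGray, sharpenFromTile, and sharpenTiles are hypothetical.

```cuda
#include <cuda_runtime.h>

// Clamp a gray value to [0, 255].
__host__ __device__ inline int clampGray(int v) {
    return v > 255 ? 255 : (v < 0 ? 0 : v);
}

// p1 plus the original gray value for an interior tile position tx,
// where `tile` holds side*side values in row-major order.
__host__ __device__ inline int sharpenFromTile(const int* tile, int tx, int side) {
    int p1 = -tile[tx - side - 1] - tile[tx - side] - tile[tx - side + 1]
             - tile[tx - 1] + 8 * tile[tx] - tile[tx + 1]
             - tile[tx + side - 1] - tile[tx + side] - tile[tx + side + 1];
    return clampGray(p1 + tile[tx]);
}

// Requires side >= 3, width >= 3, height >= 3.
// Launch with side*side threads per block and side*side*sizeof(int)
// bytes of dynamic shared memory.
__global__ void sharpenTiles(const int* dev_p, int* dev_q,
                             int width, int height, int side) {
    extern __shared__ int shared_p[];
    int step = side - 2;                       // assumed tile stride
    int tilesPerRow = (width - 2 + step - 1) / step;
    int block = blockIdx.x;
    int tx = threadIdx.x;
    int localX = tx % side;
    int localY = tx / side;
    int gx = (block % tilesPerRow) * step + localX;
    int gy = (block / tilesPerRow) * step + localY;
    bool inImage = gx < width && gy < height;

    // Stage the tile in shared memory; slots outside the image get 0.
    shared_p[tx] = inImage ? dev_p[gy * width + gx] : 0;
    __syncthreads();                           // tile fully loaded
    if (!inImage) return;

    bool imageBorder = gx == 0 || gy == 0 || gx == width - 1 || gy == height - 1;
    bool tileInterior = localX >= 1 && localX <= side - 2 &&
                        localY >= 1 && localY <= side - 2;
    if (imageBorder) {
        dev_q[gy * width + gx] = shared_p[tx]; // border copied unchanged
    } else if (tileInterior) {
        dev_q[gy * width + gx] = sharpenFromTile(shared_p, tx, side);
    }
}
```

With side = 16 each block has 256 threads. The number of blocks is ((width - 2 + step - 1) / step) * ((height - 2 + step - 1) / step) with step = side - 2, and the launch takes the form sharpenTiles<<<blocks, side * side, side * side * sizeof(int)>>>(dev_p, dev_q, width, height, side). The __syncthreads() barrier matters: without it a thread could read a neighbor's slot before that neighbor has written it.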

Step 80, use cudaMemcpy(q, dev_q, N*M*sizeof(int), cudaMemcpyDeviceToHost), where q is the CPU-side array that receives the resulting gray values, dev_q is the GPU-side array that holds them, and cudaMemcpyDeviceToHost indicates the direction of transfer.
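A minimal sketch of the copy back in step 80 follows, with error checks added as an assumption about good practice rather than as part of the described method. The helper names downloadGray and packGray are hypothetical; packGray illustrates one way to prepare the int gray values for writing an 8-bit image.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Waits for the kernel, then copies `count` gray values from dev_q (GPU)
// to q (CPU). Destination first, source second.
inline cudaError_t downloadGray(int* q, const int* dev_q, size_t count) {
    cudaError_t err = cudaGetLastError();      // errors from the launch itself
    if (err != cudaSuccess) return err;
    err = cudaDeviceSynchronize();             // errors during execution
    if (err != cudaSuccess) return err;
    return cudaMemcpy(q, dev_q, count * sizeof(int), cudaMemcpyDeviceToHost);
}

// Converts int gray values to 8-bit samples for drawing the image.
// Values should already lie in [0, 255]; they are clamped defensively.
inline void packGray(const int* q, unsigned char* out, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        int v = q[i] < 0 ? 0 : (q[i] > 255 ? 255 : q[i]);
        out[i] = static_cast<unsigned char>(v);
    }
}
```

After the copy, dev_p and dev_q can be released with cudaFree.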

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope. Any equivalent changes and modifications made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A GPU-based parallel Laplacian image sharpening method, characterized by comprising the following steps:

Step 1), input the image to be processed;

Step 2), obtain the gray value of each pixel of the image to be processed;

Step 3), define the parameters on the graphics processing unit (GPU) side, and define the thread blocks and the thread grid;

Step 4), transfer the gray value data obtained in step 2) from the CPU side to the GPU side;

Step 5), define the Laplacian sharpening template;

Step 6), compute the index of each thread;

Step 7), each thread recomputes, according to the Laplacian sharpening formula, the gray value of the pixel it is responsible for and stores it in a new array;

Step 8), transfer the gray values recomputed in step 7) back to the CPU side and draw the sharpened image.

2. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 2), the gray value of each pixel is obtained in row-major order and stored in a one-dimensional array.

3. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 3), the thread blocks and the thread grid are both defined in one-dimensional form.

4. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 4), the pixel gray values are transferred from the CPU side to the GPU side with the cudaMemcpy function.

5. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 6), each thread is responsible for the gray value of one pixel at a time, and all threads compute in parallel.

6. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 7), shared memory is used to read the pixel gray values, after which the sharpening computation is completed.