CN104851081A - GPU-based parallel Laplacian image sharpening method

Publication number
CN104851081A
CN104851081A CN201510248951.5A CN201510248951A CN104851081A CN 104851081 A CN104851081 A CN 104851081A CN 201510248951 A CN201510248951 A CN 201510248951A CN 104851081 A CN104851081 A CN 104851081A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510248951.5A
Other languages
Chinese (zh)
Inventor
马廷淮
李璐
郑钰辉
田伟
王兴
苗春生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-05-15
Filing date
2015-05-15
Publication date
2015-08-19
Application filed by Nanjing University of Information Science and Technology
Priority to CN201510248951.5A
Publication of CN104851081A
Current legal status: Pending

Landscapes

  • Image Processing (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

The invention discloses a GPU-based parallel Laplacian image sharpening method. The powerful parallel computing capability of the GPU is used to convert the serial processing of all pixels into parallel processing, which greatly reduces the time complexity. On this basis, a reasonable pixel access strategy is designed by combining the Laplacian calculation process with the characteristics of shared memory, which further improves the execution efficiency of the algorithm.

Description

A GPU-based parallel Laplacian image sharpening method
Technical field
The present invention relates to an image processing method, in particular to a GPU-based parallel Laplacian image sharpening method, and belongs to the field of image processing.
Background art
With the emergence of new applications, the volume of data in every field is growing rapidly. In digital image processing, image resolution keeps increasing, so the amount of data per image expands sharply, and new image processing algorithms are continually proposed; together these factors make image processing computations increasingly complex. Against this background, research on accelerating the processing of large-scale images has become urgent. The GPU, as a highly parallel stream processor, has strong floating-point computing capability. With the spread of general-purpose GPU computing and the release of CUDA, the GPU is no longer confined to traditional image rendering tasks and plays an increasingly important role in general-purpose computing. Mainstream GPUs now adopt unified shader units and, by virtue of their large number of programmable stream processors, hold a large advantage over CPUs in single-precision floating-point computation.
The most important aspects of CUDA are its thread model and its memory organization. Threads process data elements, and many threads compute simultaneously. Threads are organized as threads, thread blocks, and thread grids: a thread block contains a number of threads, and a thread grid contains a number of thread blocks. Threads do not interfere with one another, and the threads within one thread block can communicate. CUDA provides several kinds of memory, such as global memory, shared memory, texture memory, and constant memory, each with different characteristics. Global memory has the largest capacity but is the slowest; shared memory is faster than global memory but smaller, and the threads in a block can access only the shared memory belonging to that block.
Image sharpening strengthens the details and contours of the scene in an image so that the image becomes clearer. The Laplacian operator is a second-order differential operator that characterizes image gray levels. It is the simplest isotropic differential operator and is rotation invariant, and it is well suited to correcting image blur caused by the diffuse reflection of light. In traditional Laplacian sharpening, the pixels are processed one by one, yet the per-pixel computations do not depend on one another and have no required order. This property makes Laplacian sharpening particularly suitable for GPU parallelization, which can greatly improve sharpening efficiency.
A number of researchers have studied GPU-based acceleration of image processing. Li Yingmin analyzed the data parallelism of a time-consuming homogeneity filtering algorithm for three-dimensional medical images, optimized memory access, and achieved parallel acceleration of the Canny operator. Data parallelism analysis was carried out for the Canny operator, and a suitable parallel optimization strategy was selected for each step, maximizing the overall speedup of the program. Feng Huang implemented the fast Fourier transform (FFT) on the GPU using a decimation-in-frequency algorithm and also implemented a parallel algorithm for spatial-domain convolution. All images used were gray-scale images. The FFT and convolution implementations were compared in terms of performance and the suitability of the GPU for image processing, and the results showed that, for both convolution and FFT, the GPU versions far outperformed the CPU versions. Zhang Wei et al., addressing the blurred boundaries and poor image quality of low-definition photos and enlarged images, and the practical demand for high-definition images, proposed a two-layer CUDA-based parallel image sharpening method. The first layer uses parallel linear interpolation, repeatedly computing the non-boundary parts of the image and sharpening the edge regions; the second layer uses an improved gradient method to further optimize the image. The algorithm is reported to outperform currently popular algorithms in both efficiency and image quality, and the method can be applied to the post-processing of enlarged images and photos.
Summary of the invention
The technical problem to be solved by the present invention is to provide a GPU-based parallel Laplacian image sharpening method that addresses the deficiencies of the background art.
To solve the above technical problem, the present invention adopts the following technical solution:
A GPU-based parallel Laplacian image sharpening method, comprising the following steps:
Step 1), input the image to be processed;
Step 2), obtain the gray value of each pixel of the image to be processed;
Step 3), define the GPU-side parameters, and define the thread blocks and the thread grid;
Step 4), transfer the gray-value data obtained in step 2) from the CPU side to the GPU side;
Step 5), define the Laplacian sharpening template;
Step 6), compute an index for each thread;
Step 7), each thread recalculates, according to the Laplacian sharpening formula, the gray value of the pixel it is responsible for and stores the result in a new array;
Step 8), transfer the gray values recalculated in step 7) back to the CPU side, and render the sharpened image.
As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 2), the gray value of each pixel is obtained in row-major order and stored in a one-dimensional array.
As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 3), the thread blocks and the thread grid are both defined in one-dimensional form.
As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 4), the gray values of the pixels are transferred from the CPU side to the GPU side by the cudaMemcpy function.
As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 6), each thread is responsible for the gray value of one pixel at a time, and all threads compute in parallel.
As a further preferred scheme of the GPU-based parallel Laplacian image sharpening method of the present invention, in step 7), the gray values of the pixels are read through shared memory to complete the sharpening calculation.
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:
1. The present invention first reads the original image, reads the gray value of each pixel into an array, transfers the gray-value array to the GPU side, and performs the sharpening computation there. Because updating the gray value of one pixel requires reading the gray values of the surrounding pixels, pixel accesses are arranged according to their order and pattern so as to make full use of the faster shared memory;
2. The present invention uses the powerful parallel computing capability of the GPU to convert the serial processing of all pixels into parallel processing, which greatly reduces the time complexity. On this basis, a reasonable pixel access strategy is designed by combining the Laplacian calculation process with the characteristics of shared memory, further improving the execution efficiency of the algorithm.
Brief description of the drawings
Fig. 1 is a flow chart of the GPU-based parallel Laplacian image sharpening algorithm;
Fig. 2 is a schematic diagram of two block division schemes for an image of size 8*8;
Fig. 3 is a schematic diagram of assigning contiguous pixels to one thread block;
Fig. 4 is a schematic diagram of assigning non-contiguous pixels to one thread block;
Fig. 5 is a schematic diagram of the block division result for an image with more pixels.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
A GPU-based parallel Laplacian image sharpening method, comprising the following steps:
Step 1), input the image to be processed;
Step 2), obtain the gray value of each pixel of the image to be processed;
Step 3), define the GPU-side parameters, and define the thread blocks and the thread grid;
Step 4), transfer the gray-value data obtained in step 2) from the CPU side to the GPU side;
Step 5), define the Laplacian sharpening template;
Step 6), compute an index for each thread;
Step 7), each thread recalculates, according to the Laplacian sharpening formula, the gray value of the pixel it is responsible for and stores the result in a new array;
Step 8), transfer the gray values recalculated in step 7) back to the CPU side, and render the sharpened image.
In step 3) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, the GPU-side parameters must be defined and memory allocated for them. Empirically, computational efficiency is higher when a block contains 256 or 512 threads. In the present invention the thread blocks and the thread grid are both defined in one-dimensional form, and different thread block sizes can be chosen according to the image size.
In step 4) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, the function cudaMemcpy(A, B, C, cudaMemcpyHostToDevice) is used to transfer the gray-value array from the CPU side to the GPU side, where A is the GPU-side parameter, B is the CPU-side parameter, C is the size in bytes of the array to be transferred, and cudaMemcpyHostToDevice specifies the direction of the transfer: Host refers to the CPU side and Device refers to the GPU side.
In step 7) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, the faster shared memory is used to increase computing speed. Each thread block defines its own shared memory array, and each thread, according to the thread block it belongs to, reads the corresponding gray value stored in the global memory array into the shared memory array. Each thread then reads the gray values it needs from the shared memory array to perform the sharpening calculation.
In step 8) of the GPU-based parallel Laplacian image sharpening algorithm of the present invention, cudaMemcpy(A, B, C, cudaMemcpyDeviceToHost) is used. Here A is the CPU-side array that receives the resulting gray values and B is the GPU-side array that holds the results. cudaMemcpyDeviceToHost specifies the direction of the transfer, in this case from the GPU side to the CPU side, that is, the data are copied from B to A.
The GPU-based parallel Laplacian image sharpening algorithm of the present invention is described in further detail below with reference to the flow chart and an implementation example.
This implementation example uses the GPU to accelerate the Laplacian sharpening algorithm and thereby improve its sharpening performance. As shown in Fig. 1, the method comprises the following steps:
Step 10, input the image to be processed; the input original gray-scale image is in .jpg format.
Step 20, read the gray value of each pixel in row-major order and store the values in a one-dimensional array p. If the image has N*M pixels, the size of array p is N*M.
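The row-major storage of step 20 can be sketched on the CPU as follows. This is a minimal Python illustration rather than the patent's code: the function name flatten_row_major, the list-of-rows image representation, and the sample values are assumptions made only for the example.

```python
def flatten_row_major(image):
    """Flatten a 2-D list of gray values (rows of equal length) into a 1-D list.

    The pixel at row r, column c ends up at index r * width + c.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    p = []
    for row in image:
        if len(row) != width:
            raise ValueError("all rows must have the same length")
        p.extend(row)
    return p, width


# Hypothetical 2*3 image used only for illustration.
image = [[10, 20, 30],
         [40, 50, 60]]
p, width = flatten_row_major(image)
print(p)                 # [10, 20, 30, 40, 50, 60]
print(p[1 * width + 2])  # pixel at row 1, column 2 -> 60
```

Storing the image this way is what lets a single one-dimensional thread index address every pixel in the later steps.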
Step 30, define the GPU-side parameters dev_p and dev_q. dev_p corresponds to p, and dev_q stores the gray values after sharpening. Use the cudaMalloc() function to allocate memory for the GPU-side parameters; for example, cudaMalloc(&dev_p, N*M*sizeof(int)) allocates memory for dev_p, where the second argument is the size of the allocation in bytes. Empirically, computational efficiency is highest when a block contains 256 or 512 threads. In the present invention the thread blocks and the thread grid are both defined in one-dimensional form; with 256 threads per thread block, a total of (N*M+255)/256 thread blocks are launched.
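The block count (N*M+255)/256 in step 30 is an integer ceiling division. The short Python sketch below reproduces that arithmetic on the CPU; the function name num_blocks and the sample image sizes are assumptions chosen for illustration.

```python
THREADS_PER_BLOCK = 256  # block size used in the embodiment


def num_blocks(n, m, threads_per_block=THREADS_PER_BLOCK):
    """Number of 1-D blocks needed so that each of the n*m pixels gets a thread."""
    total = n * m
    # Integer ceiling division: rounds up whenever total is not a multiple
    # of threads_per_block.
    return (total + threads_per_block - 1) // threads_per_block


print(num_blocks(8, 8))      # 64 pixels     -> 1
print(num_blocks(512, 512))  # 262144 pixels -> 1024
print(num_blocks(100, 100))  # 10000 pixels  -> 40
```

When N*M is not a multiple of the block size, as in the last case, the final block contains more threads than remaining pixels, so a kernel written along these lines would normally guard its work with a check such as tid < N*M.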
Step 40, use the function cudaMemcpy(dev_p, p, N*M*sizeof(int), cudaMemcpyHostToDevice) to transfer the gray-value array from the CPU side to the GPU side, where dev_p is the GPU-side parameter, p is the CPU-side parameter, N*M*sizeof(int) is the size in bytes of the array to be transferred, and cudaMemcpyHostToDevice specifies the direction of the transfer. Host refers to the CPU side and Device refers to the GPU side, so HostToDevice indicates that the data are copied from p to dev_p.
Step 50, the Laplacian sharpening template used by the present invention is {-1, -1, -1, -1, 8, -1, -1, -1, -1}.
Step 60, compute for each thread its sequence number within the whole thread grid:
tid = threadIdx.x + blockIdx.x * blockDim.x. Because the thread blocks and the thread grid are both one-dimensional, threadIdx.x is the index of the thread within its block, blockIdx.x is the index of the block, blockDim.x is the size of the block, and tid is the sequence number of the thread within the whole grid. tid makes it convenient to read from the arrays in memory, because the pixel values corresponding to all threads are stored contiguously in the global array.
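The index formula of step 60 can be checked by enumerating it on the CPU. The Python sketch below mirrors the arithmetic only; the function name global_thread_index and the small block size of 4 are assumptions chosen to keep the output short, whereas the embodiment uses 256 threads per block.

```python
def global_thread_index(thread_idx_x, block_idx_x, block_dim_x):
    """Mirror of tid = threadIdx.x + blockIdx.x * blockDim.x for a 1-D grid."""
    return thread_idx_x + block_idx_x * block_dim_x


BLOCK_DIM = 4   # hypothetical small block size for illustration
NUM_BLOCKS = 3

# Enumerate every (block, thread) pair in launch order.
tids = [global_thread_index(t, b, BLOCK_DIM)
        for b in range(NUM_BLOCKS)
        for t in range(BLOCK_DIM)]
print(tids)  # the integers 0 through 11, each exactly once and in order
```

Each thread receives a distinct index and the indices are consecutive, which is why tid can be used directly as a subscript into a contiguously stored one-dimensional array.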
Step 70, perform the sharpening calculation:
Step 701, each thread block defines its own shared memory array shared_p, and the gray value stored in global memory dev_p for each pixel of the block is read into shared_p. The declaration is __shared__ int shared_p[blockDim.x], where the array size is the size of the block. (In CUDA C a statically declared shared array needs a compile-time constant size, so in practice the size is written as a constant equal to the block size, or the array is declared extern __shared__ and sized at kernel launch.) Figs. 2 to 5 illustrate the block division method. Take the 8*8 image of Fig. 2 and suppose the block size is 4*4=16. If pixels are assigned in row-major storage order, the 16 contiguous pixels in the shaded area of Fig. 1, corresponding to Fig. 3, are stored in shared_p. Updating one pixel value requires the gray values of the pixel itself and of its 8 surrounding pixels, and all of these values must be available in the shared memory of the block, so with this division none of the 16 pixels can be updated. If instead, as in Fig. 4, the 16 non-contiguous pixels inside the dot-dash frame in Fig. 1 are assigned to one block, the gray values of pixels 9, 10, 17, and 18 can be updated. As the block size grows, the number of updatable pixels increases; as shown in Fig. 5, the black dots are all computable pixels. The side length of a block under the division scheme of Fig. 4 is denoted blockside; in Fig. 4, for example, blockside is 4.
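The block division argument of step 701 can be reproduced with a small CPU-side check. The Python sketch below assumes that pixels are numbered from 0 in row-major order and that the 4*4 block sits in the top-left corner of the 8*8 image; under those assumptions it yields the pixel numbers 9, 10, 17, and 18 given in the text. The function names tile_pixels and updatable are hypothetical.

```python
def tile_pixels(width, blockside, tile_row=0, tile_col=0):
    """Row-major pixel numbers covered by one blockside*blockside tile."""
    top, left = tile_row * blockside, tile_col * blockside
    return [(top + r) * width + (left + c)
            for r in range(blockside)
            for c in range(blockside)]


def updatable(pixels, width, height):
    """Pixels whose 8 neighbours all belong to the same block."""
    owned = set(pixels)
    result = []
    for n in pixels:
        r, c = divmod(n, width)
        # Image-border pixels have no complete 3*3 neighbourhood.
        if r in (0, height - 1) or c in (0, width - 1):
            continue
        if all((r + dr) * width + (c + dc) in owned
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)):
            result.append(n)
    return result


WIDTH = HEIGHT = 8
contiguous = list(range(16))   # 16 contiguous pixels, as in Fig. 3
tile = tile_pixels(WIDTH, 4)   # 4*4 square of pixels, as in Fig. 4

print(updatable(contiguous, WIDTH, HEIGHT))  # []
print(updatable(tile, WIDTH, HEIGHT))        # [9, 10, 17, 18]

# A 16*16 tile (256 threads) in a hypothetical 32*32 image.
big = updatable(tile_pixels(32, 16), 32, 32)
print(len(big))                              # 196
```

With a square tile only the (blockside-2)*(blockside-2) interior pixels are updatable, so the usable fraction grows with the block size, in line with the observation about Fig. 5. How the remaining tile-border pixels are handled is not stated in the text; overlapping tiles are one common approach.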
Step 702, each thread then reads the gray values it needs from shared_p to perform the sharpening calculation. To compute the new gray value, the pixel and its 8 neighboring pixels are first multiplied by the template values at the corresponding positions and summed, according to the formula below, giving the temporary variable p1:
p1=-shared_p[tx-blockside-1]-shared_p[tx-blockside]-shared_p[tx-blockside+1]
-shared_p[tx-1]+8*shared_p[tx]-shared_p[tx+1]
-shared_p[tx+blockside-1]-shared_p[tx+blockside]-shared_p[tx+blockside+1]
where tx = threadIdx.x. The updated gray value is then computed according to the formulas below. The first formula adds p1 to the original gray value, as is required when the center coefficient of the template is positive; the latter two formulas clamp the gray value to the interval [0, 255].
dev_q[tid] = p1 + shared_p[tx];
dev_q[tid] = dev_q[tid] > 255 ? 255 : dev_q[tid];
dev_q[tid] = dev_q[tid] < 0 ? 0 : dev_q[tid];
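The per-thread arithmetic of step 702 can be traced on the CPU for a single block. In the Python sketch below a plain list stands in for shared_p, a loop over tx stands in for the threads of the block, and results are returned keyed by tx instead of being written to dev_q[tid]. The function name sharpen_tile and the sample gray values are assumptions, and skipping the tile-border positions is a choice made here because their neighbours lie outside the block.

```python
def sharpen_tile(shared_p, blockside):
    """Apply the p1 formula and the [0, 255] clamp to the interior of one tile.

    shared_p holds blockside*blockside gray values in row-major order.
    Returns a dict mapping the local index tx to the new gray value.
    """
    out = {}
    for tx in range(blockside * blockside):
        r, c = divmod(tx, blockside)
        if r in (0, blockside - 1) or c in (0, blockside - 1):
            continue  # neighbours would fall outside the tile
        p1 = (-shared_p[tx - blockside - 1] - shared_p[tx - blockside]
              - shared_p[tx - blockside + 1]
              - shared_p[tx - 1] + 8 * shared_p[tx] - shared_p[tx + 1]
              - shared_p[tx + blockside - 1] - shared_p[tx + blockside]
              - shared_p[tx + blockside + 1])
        q = p1 + shared_p[tx]          # add the original gray value
        out[tx] = min(255, max(0, q))  # clamp to [0, 255]
    return out


flat = [100] * 16      # uniform 4*4 tile: sharpening changes nothing
print(sharpen_tile(flat, 4))   # {5: 100, 6: 100, 9: 100, 10: 100}

spot = [100] * 16
spot[5] = 120          # one brighter pixel at local index 5
print(sharpen_tile(spot, 4))   # {5: 255, 6: 80, 9: 80, 10: 80}
```

In the second case the bright pixel is pushed up to the clamp limit and its neighbours are darkened, which is the contrast-increasing effect expected from this template.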
Step 80, use cudaMemcpy(q, dev_q, N*M*sizeof(int), cudaMemcpyDeviceToHost) to transfer the results back to the CPU side, where q is the CPU-side array that receives the resulting gray values, dev_q is the GPU-side array that holds the results, and cudaMemcpyDeviceToHost specifies the direction of the transfer.
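A serial CPU reference is a convenient way to check the array q returned in step 80. The Python sketch below applies the template of step 50 to a row-major array, adds the original gray value, and clamps to [0, 255]. The function name sharpen_reference is hypothetical, and leaving image-border pixels unchanged is an assumption, since the text does not specify how borders are treated.

```python
# Template from step 50, stored row by row.
MASK = [-1, -1, -1,
        -1,  8, -1,
        -1, -1, -1]


def sharpen_reference(p, width, height):
    """Serial Laplacian sharpening of a row-major gray-value list."""
    q = list(p)  # border pixels keep their original values (assumption)
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            p1 = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    weight = MASK[(dr + 1) * 3 + (dc + 1)]
                    p1 += weight * p[(r + dr) * width + (c + dc)]
            value = p[r * width + c] + p1
            q[r * width + c] = min(255, max(0, value))
    return q


# Hypothetical 3*3 image: uniform gray 50 with a slightly brighter centre.
p = [50] * 9
p[4] = 60
q = sharpen_reference(p, 3, 3)
print(q)  # [50, 50, 50, 50, 140, 50, 50, 50, 50]
```

Comparing such a reference against the GPU output pixel by pixel is one way to confirm that the parallel version computes the same gray values for the pixels it updates.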
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope; any equivalent changes and modifications made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (6)

1. A GPU-based parallel Laplacian image sharpening method, characterized in that it comprises the following steps:
Step 1), input the image to be processed;
Step 2), obtain the gray value of each pixel of the image to be processed;
Step 3), define the GPU-side parameters, and define the thread blocks and the thread grid;
Step 4), transfer the gray-value data obtained in step 2) from the CPU side to the GPU side;
Step 5), define the Laplacian sharpening template;
Step 6), compute an index for each thread;
Step 7), each thread recalculates, according to the Laplacian sharpening formula, the gray value of the pixel it is responsible for and stores the result in a new array;
Step 8), transfer the gray values recalculated in step 7) back to the CPU side, and render the sharpened image.
2. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 2), the gray value of each pixel is obtained in row-major order and stored in a one-dimensional array.
3. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 3), the thread blocks and the thread grid are both defined in one-dimensional form.
4. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 4), the gray values of the pixels are transferred from the CPU side to the GPU side by the cudaMemcpy function.
5. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 6), each thread is responsible for the gray value of one pixel at a time, and all threads compute in parallel.
6. The GPU-based parallel Laplacian image sharpening method according to claim 1, characterized in that: in step 7), the gray values of the pixels are read through shared memory to complete the sharpening calculation.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510248951.5A CN104851081A (en) 2015-05-15 2015-05-15 GPU-based parallel Laplacian image sharpening method

Publications (1)

Publication Number Publication Date
CN104851081A true CN104851081A (en) 2015-08-19

Family

ID=53850708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510248951.5A Pending CN104851081A (en) 2015-05-15 2015-05-15 GPU-based parallel Laplacian image sharpening method

Country Status (1)

Country Link
CN (1) CN104851081A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999046731A1 (en) * 1998-03-13 1999-09-16 The University Of Houston System Methods for performing daf data filtering and padding
CN102073982A (en) * 2011-01-10 2011-05-25 西安电子科技大学 Method for realizing acceleration of anisotropic diffusion filtration of overlarge synthetic aperture radar (SAR) image by graphic processing unit (GPU)
CN102609921A (en) * 2012-03-05 2012-07-25 天津天地伟业物联网技术有限公司 Image sharpening system and method based on laplace operator
CN102819831A (en) * 2012-08-16 2012-12-12 江南大学 Camera source evidence obtaining method based on mode noise big component
CN102999889A (en) * 2012-12-04 2013-03-27 四川虹微技术有限公司 Image noise reduction processing method for protecting remarkable edge
CN103390267A (en) * 2013-07-11 2013-11-13 华为技术有限公司 Image processing method and device
WO2014078985A1 (en) * 2012-11-20 2014-05-30 Thomson Licensing Method and apparatus for image regularization
CN104469083A (en) * 2013-09-22 2015-03-25 联咏科技股份有限公司 Image sharpening method and image processing device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537085A (en) * 2018-03-07 2018-09-14 阿里巴巴集团控股有限公司 A kind of barcode scanning image-recognizing method, device and equipment
WO2019169965A1 (en) * 2018-03-07 2019-09-12 阿里巴巴集团控股有限公司 Code-scanning image recognition method, apparatus and device
TWI769360B (en) * 2018-03-07 2022-07-01 開曼群島商創新先進技術有限公司 A scanning code image recognition method, device and equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150819
