CN104899840B - CUDA-based acceleration optimization method for guided filtering - Google Patents


Info

Publication number
CN104899840B
Authority
CN
China
Prior art keywords
image
kernel function
navigational
mean value
data
Legal status
Active
Application number
CN201510324806.0A
Other languages
Chinese (zh)
Other versions
CN104899840A (en)
Inventor
何凯
王新磊
王晓文
葛云峰
Current Assignee
TIANJIN BOHUA ANCHUANG TECHNOLOGY Co.,Ltd.
Original Assignee
Tianjin University
Application filed by Tianjin University
Priority to CN201510324806.0A
Publication of CN104899840A
Application granted
Publication of CN104899840B
Status: Active
Anticipated expiration


Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a CUDA-based acceleration optimization method for guided filtering. The method comprises the following steps.

- The input image p and the guidance image I are read from host memory into global memory.
- A first kernel is constructed to obtain the neighborhood-window means of the input image p, the guidance image I, the image I*p and the image I*I.
- A second kernel is constructed to compute, in turn, the covariance of (I, p) and the variance of the guidance image I. From these, the key filtering parameters a and b are computed.
- The first kernel is called again to obtain the neighborhood mean mean_a of parameter a and the neighborhood mean mean_b of parameter b, from which the final filtering result q is obtained.
- The result is saved in the corresponding global memory and transferred back to host memory.

The method markedly improves the execution efficiency of the guided filtering algorithm. By exploiting the floating-point throughput and parallelism of the GPU, it implements guided filtering quickly while preserving the filtering quality.

Description

A CUDA-based acceleration optimization method for guided filtering
Technical field
The present invention relates to computer application technology and image processing. In particular, it relates to an acceleration optimization method for guided filtering based on CUDA (Compute Unified Device Architecture).
Background art
Image filtering is an important means of image processing and has considerable practical and research value. Imaging systems, transmission media and recording equipment are imperfect, so digital images are often corrupted by various kinds of noise during formation, transmission and recording. Image filtering suppresses the noise in the target image while preserving image detail as far as possible. It is an indispensable step in image preprocessing, and its quality directly affects the validity and reliability of subsequent image processing and analysis.
Image filtering methods fall into two categories.

- **Linear translation-invariant filtering.** The kernel weights are independent of the image content. Representative examples are Gaussian filtering, mean filtering and Laplacian filtering.
- **Linear translation-variant filtering.** This is represented by guided filtering, which uses the content of a guidance image during filtering.

The bilateral filter kernel considers both the spatial distance between pixels in the template and the difference in their values. Its guidance image and input image are the same image, so the bilateral filter can be regarded as a simple form of guided filtering. The joint bilateral filter uses a guidance image that differs from the input image and can achieve better results.

However, both filters have clear drawbacks. For example, in detail enhancement and high dynamic range (HDR) compression they produce noticeable gradient reversal artifacts near edges. Both the algorithm and the structure of such filters therefore need further improvement.

The guided filter was formally proposed in 2010. Like bilateral filtering, it preserves edges while denoising, and it also avoids these artifacts. Owing also to its close relationship with the matting Laplacian matrix, guided filtering has been widely applied to:

- image denoising and image enhancement;
- HDR compression;
- flash/no-flash denoising [1];
- matting and haze removal;
- joint upsampling.

The algorithm is simple and effective, but it requires complex matrix computations and the solution of large linear systems. It therefore consumes a great deal of computation time and memory and cannot meet the needs of practical applications.

In summary, guided image filtering is computationally expensive, and it is difficult to improve its execution efficiency while preserving accuracy. Traditional CPU-based architectures struggle to meet the accuracy and real-time requirements. The needs of practical applications can only be met by using a graphics processing unit (GPU).
Summary of the invention
The present invention provides a CUDA-based acceleration optimization method for guided filtering. It improves computational efficiency and reduces computational complexity while preserving the quality of the filtered image, as described below.
A CUDA-based acceleration optimization method for guided filtering comprises the following steps:
The input image p and the guidance image I are read from host memory into global memory. A first kernel is constructed to obtain the neighborhood-window means of the input image p, the guidance image I, the image I*p and the image I*I;
A second kernel is constructed to compute, in turn, the covariance of (I, p) and the variance of the guidance image I, and from these the key filtering parameters a and b;
The first kernel is called to obtain the neighborhood mean mean_a of parameter a and the neighborhood mean mean_b of parameter b, from which the final filtering result is obtained. The result is saved in the corresponding global memory and transferred back to host memory.
The step of constructing the first kernel to obtain these four neighborhood-window means is specifically:
The computation of each neighborhood-window mean is converted into a summation of the pixel values in the neighborhood window;
The window sums are computed by constructing the first kernel.
The step of computing the window sums with the first kernel is specifically:
The window sums are computed using an integral image, and CUDA parallel optimization is carried out by the four kernels that make up the first kernel.
The method further comprises calling the four kernels of the first kernel to obtain the number N of pixels in the neighborhood window, and saving N in constant memory.
The technical solution provided by the present invention has the following beneficial effects.
Building on an in-depth study of the guided filtering algorithm, the present invention implements it with CUDA programming for four applications: image smoothing, image feathering, image enhancement and flash denoising. It compares these experimentally with C and Matlab implementations. Compared with the prior art, the present invention has these advantages:
(1) Novel approach: the guided filtering algorithm is designed on the CUDA architecture, breaking through the time limitations of serial programming. This is of considerable innovative significance.
(2) High execution efficiency, approaching real-time processing to a certain extent: the method exploits the floating-point throughput and parallel computing capability of the GPU. It markedly improves the execution efficiency of guided filtering while preserving the filtering quality.
(3) Simple implementation and low hardware requirements: the GPU parallel architecture is called from a C-language environment and the code is easy to write. Large-scale data can be processed on consumer-grade GPU hardware.
Brief description of the drawings
Fig. 1 is a flow chart of the guided filtering algorithm of the present invention;
Fig. 2 compares image smoothing results;
(a) input image, (b) output of the C program, (c) output of the CUDA program;
Fig. 3 compares image feathering results;
(a) input image, (b) guidance image, (c) output of the C program, (d) output of the CUDA program;
Fig. 4 compares image enhancement results of the present invention;
(a) input image, (b) output of the C program, (c) output of the CUDA program;
Fig. 5 compares flash denoising results of the present invention.
(a) input image, (b) guidance image, (c) output of the C program, (d) output of the CUDA program.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.
Over the past decade, the graphics processing unit (GPU) has evolved from a special-purpose device that only processed computer graphics into a highly parallel, multithreaded, many-core processor. The computing power of mainstream GPUs has long exceeded that of mainstream general-purpose CPUs, and current trends suggest the gap will keep widening.
The Compute Unified Device Architecture (CUDA), released by NVIDIA, is a hardware and software architecture that uses the GPU as a data-parallel computing device. It is a complete GPGPU (general-purpose computing on graphics processing units) solution, and it makes general-purpose computation on the GPU easier for programmers to develop. Thanks to CUDA's programming model and data storage scheme, threads can process large numbers of similar, complex operations simultaneously, greatly reducing program execution time.
To this end, the present invention proposes a CUDA-based acceleration optimization method for guided filtering. CUDA is used to build a cooperative CPU-GPU working environment:

- the CPU acts as the host, responsible for logic-intensive transaction processing and serial computation;
- the GPU acts as a coprocessor, responsible for highly threaded parallel processing.

Window sums of image pixel values are computed with CUDA parallel programming and converted into neighborhood means. Registers and texture memory are used to optimize the steps that compute the key guided filtering parameters, achieving overall optimization of the algorithm. The technical solution of the present invention is as follows:
Embodiment 1
Referring to Fig. 1, a CUDA-based acceleration optimization method for guided filtering comprises the following steps:
101: The input image p and the guidance image I are read from host memory into global memory. A first kernel is constructed to obtain the neighborhood-window means of the input image p, the guidance image I, the image I*p and the image I*I;
102: A second kernel is constructed to compute, in turn, the covariance of (I, p) and the variance of the guidance image I, and from these the key filtering parameters a and b;
103: The first kernel is called to obtain the neighborhood mean mean_a of parameter a and the neighborhood mean mean_b of parameter b, from which the final filtering result is obtained. The result is saved in the corresponding global memory and transferred back to host memory.
The step of constructing the first kernel to obtain these four neighborhood-window means is specifically:
The computation of each neighborhood-window mean is converted into a summation of the pixel values in the neighborhood window;
The window sums are computed by constructing the first kernel.
Further, the step of computing the window sums with the first kernel is specifically:
The window sums are computed using an integral image, and CUDA parallel optimization is carried out by the four kernels that make up the first kernel.
The method further comprises:
calling the four kernels of the first kernel to obtain the number N of pixels in the neighborhood window, and saving N in constant memory.
This method uses CUDA programming to parallelize the guided filtering algorithm. While preserving the filtering quality, it greatly improves the execution efficiency of guided filtering and, to a certain extent, achieves real-time processing.
The method of Embodiment 1 is described in detail below with reference to specific formulas and calculation steps:
Embodiment 2
The guided filtering algorithm is based on a local linear model. Let the input image be p, the guidance image be I and the filtered output image be q. The model assumes that the following linear relationship holds within the neighborhood window $\omega_k$ centred on pixel k:
$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k \qquad (1)$$
The symbols are defined as follows:

- $\omega_k$ is a square window with side length r;
- $a_k$ and $b_k$ are the linear coefficients of window $\omega_k$;
- $I_i$ is the value of the guidance image at pixel i in $\omega_k$;
- $q_i$ is the filtered output at pixel i in $\omega_k$.

The coefficients $a_k$ and $b_k$ are determined by minimizing the difference between the input image p and the output image q, i.e. by minimizing the cost function (2):
$$E(a_k, b_k) = \sum_{i \in \omega_k} \left( \left( a_k I_i + b_k - p_i \right)^2 + \epsilon a_k^2 \right) \qquad (2)$$
In formula (2), $E(a_k, b_k)$ is the cost of window $\omega_k$ and $p_i$ is the value of the input image at pixel i in $\omega_k$. $\epsilon$ is a regularization parameter that penalizes large values of $a_k$. Solving this linear regression gives:
$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon} \qquad (3)$$
$$b_k = \bar{p}_k - a_k \mu_k \qquad (4)$$
Here $\mu_k$ and $\sigma_k^2$ are the mean and variance of the guidance image I in window $\omega_k$, and $|\omega|$ is the number of pixels in $\omega_k$. $\bar{p}_k = \frac{1}{|\omega|} \sum_{i \in \omega_k} p_i$ is the mean of the input image p in $\omega_k$.
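The regression step between formula (2) and formulas (3) and (4) is short enough to show in full. This is a standard least-squares derivation, included as a reading aid rather than taken from the patent. It assumes that the regularization term $\epsilon a_k^2$ is summed over the window and that $\sigma_k^2$ is normalized by $|\omega|$.

```latex
\begin{aligned}
E(a_k,b_k) &= \sum_{i\in\omega_k}\Big(\left(a_k I_i + b_k - p_i\right)^2 + \epsilon a_k^2\Big),\\
\frac{\partial E}{\partial b_k} &= 2\sum_{i\in\omega_k}\left(a_k I_i + b_k - p_i\right) = 0
  \;\Rightarrow\; b_k = \bar{p}_k - a_k\mu_k,\\
\frac{\partial E}{\partial a_k} &= 2\sum_{i\in\omega_k}\Big[\left(a_k I_i + b_k - p_i\right)I_i + \epsilon a_k\Big] = 0\\
  &\Rightarrow\; a_k\Big(\tfrac{1}{|\omega|}\textstyle\sum_{i\in\omega_k} I_i^2 + \epsilon\Big) + b_k\mu_k
     = \tfrac{1}{|\omega|}\textstyle\sum_{i\in\omega_k} I_i p_i\\
  &\Rightarrow\; a_k\Big(\underbrace{\tfrac{1}{|\omega|}\textstyle\sum_{i\in\omega_k} I_i^2 - \mu_k^2}_{\sigma_k^2} + \epsilon\Big)
     = \tfrac{1}{|\omega|}\textstyle\sum_{i\in\omega_k} I_i p_i - \mu_k\bar{p}_k .
\end{aligned}
```

The first condition is formula (4). Dividing the last line by $\sigma_k^2 + \epsilon$ gives formula (3).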
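Each pixel i is contained in several windows $\omega_k$, and the value $q_i$ computed in different windows differs. The values of $q_i$ are therefore averaged: once $a_k$ and $b_k$ have been computed for all windows, the filter output is given by formula (5):
$$q_i = \frac{1}{|\omega|} \sum_{k : i \in \omega_k} \left( a_k I_i + b_k \right) = \bar{a}_i I_i + \bar{b}_i \qquad (5)$$
Here $\bar{a}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} a_k$ and $\bar{b}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} b_k$ are the averages of $a_k$ and $b_k$ over all windows that overlap pixel i.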
Formulas (3) and (4) show that $\mu_k$, $\bar{p}_k$ and $\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i$ are the means of the guidance image I, the input image p and the image I×p in window $\omega_k$. $\sigma_k^2$ is the variance of I in $\omega_k$. Variance and mean satisfy $D(X) = E(X^2) - (E X)^2$, so the variance can also be computed from means. Image neighborhood means therefore have to be computed repeatedly, and they form the most time-consuming part of the whole algorithm. Computing them quickly is the key to implementing the guided filter and a central target of the CUDA optimization in the present invention.
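As a small worked example of formulas (3) and (4), take a four-pixel window with guidance values I = {1, 2, 3, 4}, self-guided filtering (p = I) and ε = 0.25. These numbers are hypothetical and not taken from the patent. Every quantity reduces to a window mean:

```latex
\begin{aligned}
\mu_k &= \tfrac{1}{4}(1+2+3+4) = 2.5, \qquad
  \tfrac{1}{|\omega|}\textstyle\sum_i I_i^2 = \tfrac{1}{4}(1+4+9+16) = 7.5,\\
\sigma_k^2 &= 7.5 - 2.5^2 = 1.25, \qquad
  \tfrac{1}{|\omega|}\textstyle\sum_i I_i p_i - \mu_k\bar{p}_k = 1.25 \quad (p = I),\\
a_k &= \frac{1.25}{1.25 + 0.25} = \frac{5}{6} \approx 0.8333, \qquad
  b_k = 2.5 - \tfrac{5}{6}\cdot 2.5 = \tfrac{5}{12} \approx 0.4167.
\end{aligned}
```

With p = I, $b_k = (1 - a_k)\mu_k$, so the window output is $a_k I_i + (1 - a_k)\mu_k$. Because $a_k < 1$ whenever ε > 0, this blend pulls each value toward the window mean, which is the smoothing behaviour used in Example 1.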
The present invention constructs the first kernel according to formula (6) to compute the image neighborhood means:
mean_p = boxfilter(p, r) / N    (6)
The symbols are defined as follows:

- mean_p is the neighborhood mean of the input image p;
- boxfilter(p, r) is the sum of the pixel values of p within the neighborhood window;
- N is the number of pixels in the neighborhood window;
- r is the neighborhood window side length.

The pixel count N can be obtained by summing, over the neighborhood window, an all-ones matrix of the same size as the image. This computation is well known to those skilled in the art and is not described further here.
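A serial C reference for formula (6) can help when validating the GPU kernels. This is a hypothetical sketch, not the patent's code, and its function names are illustrative. It makes three choices:

- The window is the (2r+1) × (2r+1) square centred on each pixel. This is the convention of the widely used reference implementation; the text above calls r the side length.
- Windows are clipped at the image border.
- N is obtained as described, by box-summing an all-ones image, so N varies near the border.

```c
#include <stddef.h>

/* Hypothetical reference for formula (6): box sum over the (2r+1) x (2r+1)
   window centred on each pixel, clipped at the image border.
   Naive O(r^2) work per pixel; images are row-major, w x h. */
void box_sum_naive(const double *src, double *dst, int w, int h, int r)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double s = 0.0;
            for (int yy = y - r; yy <= y + r; ++yy)
                for (int xx = x - r; xx <= x + r; ++xx)
                    if (yy >= 0 && yy < h && xx >= 0 && xx < w)
                        s += src[(size_t)yy * w + xx];
            dst[(size_t)y * w + x] = s;
        }
}

/* N = boxfilter(ones, r): pixels actually covered by each clipped window.
   `ones` is caller-provided scratch of w*h doubles. */
void window_counts(double *ones, double *N, int w, int h, int r)
{
    for (size_t i = 0; i < (size_t)w * h; ++i)
        ones[i] = 1.0;
    box_sum_naive(ones, N, w, h, r);
}

/* mean = boxfilter(src, r) ./ N, i.e. formula (6) applied per pixel. */
void box_mean(const double *src, const double *N, double *mean,
              int w, int h, int r)
{
    box_sum_naive(src, mean, w, h, r);
    for (size_t i = 0; i < (size_t)w * h; ++i)
        mean[i] /= N[i];
}
```

For a 3 × 3 image with r = 1, N is 9 at the centre, 6 on the edges and 4 at the corners, so in this sketch N is an image rather than a single constant. The text does not say how border windows are handled when N is kept in constant memory.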
In this way, computing an image neighborhood mean reduces to summing the pixel values in the neighborhood window, which suits CUDA parallel processing. The present invention computes the window sums with an integral image, parallelized in CUDA by the four kernels that make up the first kernel. The implementation steps are as follows, assuming the data are already in GPU memory:
(I) Kernel 1 computes in parallel, for each column i of the image (1 ≤ i ≤ image width), the cumulative sum of the pixels from row 1 to row j (1 ≤ j ≤ image height).
- Launch parameters: a block dimension of 1024 × 1 and a grid dimension of 1 × 1.
- Each thread computes one column of the image by iterating down the column as a running recurrence, keeping intermediate values in registers inside the loop.
- With this layout, the data reads satisfy the coalesced global memory access requirement.
(II) The data produced by kernel 1 require boundary processing, which kernel 2 handles in units of rows.
- Launch parameters: a block dimension of 16 × 16 and a grid of ((image width + dimBlock.x - 1)/dimBlock.x) × ((image height + dimBlock.y - 1)/dimBlock.y) blocks.
- dimBlock.x and dimBlock.y are the dimensions of the thread block along the x and y axes.
(III) Kernel 3 computes in parallel, for each row j of the image (1 ≤ j ≤ image height), the cumulative sum of the pixels from column 1 to column i (1 ≤ i ≤ image width).
- To avoid non-coalesced accesses when reading the data, the input of this kernel is first transposed.
- Kernel 1 is then called to do the computation, and the results are written back column by column.
(IV) The data produced by kernel 3 also require boundary processing, which kernel 4 handles in units of columns.
- Kernel 4 uses the same launch parameters as kernel 2.
- Its output is the neighborhood-window pixel sum of the image input to kernel 1, which is saved to the corresponding global memory.
In the same way, the neighborhood means mean_I of the guidance image I, mean_Ip of the image I*p and mean_II of the image I*I are obtained in turn.
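The four passes can be mirrored serially in C to check the GPU output. The sketch below is a hypothetical reference, not the patent's code. It follows the cumulative-sum-and-difference box filter used in the widely cited guided filter reference code, which the four kernels appear to parallelize. Mapping each pass to a kernel is our reading of the text.

The sketch skips the transpose of step (III), which only matters for coalesced GPU memory access. It assumes a (2r+1) × (2r+1) window clipped at the image border.

```c
#include <stddef.h>
#include <stdlib.h>

/* Box sum via cumulative sums, in four passes that mirror kernels 1-4:
   (1) cumulative sum down each column, (2) vertical window differencing
   with border clipping, (3) cumulative sum along each row, (4) horizontal
   window differencing. Row-major w x h image, window (2r+1) x (2r+1).
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int box_sum_cumsum(const double *src, double *dst, int w, int h, int r)
{
    double *cum = malloc((size_t)w * h * sizeof *cum);
    if (!cum) return -1;

    /* Pass 1: one "thread" per column accumulates down the column. */
    for (int x = 0; x < w; ++x) {
        double run = 0.0;
        for (int y = 0; y < h; ++y) {
            run += src[(size_t)y * w + x];
            cum[(size_t)y * w + x] = run;
        }
    }
    /* Pass 2: vertical window sum = cum[y+r] - cum[y-r-1], clipped. */
    for (int y = 0; y < h; ++y) {
        int hi = y + r < h ? y + r : h - 1;
        int lo = y - r - 1;               /* negative: nothing to subtract */
        for (int x = 0; x < w; ++x)
            dst[(size_t)y * w + x] = cum[(size_t)hi * w + x]
                - (lo >= 0 ? cum[(size_t)lo * w + x] : 0.0);
    }
    /* Pass 3: cumulative sum along each row of the vertical sums. */
    for (int y = 0; y < h; ++y) {
        double run = 0.0;
        for (int x = 0; x < w; ++x) {
            run += dst[(size_t)y * w + x];
            cum[(size_t)y * w + x] = run;
        }
    }
    /* Pass 4: horizontal window sum from the row cumulative sums, clipped. */
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int hi = x + r < w ? x + r : w - 1;
            int lo = x - r - 1;
            dst[(size_t)y * w + x] = cum[(size_t)y * w + hi]
                - (lo >= 0 ? cum[(size_t)y * w + lo] : 0.0);
        }
    free(cum);
    return 0;
}
```

Each pass costs O(1) per pixel regardless of r. This is what makes the large radius of Example 2 (r = 60) affordable, since a naive window sum costs O((2r+1)²) per pixel.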
Note that the CUDA programming model has the CPU and GPU working together. Constrained by its hardware architecture, a traditional CPU cannot use resources effectively for general-purpose computation. CUDA enables the GPU to perform traditional graphics computation and also to perform general-purpose computation efficiently.
To minimize data transfer time and increase computation speed, the present invention performs only two data transfers between CPU memory and GPU memory:

- the input image p and guidance image I are transferred from host memory to device memory;
- the output image q is transferred from device memory back to host memory.

The specific steps are as follows:
(I) Build a cooperative CPU-GPU working environment using CUDA;
(II) Read the input image p and the guidance image I from host memory into the global memory of the device, and bind them to texture memory.
(III) Allocate threads. The kernel launch parameters are 16 × 16 threads per block and ((image width + dimBlock.x - 1)/dimBlock.x) × ((image height + dimBlock.y - 1)/dimBlock.y) blocks per grid, which divides the image into a checkerboard of tiles. dimBlock.x and dimBlock.y are the dimensions of the thread block along the x and y axes.
(IV) Call the first kernel (comprising the four kernels) to sum an all-ones matrix of the same size as the image over the neighborhood window. This yields the neighborhood-window pixel count N, which is saved in constant memory.
(V) Call the first kernel, using N from constant memory, to obtain in turn the following neighborhood means, storing each in the corresponding global memory:

- the means of the input image p and the guidance image I;
- mean_Ip of the image I*p;
- mean_II of the image I*I.
(VI) Construct the second kernel to compute in turn the covariance cov_Ip of (I, p) and the variance var_I of image I, then construct kernels to compute the key filtering parameters a and b. Specifically:

- a covariance kernel computes the covariance of (I, p) as cov_Ip = mean_Ip - mean_I .* mean_p;
- a variance kernel computes the variance of image I as var_I = mean_II - mean_I .* mean_I;
- a parameter-a kernel computes the key filtering parameter a as a = cov_Ip ./ (var_I + ε);
- a parameter-b kernel computes the key filtering parameter b as b = mean_p - a .* mean_I.
(VII) Call the first kernel to compute the neighborhood-window means of the key filtering parameters a and b, obtain the final filtering result q, and save it in the corresponding global memory. Specifically:

- the first kernel computes the neighborhood mean of parameter a as mean_a = boxfilter(a, r)/N;
- the first kernel computes the neighborhood mean of parameter b as mean_b = boxfilter(b, r)/N;
- an output kernel computes the final filtering result as q = mean_a .* I + mean_b and saves it in the corresponding global memory.

(VIII) Transfer the filtered image stored in the global memory of the device back to host memory.
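Steps (VI) and (VII) are element-wise except for the two box means, so each GPU thread can handle one pixel. The serial C sketch below writes each per-pixel kernel as a loop body; its function names are hypothetical. In the formulas above, `.*` and `./` denote element-wise operations, as in Matlab. The sketch assumes (2r+1) × (2r+1) windows clipped at the border, with the division by N folded into a running pixel count.

```c
#include <stddef.h>

/* Step (VI), written as one loop iteration per pixel (one GPU thread per
   pixel in the CUDA version). All arrays hold n values. */
void compute_a_b(const double *mean_I, const double *mean_p,
                 const double *mean_Ip, const double *mean_II,
                 double eps, size_t n, double *a, double *b)
{
    for (size_t i = 0; i < n; ++i) {
        double cov_Ip = mean_Ip[i] - mean_I[i] * mean_p[i]; /* cov of (I, p) */
        double var_I  = mean_II[i] - mean_I[i] * mean_I[i]; /* var of I      */
        a[i] = cov_Ip / (var_I + eps);
        b[i] = mean_p[i] - a[i] * mean_I[i];
    }
}

/* Mean over the (2r+1) x (2r+1) window clipped at the border; dividing by
   the running count equals boxfilter(src, r) ./ N. Naive reference. */
static void box_mean_counted(const double *src, double *dst,
                             int w, int h, int r)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double s = 0.0;
            int cnt = 0;
            for (int yy = y - r; yy <= y + r; ++yy)
                for (int xx = x - r; xx <= x + r; ++xx)
                    if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
                        s += src[(size_t)yy * w + xx];
                        ++cnt;
                    }
            dst[(size_t)y * w + x] = s / cnt;  /* cnt >= 1: pixel itself */
        }
}

/* Step (VII): mean_a, mean_b, then q = mean_a .* I + mean_b.
   mean_a and mean_b are caller-provided buffers of w*h doubles. */
void guided_output(const double *I, const double *a, const double *b,
                   double *mean_a, double *mean_b, double *q,
                   int w, int h, int r)
{
    box_mean_counted(a, mean_a, w, h, r);
    box_mean_counted(b, mean_b, w, h, r);
    for (size_t i = 0; i < (size_t)w * h; ++i)
        q[i] = mean_a[i] * I[i] + mean_b[i];
}
```

With ε > 0, a flat region has var_I = 0, so a = 0 and b = mean_p. There the filter falls back to local averaging. Where var_I ≫ ε, a ≈ 1, which preserves edges.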
When the guided filter is used for image feathering, the procedure differs from the above as follows:
(I) Copying the r, g and b component data of the input image p and the guidance image I from CPU memory to GPU memory involves multiple data transfers. The present invention therefore uses CUDA streams, which let data copies overlap with kernel execution and improve GPU utilization. The benefit of streams is especially clear when the amount of data is large;
(II) When solving for the key parameter a, the present invention uses a single kernel to compute all three components r, g and b.
- The block dimension is set to 16 × 16, and blocks are arranged in a two-dimensional grid.
- Each thread first loads the data in global memory var_I_rr, var_I_rg, var_I_rb, var_I_gg, var_I_gb and var_I_bb into registers and builds the 3 × 3 Sigma matrix. It computes the determinant with the determinant formula and stores the result in a register.
- Each thread then inverts the Sigma matrix and multiplies the inverse by cov_Ip in a single unified computation. This increases the arithmetic intensity of the program and makes full use of the computing performance of the GPU.
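Step (II) can be sketched per pixel in C. This is one possible reading, not the patent's code, and the function name and argument order are hypothetical. It assumes, as in the standard colour guided filter, that ε is added to the diagonal of Sigma. It inverts the symmetric 3 × 3 matrix with the adjugate-over-determinant formula and fuses the inverse with the product by cov_Ip, so all intermediates stay in local variables (registers on the GPU).

```c
#include <math.h>

/* Per-pixel solve for a colour guide (channels r, g, b):
   a = (Sigma + eps*U)^-1 * cov_Ip, where Sigma is the symmetric 3x3
   covariance of the guide in the window and cov_Ip holds the covariance of
   each guide channel with p. Returns -1 if the matrix is numerically
   singular, 0 otherwise. */
int solve_a_rgb(double rr, double rg, double rb, double gg, double gb,
                double bb, double eps, const double cov_Ip[3], double a[3])
{
    /* Regularized matrix entries (upper triangle; the matrix is symmetric). */
    double s00 = rr + eps, s01 = rg, s02 = rb;
    double s11 = gg + eps, s12 = gb, s22 = bb + eps;

    /* Cofactors; for a symmetric matrix the adjugate is symmetric too. */
    double c00 = s11 * s22 - s12 * s12;
    double c01 = s02 * s12 - s01 * s22;
    double c02 = s01 * s12 - s02 * s11;
    double c11 = s00 * s22 - s02 * s02;
    double c12 = s01 * s02 - s00 * s12;
    double c22 = s00 * s11 - s01 * s01;

    /* Determinant by expansion along the first row. */
    double det = s00 * c00 + s01 * c01 + s02 * c02;
    if (fabs(det) < 1e-300) return -1;

    /* Inverse (adjugate / det) times cov_Ip, fused into one step. */
    a[0] = (c00 * cov_Ip[0] + c01 * cov_Ip[1] + c02 * cov_Ip[2]) / det;
    a[1] = (c01 * cov_Ip[0] + c11 * cov_Ip[1] + c12 * cov_Ip[2]) / det;
    a[2] = (c02 * cov_Ip[0] + c12 * cov_Ip[1] + c22 * cov_Ip[2]) / det;
    return 0;
}
```

Sigma is a covariance matrix, so it is positive semidefinite. Adding εU with ε > 0 makes it positive definite, and the singularity guard should then only trigger on degenerate input.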
Embodiment 3
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solution is described in further detail below with specific examples.
The examples use the following setup:

- Operating system: Windows 7.
- CPU: Intel Core i5-3470 at 3.2 GHz, with 4 GB of system memory.
- GPU: NVIDIA GeForce GTX 660, supporting CUDA compute capability 3.0. It has 5 streaming multiprocessors (SMs) of 192 CUDA cores each, 2048 MB of on-board global memory and a 192-bit memory interface.

All data were analyzed with the Visual Profiler included in the CUDA Toolkit to quantify program performance.
To verify the effectiveness of the method, the examples apply CUDA parallel optimization to the guided filter in four application areas: image smoothing, image feathering, image enhancement and flash denoising. The filtering results and speed-up ratios are as follows:
Example 1: Image smoothing
In this example the filter radius r is 16 and the filtering parameter eps is 0.04. The input image p and the guidance image I are the same image, and the output image q is the final result, shown in Fig. 2. The details, abrupt changes, edges and noise of the input image p are all suppressed to some extent, giving a satisfactory smoothing result.
Example 2: Image feathering
In this example the filter radius r is 60 and the filtering parameter eps is 0.000001. The input image p and the guidance image I are two different images, and the output image q is the final result, shown in Fig. 3. The feathering effect is clear: the edges change gradually, producing a natural transition.
Example 3: Image enhancement
In this example the filter radius r is 16 and the filtering parameter eps is 0.01. The input image p and the guidance image I are the same image, and the output image q is the final result, shown in Fig. 4. Both global and local features of the output image are clearly enhanced, making fine image details easier to distinguish.
Example 4: Flash denoising
In this example the filter radius r is 8 and the filtering parameter eps is 0.0004. The input image p and the guidance image I are two different images, and the output image q is the final result, shown in Fig. 5. The colors of the denoised output image q are harmonious and natural, a satisfactory result.
Figs. 2 to 5 also show that in all four applications the results of the present invention are essentially the same as those of the original C implementation, demonstrating its accuracy. To compare acceleration, the guided filter was implemented in Matlab, in C and in CUDA, and the implementations were compared experimentally. The time consumption and speed-up ratios for images of different resolutions are shown in Table 1:
Table 1: Time consumption (ms) and speed-up ratio of guided filtering implemented in different programming environments
Table 1 shows that the CUDA parallel implementation of the present invention greatly reduces the time required compared with the Matlab and C implementations. The acceleration is most pronounced for image feathering, where a speed-up of more than 60 times is achieved. The acceleration effect also becomes more pronounced as image resolution increases.
References:
[1] Petschnigg G, Szeliski R, Agrawala M, et al. Digital photography with flash and no-flash image pairs[J]. ACM Transactions on Graphics (TOG), 2004, 23(3): 664-672.
Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment. The numbering of the embodiments above is for description only and does not indicate their relative merits.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (1)

1. a kind of guiding filtering based on CUDA accelerates optimization method, which is characterized in that the guiding filtering accelerates optimization method The following steps are included:
Input picture p and navigational figure I is read in into global storage by host end memory, by constructing the first kernel function, point Not Huo Qu input picture p, navigational figure I, image I*p, image I*I neighborhood window Image neighborhood mean value;
The covariance that the second kernel function successively seeks image (I, p), the variance of navigational figure I are constructed, and then seeks filtering and closes Bond parameter a and b;
It calls the first kernel function to seek the neighboring mean value mean_a of parameter a, the neighboring mean value mean_b of parameter b, and then obtains Result is saved in corresponding global storage by final filter result, and host end memory is arrived in outflow;
For image feathering, at a resolution of 946 × 756, the processing time is 62.3 ms, a speedup of 60.2;
For image smoothing, at a resolution of 942 × 659, the processing time is 21.2 ms, a speedup of 11.0;
For image enhancement, at a resolution of 1024 × 1024, the processing time is 59.7 ms, a speedup of 16.2;
For flash denoising, at a resolution of 1024 × 1024, the processing time is 89.7 ms, a speedup of 10.5;
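Assuming the speedup is defined as serial CPU time divided by GPU time (the definition is not repeated in the claim), the four cases imply the following CPU baselines. These are derived estimates, not figures reported in the patent:

```latex
\begin{aligned}
T_{\mathrm{CPU}} &= S \times T_{\mathrm{GPU}}\\
946\times756,\ 62.3\ \mathrm{ms}:&\quad 60.2 \times 62.3\ \mathrm{ms} \approx 3750\ \mathrm{ms}\\
942\times659,\ 21.2\ \mathrm{ms}:&\quad 11.0 \times 21.2\ \mathrm{ms} \approx 233\ \mathrm{ms}\\
1024\times1024,\ 59.7\ \mathrm{ms}:&\quad 16.2 \times 59.7\ \mathrm{ms} \approx 967\ \mathrm{ms}\\
1024\times1024,\ 89.7\ \mathrm{ms}:&\quad 10.5 \times 89.7\ \mathrm{ms} \approx 942\ \mathrm{ms}
\end{aligned}
```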
The step of constructing the first kernel function to obtain, respectively, the neighborhood-window means of the input image p, the guide image I, the image I*p and the image I*I specifically comprises:
Converting the computation of the neighborhood-window means of the input image p, the guide image I, the image I*p and the image I*I into the summation of pixel values over each neighborhood window;
Computing each neighborhood-window pixel sum by constructing the first kernel function;
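In the notation of He et al.'s guided filter (assumed here: ω_k is the window of radius r centred on pixel k, clipped at the image border, and N_k is its pixel count), the conversion replaces each mean by a window sum followed by a single division, so the only window operation left to parallelize is the sum:

```latex
\mathrm{mean}_p(k)=\frac{1}{N_k}\sum_{i\in\omega_k}p_i,\quad
\mathrm{mean}_I(k)=\frac{1}{N_k}\sum_{i\in\omega_k}I_i,\quad
\mathrm{mean}_{Ip}(k)=\frac{1}{N_k}\sum_{i\in\omega_k}I_i\,p_i,\quad
\mathrm{mean}_{II}(k)=\frac{1}{N_k}\sum_{i\in\omega_k}I_i^{2}
```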
The step of computing each neighborhood-window pixel sum by constructing the first kernel function specifically comprises:
Computing the neighborhood-window pixel sums with an integral image, parallelized in CUDA by the 4 kernel functions that make up the first kernel function;
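To illustrate the integral-image idea the claim relies on, here is a host-side summed-area table with the four-lookup window sum. It is a sketch of the principle, not the claimed four-kernel GPU decomposition; `integralImage` and `rectSum` are hypothetical names. Double precision is used because single-precision prefix sums over a 1024 × 1024 image can lose low-order bits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summed-area table with a zero border: entry (y + 1, x + 1) of the padded
// table holds the sum of all pixels in rows 0..y and columns 0..x.
std::vector<double> integralImage(const std::vector<double>& img, int w, int h) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    const int W = w + 1;  // row stride of the padded table
    std::vector<double> S(static_cast<std::size_t>(W) * (h + 1), 0.0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            S[(y + 1) * W + (x + 1)] = img[y * w + x]
                                     + S[y * W + (x + 1)]    // sum above
                                     + S[(y + 1) * W + x]    // sum to the left
                                     - S[y * W + x];         // overlap counted twice
    return S;
}

// Sum over columns x1..x2 and rows y1..y2 (inclusive) using four lookups.
double rectSum(const std::vector<double>& S, int w, int x1, int y1, int x2, int y2) {
    assert(x1 <= x2 && y1 <= y2);
    const int W = w + 1;
    return S[(y2 + 1) * W + (x2 + 1)] - S[y1 * W + (x2 + 1)]
         - S[(y2 + 1) * W + x1] + S[y1 * W + x1];
}
```

For a pixel (x, y) and radius r, the clipped window is x1 = max(0, x - r), x2 = min(w - 1, x + r), and likewise for y, so every window sum costs four reads regardless of r.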
The guided filtering acceleration method further comprises:
Calling the 4 kernel functions of the first kernel function to obtain the number of pixels N in each neighborhood window, and saving it in constant memory;
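Reading "the number of pixels N" as the per-pixel count of a border-clipped window (in He et al.'s reference code N is obtained by box-filtering an all-ones image, which is what running the four kernels on a constant image would give), the counts have a simple closed form. A sketch, with `windowCounts` as a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of pixels N inside the (2r+1) x (2r+1) window centred on each pixel,
// clipped at the image border; equal to box-summing an all-ones image.
std::vector<int> windowCounts(int w, int h, int r) {
    assert(w > 0 && h > 0 && r >= 0);
    std::vector<int> N(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y) {
        int rows = std::min(h - 1, y + r) - std::max(0, y - r) + 1;
        for (int x = 0; x < w; ++x) {
            int cols = std::min(w - 1, x + r) - std::max(0, x - r) + 1;
            N[y * w + x] = rows * cols;  // separable: row count times column count
        }
    }
    return N;
}
```

Because N depends only on the image size and r, it can be computed once and reused for all six box means. Since it factorizes into a row count times a column count, two 1-D count arrays can replace a full per-pixel map, which may matter given CUDA's 64 KB constant-memory limit.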
The guided filtering acceleration method further comprises:
Computing the neighborhood-window pixel sums with an integral image, parallelized in CUDA by the 4 kernel functions of the first kernel function, implemented as follows:
(I) The 1st kernel function computes in parallel, for each column i of the image, the sum of pixels from row 1 to row j; it is launched with a block dimension of 1024 × 1 and a grid dimension of 1 × 1.
Each thread computes one column of the image through an iterative recurrence, keeping intermediate values in registers inside the loop; the global-memory reads are then coalesced;
(II) The data produced by the 1st kernel function require boundary handling. The 2nd kernel function processes the boundary cases row by row and is launched with a block dimension of 16 × 16 and a grid of ((image width + dimBlock.x - 1)/dimBlock.x) × ((image height + dimBlock.y - 1)/dimBlock.y) blocks, where dimBlock.x and dimBlock.y are the thread-block dimensions along the x and y axes;
(III) The 3rd kernel function computes in parallel, for each row j of the image, the sum of pixels from column 1 to column i. The input data of the kernel are first transposed, the 1st kernel function is then called to perform the computation, and the results are written back column by column;
(IV) The data produced by the 3rd kernel function likewise require boundary handling. The 4th kernel function processes the boundary cases column by column with the same launch parameters as the 2nd kernel function; its output is the neighborhood-window pixel sum of the image that was input to the 1st kernel function, which is saved in the corresponding global memory;
In the same way, the neighborhood mean mean_I of the guide image I, the neighborhood mean mean_Ip of the image I*p and the neighborhood mean mean_II of the image I*I are obtained in turn.
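The four stages (I) to (IV) can be chained into a host-side reference that mirrors the claim: a column prefix sum (kernel 1), a vertical window difference with clipping at both borders (kernel 2), then the same two operations on the transposed data (kernels 3 and 4). This is a sketch under two assumptions: that the "boundary processing" is the differencing step of the cumulative-sum box filter in He et al.'s reference code, and that the result is transposed back once at the end rather than written column by column. All function names are hypothetical. In the kernel 1 loop, threads with adjacent column indices read adjacent addresses at each step, which is why the claim notes coalesced access; as stated, a single 1024 × 1 block with one thread per column covers at most 1024 columns, matching the tested widths, so wider images would need more blocks or a per-thread loop over columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Img = std::vector<double>;  // row-major, width w, height h

// Grid size for kernels 2 and 4: ceil(extent / block), e.g. with 16 x 16 blocks.
int gridDimFor(int extent, int block) { return (extent + block - 1) / block; }

// Kernel 1: one thread per column runs a prefix sum down its column.
Img columnPrefix(const Img& s, int w, int h) {
    Img d(s.size());
    for (int x = 0; x < w; ++x) {      // threadIdx.x on the GPU
        double acc = 0.0;              // register-resident accumulator
        for (int y = 0; y < h; ++y) {  // sequential recurrence down the column
            acc += s[y * w + x];
            d[y * w + x] = acc;
        }
    }
    return d;
}

// Kernels 2 and 4 (assumed form of the boundary processing): turn column prefix
// sums into sums over rows [y - r, y + r], clipped at the top and bottom.
Img columnWindow(const Img& c, int w, int h, int r) {
    Img d(c.size());
    for (int y = 0; y < h; ++y)        // one thread per pixel on the GPU
        for (int x = 0; x < w; ++x) {
            int hi = std::min(h - 1, y + r);  // last row inside the window
            int lo = y - r - 1;               // row just above the window
            d[y * w + x] = c[hi * w + x] - (lo >= 0 ? c[lo * w + x] : 0.0);
        }
    return d;
}

// Transpose a w x h image into an h x w image.
Img transpose(const Img& s, int w, int h) {
    Img d(s.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) d[x * h + y] = s[y * w + x];
    return d;
}

// Full box sum: kernels 1-2 act on columns; kernel 3 transposes and reuses
// kernel 1, kernel 4 finishes the other direction; then transpose back.
Img boxSum(const Img& s, int w, int h, int r) {
    assert(s.size() == static_cast<std::size_t>(w) * h && r >= 0);
    Img vert = columnWindow(columnPrefix(s, w, h), w, h, r);   // kernels 1, 2
    Img t = transpose(vert, w, h);                               // h wide, w tall
    Img horiz = columnWindow(columnPrefix(t, h, w), h, w, r);  // kernels 3, 4
    return transpose(horiz, h, w);                               // back to w x h
}
```

Dividing the result by the window counts N gives the neighborhood means; running boxSum on an all-ones image reproduces N itself.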

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510324806.0A CN104899840B (en) 2015-06-12 2015-06-12 A kind of guiding filtering acceleration optimization method based on CUDA

Publications (2)

Publication Number Publication Date
CN104899840A CN104899840A (en) 2015-09-09
CN104899840B true CN104899840B (en) 2018-12-18

Family

ID=54032488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510324806.0A Active CN104899840B (en) 2015-06-12 2015-06-12 A kind of guiding filtering acceleration optimization method based on CUDA

Country Status (1)

Country Link
CN (1) CN104899840B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096277B (en) * 2015-09-17 2017-08-01 North China Electric Power University (Baoding) A kind of image adaptive selected based on parameter instructs filtering method
CN106339993A (en) * 2016-08-26 2017-01-18 Beijing Kingsoft Cheetah Technology Co., Ltd. Human face image polishing method and device and terminal device
CN109816595B (en) * 2017-11-20 2021-01-26 Beijing Jingdong Shangke Information Technology Co., Ltd. Image processing method and device
CN112381734A (en) * 2020-11-13 2021-02-19 Hainan Zhongbo Data Technology Co., Ltd. Two-dimensional guide filtering method, two-dimensional guide filter and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102073982A (en) * 2011-01-10 2011-05-25 Xidian University Method for realizing acceleration of anisotropic diffusion filtration of overlarge synthetic aperture radar (SAR) image by graphic processing unit (GPU)
CN103745447A (en) * 2014-02-17 2014-04-23 Southeast University Fast parallel achieving method for non-local average filtering
CN104050637A (en) * 2014-06-05 2014-09-17 Huaqiao University Quick image defogging method based on two times of guide filtration

Non-Patent Citations (1)

Title
"OpenCV implementation of guided filtering"; aipiano; http://blog.csdn.net/aichipmunk/article/details/21163543; 2014-03-13; p. 1, paras. 2-8; p. 2, paras. 1-4; p. 3, code lines 24-54 *

Also Published As

Publication number Publication date
CN104899840A (en) 2015-09-09

Similar Documents

Publication Publication Date Title
Luo et al. Canny edge detection on NVIDIA CUDA
Zhang et al. Image parallel processing based on GPU
CN104899840B (en) A kind of guiding filtering acceleration optimization method based on CUDA
CN103077547A (en) CT (computerized tomography) on-line reconstruction and real-time visualization method based on CUDA (compute unified device architecture)
Wang et al. Empirical mode decomposition on surfaces
Daga et al. Implementation of parallel image processing using NVIDIA GPU framework
CN104952043A (en) Image filtering method and CT system
CN109300083A (en) A kind of even color method of piecemeal processing Wallis and device
CN110246201B (en) Pencil drawing generation method based on thread-level parallelism
CN105957028A (en) GPU acceleration patch-based bilateral filter method based on OpenCL
Moradifar et al. Performance improvement of Gaussian filter using SIMD technology
CN109410136A (en) Even color method and processing unit based on most short transmission path
US9772864B2 (en) Methods of and apparatus for multidimensional indexing in microprocessor systems
Preethi et al. Gaussian filtering implementation and performance analysis on GPU
Song et al. Unsharp masking image enhancement the parallel algorithm based on cross-platform
Cavus et al. Gpu based parallel image processing library for embedded systems
Das et al. A concise review of fast bilateral filtering
CN109087381A (en) A kind of unified shader rendering tinter based on double transmitting VLIW
Qiu et al. Parallel fast pencil drawing generation algorithm based on GPU
CN107622037A (en) The method and apparatus that a kind of Matrix Multiplication for improving graphics processing unit calculates performance
Wu et al. From coarse-to fine-grained implementation of edge-directed interpolation using a GPU
CN106878586A (en) The parallel image detail enhancing method and device of restructural
Xiao et al. Unsharp masking image enhancement the parallel algorithm based on cross-platform
Wang et al. Acceleration of the Retinex algorithm for image restoration by GPGPU/CUDA
Pallipuram et al. A multi-node GPGPU implementation of non-linear anisotropic diffusion filter

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210909

Address after: Room 109, no.1866, Bohai 12th Road, Port Economic Zone, Binhai New Area, Tianjin 300452

Patentee after: Tianjin Bohua Xinchuang Technology Co.,Ltd.

Address before: 300072 Tianjin City, Nankai District Wei Jin Road No. 92

Patentee before: Tianjin University

TR01 Transfer of patent right

Effective date of registration: 20211018

Address after: 300452 room 121, No. 1866, Bohai 12th Road, Lingang Economic Zone, Binhai New Area, Tianjin

Patentee after: TIANJIN BOHUA ANCHUANG TECHNOLOGY Co.,Ltd.

Address before: Room 109, no.1866, Bohai 12th Road, Port Economic Zone, Binhai New Area, Tianjin 300452

Patentee before: Tianjin Bohua Xinchuang Technology Co.,Ltd.