CN104899840A - Guided-filtering optimization speed-up method based on CUDA - Google Patents


Publication number
CN104899840A
Authority
CN
China
Prior art keywords
image
kernel function
filtering
neighborhood
cuda
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510324806.0A
Other languages
Chinese (zh)
Other versions
CN104899840B (en)
Inventor
何凯
王新磊
王晓文
葛云峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN BOHUA ANCHUANG TECHNOLOGY Co.,Ltd.
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201510324806.0A
Publication of CN104899840A
Application granted
Publication of CN104899840B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a guided-filtering optimization speed-up method based on CUDA. The method comprises the following steps: reading an input image p and a guide image I from host memory into global memory; constructing a first kernel function to obtain the neighborhood means of the input image p, the guide image I, the image I*p and the image I*I over the neighborhood window; constructing a second kernel function to obtain, in turn, the covariance of the images (I, p) and the variance of the guide image I, and from these the key filtering parameters a and b; calling the first kernel function to obtain the neighborhood mean mean_a of parameter a and the neighborhood mean mean_b of parameter b, and from these the final filtering result q; and storing the result q in the corresponding global memory and copying it back to host memory. The method exploits the floating-point and parallel computing capability of the GPU, preserves the filtering quality of the image, and effectively improves the execution efficiency of the guided filtering algorithm.

Description

A guided-filtering optimization speed-up method based on CUDA
Technical field
The present invention relates to computer application technology and image processing, and in particular to a guided-filtering optimization speed-up method based on CUDA (Compute Unified Device Architecture).
Background art
Image filtering is an important tool in image processing and has great practical significance and research value. Because imaging systems, transmission media and recording devices are imperfect, digital images are often contaminated by various kinds of noise while they are formed, transmitted and recorded. Image filtering suppresses the noise of the target image while preserving image detail as far as possible. It is an indispensable image preprocessing operation, and its quality directly affects the validity and reliability of subsequent image processing and analysis.
Image filtering methods fall into two classes. The first is linear translation-invariant filtering, whose kernel weights are independent of the content of the input image; typical examples are Gaussian filtering, mean filtering and Laplacian filtering. The second is linear translation-variant filtering, typified by guided filtering, which uses content information from an image during filtering; this information is called the guide image information. The bilateral filter kernel takes into account both the spatial distance between pixels in the template and the difference between pixel values, and its guide image and input image are the same image, so the bilateral filter can be regarded as a simple form of guided filtering. The joint bilateral filter uses a guide image that differs from the input image and can achieve better filtering results. Both filters nevertheless have obvious defects: in detail enhancement and high-dynamic-range image compression, for example, the bilateral filter produces pronounced gradient reversal artifacts at edges, so the algorithm and structure of the filter need further improvement. The guided filter was formally proposed in 2010. It keeps the edge-preserving denoising property of the bilateral filter while avoiding these artifacts, and because of its close relationship with the Laplacian matrix it has been widely applied to image denoising, image enhancement, HDR (high dynamic range) compression, flash/no-flash denoising [1], matting, dehazing and joint upsampling. The algorithm is simple and effective, but it requires complex matrix computations and the solution of large linear systems, so it consumes a large amount of computing time and memory and cannot meet the needs of practical applications.
In short, the guided image filtering algorithm is computationally expensive, and it is difficult to improve its execution efficiency while preserving its accuracy. A traditional CPU-based architecture can hardly satisfy the requirements on both accuracy and real-time processing, so a graphics processing unit (GPU) is needed to meet practical demands.
Summary of the invention
The invention provides a guided-filtering optimization speed-up method based on CUDA. While preserving image filtering quality, the invention improves computational efficiency and reduces computational complexity, as described below:
A guided-filtering optimization speed-up method based on CUDA comprises the following steps:
Reading the input image p and the guide image I from host memory into global memory and, by constructing a first kernel function, obtaining the neighborhood means of the input image p, the guide image I, the image I*p and the image I*I over the neighborhood window;
Constructing a second kernel function to obtain, in turn, the covariance of the images (I, p) and the variance of the guide image I, and from these the key filtering parameters a and b;
Calling the first kernel function to obtain the neighborhood mean mean_a of parameter a and the neighborhood mean mean_b of parameter b, obtaining the final filtering result from them, storing the result in the corresponding global memory, and copying it back to host memory.
The step of obtaining, by constructing the first kernel function, the neighborhood means of the input image p, the guide image I, the image I*p and the image I*I over the neighborhood window is specifically:
Converting the computation of the neighborhood means of the input image p, the guide image I, the image I*p and the image I*I into the computation of sums of pixel values over the neighborhood window;
Computing each of these neighborhood window pixel sums by constructing the first kernel function.
The step of computing the neighborhood window pixel sums by constructing the first kernel function is specifically:
Using an integral image to compute the neighborhood window pixel sums, with CUDA parallel optimization carried out by the 4 kernels that make up the first kernel function.
The method further comprises: calling the 4 kernels of the first kernel function to obtain the number N of pixels in the neighborhood window, and storing it in constant memory.
The beneficial effects of the technical solution provided by the invention are as follows:
Building on an in-depth study of the guided filtering algorithm, the invention implements guided filtering with CUDA programming and compares it experimentally with implementations in C and in Matlab on four example applications: image smoothing, image feathering, image enhancement and flash denoising. Compared with the prior art, the invention has the following advantages:
(1) A novel approach. The guided filtering algorithm is designed on the CUDA architecture, which removes the time limitations of serial programming.
(2) High execution efficiency, reaching real-time processing to a certain extent. The method exploits the floating-point and parallel computing capability of the GPU and, while preserving the filtering quality, effectively improves the execution efficiency of the guided filtering algorithm.
(3) Simple implementation and low hardware requirements. The GPU parallel architecture is called from a C language environment, the code is easy to write, and large-scale data can be processed on consumer-grade GPU hardware.
Brief description of the drawings
Fig. 1 is a flowchart of the guided filtering algorithm of the present invention;
Fig. 2 compares image smoothing results:
(a) input image, (b) output image of the C program, (c) output image of the CUDA program;
Fig. 3 compares image feathering results:
(a) input image, (b) guide image, (c) output image of the C program, (d) output image of the CUDA program;
Fig. 4 compares image enhancement results of the present invention:
(a) input image, (b) output image of the C program, (c) output image of the CUDA program;
Fig. 5 compares flash denoising results of the present invention:
(a) input image, (b) guide image, (c) output image of the C program, (d) output image of the CUDA program.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in further detail below.
Over the past decade, the graphics processing unit (GPU) has evolved from a device dedicated to computer graphics into a highly parallel, multithreaded, many-core processor. The computing power of mainstream GPUs has long exceeded that of mainstream general-purpose CPUs, and the gap is expected to keep growing. The Compute Unified Device Architecture (CUDA), released by NVIDIA, is a software and hardware architecture that uses the GPU as a data-parallel computing device and is a complete GPGPU (general-purpose computing on graphics processing units) solution. CUDA lowers the difficulty of using GPUs for general-purpose computation. Thanks to its programming model and memory hierarchy, a large number of similar, complex operations can be processed by threads simultaneously, which greatly reduces program execution time.
To this end, the invention proposes a guided-filtering optimization speed-up method based on CUDA. CUDA is used to build a cooperative CPU and GPU working environment: the CPU acts as the host responsible for logic-heavy transaction processing and serial computation, and the GPU acts as a coprocessor responsible for highly threaded parallel processing. CUDA parallel programming is used to compute the neighborhood window pixel sums and from them the neighborhood means. Registers and texture memory are used at the same time to optimize the algorithm steps and obtain the key filtering parameters, which achieves an overall optimization of the algorithm. The technical solution of the invention is as follows:
Embodiment 1
A guided-filtering optimization speed-up method based on CUDA, see Fig. 1, comprises the following steps:
101: read the input image p and the guide image I from host memory into global memory and, by constructing a first kernel function, obtain the neighborhood means of the input image p, the guide image I, the image I*p and the image I*I over the neighborhood window;
102: construct a second kernel function to obtain, in turn, the covariance of the images (I, p) and the variance of the guide image I, and from these the key filtering parameters a and b;
103: call the first kernel function to obtain the neighborhood mean mean_a of parameter a and the neighborhood mean mean_b of parameter b, obtain the final filtering result from them, store the result in the corresponding global memory, and copy it back to host memory.
The step of obtaining, by constructing the first kernel function, the neighborhood means of the input image p, the guide image I, the image I*p and the image I*I over the neighborhood window is specifically:
Converting the computation of the neighborhood means of the input image p, the guide image I, the image I*p and the image I*I into the computation of sums of pixel values over the neighborhood window;
Computing each of these neighborhood window pixel sums by constructing the first kernel function.
Further, the step of computing the neighborhood window pixel sums by constructing the first kernel function is specifically:
Using an integral image to compute the neighborhood window pixel sums, with CUDA parallel optimization carried out by the 4 kernels that make up the first kernel function.
The method further comprises:
Calling the 4 kernels of the first kernel function to obtain the number N of pixels in the neighborhood window, and storing it in constant memory.
The method uses CUDA programming to parallelize the guided filtering algorithm. It preserves the quality of the filtering result while greatly improving execution efficiency, and achieves real-time guided filtering to a certain extent.
The method of Embodiment 1 is described below with concrete formulas and computation steps:
Embodiment 2
The guided filtering algorithm is based on a local linear model. Let the input image be p, the guide image be I, and the filtered output image be q. The local linear model assumes that the following linear relation holds in the neighborhood window $\omega_k$ centered at pixel k:
$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k \tag{1}$$
where $\omega_k$ is a square window with side length r, $a_k$ and $b_k$ are the linear coefficients in the neighborhood window $\omega_k$, $I_i$ is the pixel value of the guide image in $\omega_k$, and $q_i$ is the filter output in $\omega_k$. The coefficients $a_k$ and $b_k$ are determined by minimizing the difference between the input image p and the output image q, that is, by minimizing formula (2).
$$E(a_k, b_k) = \sum_{i \in \omega_k} \left[ (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right] \tag{2}$$
In formula (2), $E(a_k, b_k)$ is the cost function over the neighborhood window $\omega_k$, $p_i$ is the pixel value of the input image in $\omega_k$, and $\epsilon$ is a regularization parameter whose purpose is to prevent $a_k$ from becoming too large. Solving formula (2) by linear regression gives:
$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon} \tag{3}$$
$$b_k = \bar{p}_k - a_k \mu_k \tag{4}$$
In these formulas, $\mu_k$ and $\sigma_k^2$ are the mean and variance of the guide image I in the neighborhood window $\omega_k$, $|\omega|$ is the number of pixels in $\omega_k$, and $\bar{p}_k$ is the mean of the input image p in $\omega_k$.
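The step from formula (2) to formulas (3) and (4) is the usual least-squares argument. A short derivation, written with the same symbols as above, is sketched here for reference; it is not part of the original text.

$$\frac{\partial E}{\partial b_k} = 2 \sum_{i \in \omega_k} (a_k I_i + b_k - p_i) = 0 \;\Rightarrow\; b_k = \bar{p}_k - a_k \mu_k$$

$$\frac{\partial E}{\partial a_k} = 2 \sum_{i \in \omega_k} \left[ (a_k I_i + b_k - p_i) I_i + \epsilon a_k \right] = 0$$

Substituting $b_k = \bar{p}_k - a_k \mu_k$ into the second condition and dividing by $2|\omega|$:

$$a_k \left( \frac{1}{|\omega|} \sum_{i \in \omega_k} I_i^2 - \mu_k^2 + \epsilon \right) = \frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k$$

Since $\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i^2 - \mu_k^2 = \sigma_k^2$, this is formula (3), and the first condition is formula (4).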
Because each pixel is contained in several neighborhood windows $\omega_k$, and the value of $q_i$ computed in different windows differs, $q_i$ must be averaged. After $a_k$ and $b_k$ have been computed for all windows, the filter output is given by formula (5):
$$q_i = \frac{1}{|\omega|} \sum_{k:\, i \in \omega_k} (a_k I_i + b_k) = \bar{a}_i I_i + \bar{b}_i \tag{5}$$
where $\bar{a}_i = \frac{1}{|\omega|} \sum_{k:\, i \in \omega_k} a_k$ and $\bar{b}_i = \frac{1}{|\omega|} \sum_{k:\, i \in \omega_k} b_k$ are the means of $a_k$ and $b_k$ over all overlapping neighborhood windows that contain pixel i.
Analysis of formulas (3) and (4) shows that $\mu_k$, $\bar{p}_k$ and $\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i$ are the means of the guide image I, the input image p and the product image I×p in the neighborhood window $\omega_k$, and $\sigma_k^2$ is the variance of I in $\omega_k$. Variance and mean are related by $D(X) = E(X^2) - (E(X))^2$, so the variance can also be computed from means. Neighborhood means are therefore computed many times in the guided filtering algorithm and are its most time-consuming part. Computing the neighborhood mean of an image over a window quickly is thus the key to implementing guided filtering, and it is the focus of the CUDA optimization in this invention.
The invention uses formula (6) to construct the first kernel function, which computes the neighborhood mean of an image.
mean_p = boxfilter(p, r) / N (6)
where mean_p is the neighborhood mean of the input image p, boxfilter(p, r) is the sum of the pixel values of p in the neighborhood window, N is the number of pixels in the neighborhood window, and r is the side length of the neighborhood window. N is obtained by computing the neighborhood window pixel sum of an all-ones matrix of the same size as the image. This computation is well known to those skilled in the art and is not described further here.
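Formula (6) can be illustrated with a small CPU reference in plain Python. This is only a sketch under stated assumptions, not the CUDA kernel of the invention: the names `box_sum` and `box_mean` are hypothetical, r is treated as a window radius (a (2r+1) × (2r+1) window, matching the "filter radius" wording of the later examples), and windows are clipped at the image border, which is why N is computed per pixel from an all-ones image.

```python
def box_sum(img, r):
    """Brute-force sum over the (2r+1) x (2r+1) window around each pixel.

    Assumption: r is a radius and the window is clipped at the image border.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += img[yy][xx]
            out[y][x] = total
    return out


def box_mean(img, r):
    """mean = boxfilter(img, r) / N, with N taken from an all-ones image."""
    h, w = len(img), len(img[0])
    ones = [[1.0] * w for _ in range(h)]
    n = box_sum(ones, r)  # N: number of valid pixels in each clipped window
    s = box_sum(img, r)
    return [[s[y][x] / n[y][x] for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(box_mean(image, 1))
```

Dividing by a per-pixel N rather than a fixed window area keeps the mean correct at the borders, where fewer pixels fall inside the window.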
In this way the computation of neighborhood means is converted into the computation of neighborhood window pixel sums, which is well suited to CUDA parallel processing. The invention uses an integral image to compute the neighborhood window pixel sums, with CUDA parallel optimization carried out by the 4 kernels that make up the first kernel function. The implementation steps are as follows (assuming the data are already in GPU video memory):
(I) The 1st kernel computes in parallel, for column i of the image (1 ≤ i ≤ image width), the sum of the pixels from row 1 to row j (1 ≤ j ≤ image height). Its launch parameters are a block dimension of 1024 × 1 and a grid dimension of 1 × 1. Each thread processes one column of the image by iterating in a loop and keeps intermediate data in registers; the reads satisfy coalesced global memory access.
(II) The data produced by the 1st kernel require boundary processing. The 2nd kernel handles the boundary problem row by row. Its launch parameters are a block dimension of 16 × 16 and a grid of ((image width + dimBlock.x - 1) / dimBlock.x) × ((image height + dimBlock.y - 1) / dimBlock.y) blocks, where dimBlock.x is the dimension of the thread block along the x axis and dimBlock.y is its dimension along the y axis.
(III) The 3rd kernel computes in parallel, for row j of the image (1 ≤ j ≤ image height), the sum of the pixels from column 1 to column i (1 ≤ i ≤ image width). To avoid uncoalesced reads, the input data of this kernel are first transposed, the 1st kernel is then called to perform the computation, and the results are written out column by column.
(IV) The data produced by the 3rd kernel also require boundary processing. The 4th kernel handles the boundary problem column by column, with the same launch parameters as the 2nd kernel. Its output is the neighborhood window pixel sum of the image passed to the 1st kernel, and it is stored in the corresponding global memory.
In the same way, the neighborhood mean mean_I of the guide image I, the neighborhood mean mean_Ip of the image I*p and the neighborhood mean mean_II of the image I*I are obtained in turn.
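The two-pass structure of steps (I) to (IV) can be sketched serially in plain Python: a running sum along one axis, a difference of two running-sum entries to resolve the window and its borders, then the same along the other axis. This is a hedged CPU illustration only; the names `window_sum_1d` and `box_sum_integral` are hypothetical, r is again assumed to be a radius, and the transpose, coalescing and launch configuration of the actual kernels are not modeled.

```python
def window_sum_1d(values, r):
    """Sliding-window sum of radius r over a list, via a prefix (running) sum."""
    n = len(values)
    prefix = [0.0] * (n + 1)
    for i, v in enumerate(values):  # cumulative sum, as in kernels 1 and 3
        prefix[i + 1] = prefix[i] + v
    # Window [i - r, i + r] clipped to the valid range: one subtraction per
    # element, which is the boundary handling of kernels 2 and 4.
    return [prefix[min(n, i + r + 1)] - prefix[max(0, i - r)] for i in range(n)]


def box_sum_integral(img, r):
    """Box sum computed as a column pass followed by a row pass."""
    h, w = len(img), len(img[0])
    # Pass 1: window sums down each column.
    cols = [window_sum_1d([img[y][x] for y in range(h)], r) for x in range(w)]
    tmp = [[cols[x][y] for x in range(w)] for y in range(h)]
    # Pass 2: window sums along each row of the intermediate result.
    return [window_sum_1d(row, r) for row in tmp]


if __name__ == "__main__":
    ones = [[1.0] * 4 for _ in range(3)]
    print(box_sum_integral(ones, 1))  # per-pixel window pixel count N
```

With running sums the cost per pixel does not depend on the window size, which is why the larger radii used in the later examples do not make the box sum more expensive.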
It is worth noting that the CUDA programming model has the CPU and GPU work together. A traditional CPU architecture cannot use its resources effectively for this kind of general-purpose computation because of its hardware structure, whereas CUDA lets the GPU perform not only traditional graphics computation but also efficient general-purpose computation. To keep data transfer time as low as possible and increase speed, the invention performs only 2 data transfers between CPU memory and GPU video memory: the input image p and the guide image I are copied from host memory to device video memory, and the output image q is copied from device video memory to host memory. The steps are as follows:
(I) Use CUDA to build a cooperative CPU and GPU working environment.
(II) Read the input image p and the guide image I from host memory into the global memory of the device video memory, and bind them to texture memory.
(III) Allocate threads. The kernel launch parameters are set so that each block has 16 × 16 threads and each grid has ((image width + dimBlock.x - 1) / dimBlock.x) × ((image height + dimBlock.y - 1) / dimBlock.y) blocks, which divides the image in a checkerboard pattern. Here dimBlock.x is the dimension of the thread block along the x axis and dimBlock.y is its dimension along the y axis.
(IV) Call the first kernel function (that is, its 4 kernels) to compute the neighborhood window pixel sum of an all-ones matrix of the same size as the image, which gives the number N of pixels in the neighborhood window, and store N in constant memory.
(V) Call the first kernel function, using N from constant memory, to obtain in turn the neighborhood means of the input image p and the guide image I, the neighborhood mean mean_Ip of the image I*p and the neighborhood mean mean_II of the image I*I, and store each result in the corresponding global memory.
(VI) Construct the second kernel function to obtain in turn the covariance cov_Ip of the images (I, p) and the variance var_I of the image I, then construct kernels to obtain the key filtering parameters a and b.
That is, construct the covariance kernel according to cov_Ip = mean_Ip - mean_I.*mean_p to obtain the covariance of the images (I, p);
Construct the variance kernel according to var_I = mean_II - mean_I.*mean_I to obtain the variance of the image I;
Construct the parameter-a kernel according to a = cov_Ip./(var_I + ε) to obtain the key filtering parameter a;
Construct the parameter-b kernel according to b = mean_p - a.*mean_I to obtain the key filtering parameter b.
(VII) Call the first kernel function to compute the neighborhood window means of the key filtering parameters a and b, obtain the final filtering result q, and store the result in the corresponding global memory.
That is, call the first kernel function according to mean_a = boxfilter(a, r)/N to obtain the neighborhood mean of parameter a;
Call the first kernel function according to mean_b = boxfilter(b, r)/N to obtain the neighborhood mean of parameter b;
Construct the output kernel according to q = mean_a.*I + mean_b to obtain the final filtering result q, and store the result in the corresponding global memory.
(VIII) Copy the filtered image stored in the global memory of the device video memory back to host memory.
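The arithmetic of steps (IV) to (VII) can be followed end to end in a small CPU reference written in plain Python. This is a sketch for checking the formulas, not the CUDA implementation described above: `box_mean`, `emap` and `guided_filter` are hypothetical names, r is assumed to be a window radius, and a brute-force clipped box mean stands in for the first kernel function.

```python
def box_mean(img, r):
    """Neighborhood mean: clipped (2r+1)^2 window sum divided by its pixel count N."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            total = sum(img[yy][xx] for yy in ys for xx in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def emap(f, *imgs):
    """Apply f element-wise over equally sized images (the .* and ./ of the text)."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*imgs)]


def guided_filter(I, p, r, eps):
    """Gray-scale guided filter following steps (V) to (VII)."""
    mean_I = box_mean(I, r)
    mean_p = box_mean(p, r)
    mean_Ip = box_mean(emap(lambda i, v: i * v, I, p), r)
    mean_II = box_mean(emap(lambda i: i * i, I), r)
    # Step (VI): covariance, variance, then the key parameters a and b.
    cov_Ip = emap(lambda m, mi, mp: m - mi * mp, mean_Ip, mean_I, mean_p)
    var_I = emap(lambda m, mi: m - mi * mi, mean_II, mean_I)
    a = emap(lambda c, v: c / (v + eps), cov_Ip, var_I)
    b = emap(lambda mp, av, mi: mp - av * mi, mean_p, a, mean_I)
    # Step (VII): average a and b over the windows, then form the output q.
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return emap(lambda ma, i, mb: ma * i + mb, mean_a, I, mean_b)


if __name__ == "__main__":
    ramp = [[float(x + 2 * y) for x in range(6)] for y in range(6)]
    print(guided_filter(ramp, ramp, 1, 0.04))
```

Every intermediate image here is either an element-wise operation or a box mean, which is the structure that lets the method map each line to one kernel launch.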
When the guided filtering algorithm is used for image feathering, the procedure differs from the flow above in the following respects:
(I) Copying the r, g and b component data of the input image p and the guide image I from CPU memory to GPU video memory involves multiple data transfers. The invention uses CUDA streams so that data copies and kernel execution are interleaved, which improves the utilization of GPU resources. The advantage of CUDA streams is especially clear when the data volume is large.
(II) When solving for the key parameter a, the invention uses a single kernel to compute the 3 components r, g and b of a. Its launch parameters are a block dimension of 16 × 16, with the blocks arranged by two-dimensional address. Each thread first loads the data in global memory for var_I_rr, var_I_rg, var_I_rb, var_I_gg, var_I_gb and var_I_bb into registers, builds the 3 × 3 Sigma matrix, computes its determinant, and keeps the result in registers. Each thread then inverts the Sigma matrix and multiplies the cov_Ip matrix by this inverse in a single combined computation, which raises the arithmetic intensity of the program and makes full use of the computing performance of the GPU.
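The per-pixel work of step (II) can be sketched in plain Python as a closed-form inverse of a symmetric 3 × 3 matrix. This is an illustration under assumptions rather than the kernel of the invention: the function name `solve_a_color` and the dictionary layout are hypothetical, and adding ε to the diagonal of the Sigma matrix follows the usual color-guide form of the guided filter, which the text above does not spell out.

```python
def solve_a_color(var, cov, eps):
    """Solve (Sigma + eps * Identity) a = cov for one pixel.

    var: dict with the six entries rr, rg, rb, gg, gb, bb of the symmetric
         3 x 3 Sigma matrix of the color guide image.
    cov: tuple (cov_Ip_r, cov_Ip_g, cov_Ip_b).
    eps: regularization added to the diagonal (an assumption, see lead-in).
    """
    rr, rg, rb = var["rr"] + eps, var["rg"], var["rb"]
    gg, gb, bb = var["gg"] + eps, var["gb"], var["bb"] + eps
    # Cofactors of [[rr, rg, rb], [rg, gg, gb], [rb, gb, bb]]; the matrix is
    # symmetric, so six cofactors describe the whole adjugate.
    c00 = gg * bb - gb * gb
    c01 = rb * gb - rg * bb
    c02 = rg * gb - rb * gg
    c11 = rr * bb - rb * rb
    c12 = rg * rb - rr * gb
    c22 = rr * gg - rg * rg
    det = rr * c00 + rg * c01 + rb * c02  # expansion along the first row
    inv = [
        [c00 / det, c01 / det, c02 / det],
        [c01 / det, c11 / det, c12 / det],
        [c02 / det, c12 / det, c22 / det],
    ]
    # a = inverse(Sigma + eps * Identity) times cov
    return tuple(sum(inv[i][j] * cov[j] for j in range(3)) for i in range(3))


if __name__ == "__main__":
    sigma = {"rr": 2.0, "rg": 0.5, "rb": 0.1, "gg": 3.0, "gb": 0.2, "bb": 1.5}
    print(solve_a_color(sigma, (1.0, -1.0, 0.5), 0.1))
```

Keeping the six matrix entries and the cofactors in local variables mirrors the register use described above: each pixel is solved independently with a fixed, small number of multiplications.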
Embodiment 3
To make the objects, technical solutions and advantages of the present invention clearer, the technical solution is described in further detail below with concrete examples.
The examples were run on the Windows 7 operating system. The CPU is an Intel Core i5-3470 with a clock frequency of 3.2 GHz, and the system memory is 4 GB. The GPU is an NVIDIA GeForce GTX 660 with 5 streaming multiprocessors (SMs), each containing 192 CUDA cores, 2048 MB of on-board global memory and a 192-bit memory interface; it supports CUDA Compute Capability 3.0. The Visual Profiler shipped with the CUDA Toolkit was used to analyze the data and quantify program performance.
To verify the effectiveness of the method, the examples apply CUDA parallel optimization to the guided filtering algorithm in 4 applications: image smoothing, image feathering, image enhancement and flash denoising. The filtering results and speed-up ratios are given below:
Example 1: image smoothing
In this example the filter radius r is 16 and the filtering parameter eps is 0.04. The input image p and the guide image I are the same image, and the output image q is the final result. The result of Example 1 is shown in Fig. 2. As Fig. 2 shows, the details, abrupt changes, edges and noise in the input image p are all suppressed to some degree, giving a satisfactory smoothing effect.
Example 2: image feathering
In this example the filter radius r is 60 and the filtering parameter eps is 0.000001. The input image p and the guide image I are two different images, and the output image q is the final result. The result of Example 2 is shown in Fig. 3. As Fig. 3 shows, the feathering of the output image is clearly visible and the edge regions change gradually, giving a natural-looking result.
Example 3: image enhancement
In this example the filter radius r is 16 and the filtering parameter eps is 0.01. The input image p and the guide image I are the same image, and the output image q is the final result. The result of Example 3 is shown in Fig. 4. As Fig. 4 shows, both the global and the local features of the output image are clearly enhanced, which makes image details easier to distinguish.
Example 4: flash denoising
In this example the filter radius r is 8 and the filtering parameter eps is 0.0004. The input image p and the guide image I are two different images, and the output image q is the final result. The result of Example 4 is shown in Fig. 5. As Fig. 5 shows, the colors of the denoised output image q look natural and consistent, giving a satisfactory result.
In addition, Figs. 2 to 5 show that in all 4 applications (smoothing, feathering, enhancement and flash denoising) the results of the invention are essentially identical to those of the original C implementation, which confirms the correctness of the invention. To compare acceleration, the guided filtering algorithm was implemented in Matlab, in C and in CUDA, and the implementations were compared experimentally. The time consumption and speed-up ratio for images of different resolutions are shown in Table 1:
Table 1. Time consumption (ms) and speed-up ratio of the guided filtering algorithm under different implementations
As Table 1 shows, the CUDA parallel implementation of the invention takes far less time than the Matlab and C implementations of the guided filtering algorithm. The acceleration is most pronounced for image feathering, where the speed-up ratio exceeds 60. The acceleration also becomes more pronounced as the image resolution increases.
References:
[1] Petschnigg G, Szeliski R, Agrawala M, et al. Digital photography with flash and no-flash image pairs [J]. ACM Transactions on Graphics (TOG), 2004, 23(3): 664-672.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the numbering of the embodiments above is for description only and does not indicate their relative merit.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (4)

1. an optimization method is accelerated in the guiding filtering based on CUDA, it is characterized in that, described guiding filtering is accelerated optimization method and comprised the following steps:
Input picture p and navigational figure I being read in global storage by host side internal memory, by building the first kernel function, obtaining input picture p, navigational figure I, image I*P, image I*I respectively in the Image neighborhood average of neighborhood window;
Build the covariance that the second kernel function asks for image (I, p) successively, the variance of navigational figure I, and then ask for filtering key parameter a and b;
Call the neighboring mean value mean_a that the first kernel function asks for parameter a, the neighboring mean value mean_b of parameter b, and then obtain final filter result, result is saved in corresponding global storage, spreads out of host side internal memory.
2. The CUDA-based guided-filtering acceleration and optimization method according to claim 1, characterized in that the step of obtaining, by constructing the first kernel function, the neighborhood means of the input image p, the guide image I, the image I*p, and the image I*I over the neighborhood window is specifically:
converting the computation of the neighborhood means of the input image p, the guide image I, the image I*p, and the image I*I over the neighborhood window into the computation of sums of pixel values over the neighborhood window;
computing each of the neighborhood-window pixel sums by constructing the first kernel function.
3. The CUDA-based guided-filtering acceleration and optimization method according to claim 2, characterized in that the step of computing each of the neighborhood-window pixel sums by constructing the first kernel function is specifically:
using an integral image to compute the neighborhood-window pixel sums, with CUDA parallel optimization carried out by four kernel functions within the first kernel function.
4. The CUDA-based guided-filtering acceleration and optimization method according to claim 1, characterized in that the method further comprises:
calling the four kernel functions within the first kernel function to obtain the number N of pixels in the neighborhood window, and saving N in constant memory.
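
The data flow recited in claims 1 to 4 can be illustrated with a minimal sequential sketch in plain Python. It is a CPU reference for checking results, not the claimed CUDA kernels: the loops stand in for the first and second kernel functions, and the split of the integral-image step into four kernels is not reproduced. The window radius r, the regularization term eps, the clipping of windows at the image border, and all function names are assumptions made for illustration and are not taken from the claims.

```python
# Sequential reference sketch (no GPU). Assumed, not from the claims:
# radius r, regularization eps, border clipping, and every function name.

def integral_image(img):
    """Return S where S[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def box_sum(img, r):
    """Sum of img over the (2r+1) x (2r+1) window around each pixel, clipped at borders."""
    h, w = len(img), len(img[0])
    S = integral_image(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        for x in range(w):
            x0, x1 = max(x - r, 0), min(x + r + 1, w)
            # Four integral-image lookups replace a loop over the window.
            out[y][x] = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    return out

def guided_filter(I, p, r, eps):
    """Filter input image p using guide image I; both are lists of rows of floats."""
    h, w = len(I), len(I[0])
    # N: pixel count of each clipped window, from box-summing an all-ones image.
    N = box_sum([[1.0] * w for _ in range(h)], r)

    def mean(img):
        s = box_sum(img, r)
        return [[s[y][x] / N[y][x] for x in range(w)] for y in range(h)]

    def mul(A, B):
        return [[A[y][x] * B[y][x] for x in range(w)] for y in range(h)]

    # Neighborhood means of p, I, I*p and I*I.
    mean_I, mean_p = mean(I), mean(p)
    mean_Ip, mean_II = mean(mul(I, p)), mean(mul(I, I))

    # Covariance of (I, p), variance of I, then parameters a and b.
    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cov_Ip = mean_Ip[y][x] - mean_I[y][x] * mean_p[y][x]
            var_I = mean_II[y][x] - mean_I[y][x] ** 2
            a[y][x] = cov_Ip / (var_I + eps)
            b[y][x] = mean_p[y][x] - a[y][x] * mean_I[y][x]

    # Neighborhood means of a and b give the output q = mean_a * I + mean_b.
    mean_a, mean_b = mean(a), mean(b)
    return [[mean_a[y][x] * I[y][x] + mean_b[y][x] for x in range(w)] for y in range(h)]
```

Because each window sum costs four lookups regardless of r, the per-pixel work is independent of the window size, which is what makes the per-pixel steps natural candidates for one GPU thread per pixel. A reference of this kind can be used to compare against the output of a GPU implementation on small test images.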

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510324806.0A CN104899840B (en) 2015-06-12 2015-06-12 Guided-filtering acceleration and optimization method based on CUDA

Publications (2)

Publication Number Publication Date
CN104899840A true CN104899840A (en) 2015-09-09
CN104899840B CN104899840B (en) 2018-12-18

Family

ID=54032488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510324806.0A Active CN104899840B (en) 2015-06-12 2015-06-12 Guided-filtering acceleration and optimization method based on CUDA

Country Status (1)

Country Link
CN (1) CN104899840B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096277A (en) * 2015-09-17 2015-11-25 华北电力大学(保定) Image self-adaptive guidance filtering method based on parameter selection
CN106339993A (en) * 2016-08-26 2017-01-18 北京金山猎豹科技有限公司 Human face image polishing method and device and terminal device
CN109816595A (en) * 2017-11-20 2019-05-28 北京京东尚科信息技术有限公司 Image processing method and device
CN112381734A (en) * 2020-11-13 2021-02-19 海南众博数据科技有限公司 Two-dimensional guide filtering method, two-dimensional guide filter and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102073982A (en) * 2011-01-10 2011-05-25 西安电子科技大学 Method for realizing acceleration of anisotropic diffusion filtration of overlarge synthetic aperture radar (SAR) image by graphic processing unit (GPU)
CN103745447A (en) * 2014-02-17 2014-04-23 东南大学 Fast parallel achieving method for non-local average filtering
CN104050637A (en) * 2014-06-05 2014-09-17 华侨大学 Quick image defogging method based on two times of guide filtration

Non-Patent Citations (1)

Title
AIPIANO: "引导滤波的OpenCV实现" [OpenCV implementation of guided filtering], 《HTTP://BLOG.CSDN.NET/AICHIPMUNK/ARTICLE/DETAILS/21163543》 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN105096277A (en) * 2015-09-17 2015-11-25 华北电力大学(保定) Image self-adaptive guidance filtering method based on parameter selection
CN105096277B (en) * 2015-09-17 2017-08-01 华北电力大学(保定) Image self-adaptive guidance filtering method based on parameter selection
CN106339993A (en) * 2016-08-26 2017-01-18 北京金山猎豹科技有限公司 Human face image polishing method and device and terminal device
CN109816595A (en) * 2017-11-20 2019-05-28 北京京东尚科信息技术有限公司 Image processing method and device
CN109816595B (en) * 2017-11-20 2021-01-26 北京京东尚科信息技术有限公司 Image processing method and device
CN112381734A (en) * 2020-11-13 2021-02-19 海南众博数据科技有限公司 Two-dimensional guide filtering method, two-dimensional guide filter and system

Also Published As

Publication number Publication date
CN104899840B (en) 2018-12-18

Similar Documents

Publication Publication Date Title
Cho et al. Weakly-and self-supervised learning for content-aware deep image retargeting
Xu et al. A distributed canny edge detector: algorithm and FPGA implementation
TWI690896B (en) Image processor, method performed by the same, and non-transitory machine readable storage medium
CN106358003A (en) Video analysis and accelerating method based on thread level flow line
TW202025081A (en) Block operations for an image processor having a two-dimensional execution lane array and a two-dimensional shift register
CN103390262B (en) The acquisition methods of weight coefficient of digital filter and device
CN104899840A (en) Guided-filtering optimization speed-up method based on CUDA
US20230177652A1 (en) Image restoration method and apparatus, and electronic device
CN108765282B (en) Real-time super-resolution method and system based on FPGA
Wang et al. A CUDA-enabled parallel algorithm for accelerating retinex
CN113034391A (en) Multi-mode fusion underwater image enhancement method, system and application
CN110246201B (en) Pencil drawing generation method based on thread-level parallelism
Rahman et al. Parallel implementation of a spatio-temporal visual saliency model
Lu et al. DSP-based image real-time dehazing optimization for improved dark-channel prior algorithm
Cheng et al. GPU fast restoration of non-uniform illumination images
CN104952043A (en) Image filtering method and CT system
Reddy et al. Performance analysis of GPU V/S CPU for image processing applications
CN105791635A (en) GPU-based enhanced video denoising method and apparatus
Bozkurt et al. Effective Gaussian blurring process on graphics processing unit with CUDA
Preethi et al. Gaussian filtering implementation and performance analysis on GPU
Afif et al. Efficient 2D convolution filters implementations on graphics processing unit using NVIDIA CUDA
Song et al. Unsharp masking image enhancement the parallel algorithm based on cross-platform
US20210264560A1 (en) Loading apparatus and method for convolution with stride or dilation of 2
CN103927721A (en) Moving object edge enhancement method based on GPU
Qiu et al. Parallel fast pencil drawing generation algorithm based on GPU

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210909

Address after: Room 109, no.1866, Bohai 12th Road, Port Economic Zone, Binhai New Area, Tianjin 300452

Patentee after: Tianjin Bohua Xinchuang Technology Co.,Ltd.

Address before: 300072 Tianjin City, Nankai District Wei Jin Road No. 92

Patentee before: Tianjin University

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20211018

Address after: 300452 room 121, No. 1866, Bohai 12th Road, Lingang Economic Zone, Binhai New Area, Tianjin

Patentee after: TIANJIN BOHUA ANCHUANG TECHNOLOGY Co.,Ltd.

Address before: Room 109, no.1866, Bohai 12th Road, Port Economic Zone, Binhai New Area, Tianjin 300452

Patentee before: Tianjin Bohua Xinchuang Technology Co.,Ltd.

TR01 Transfer of patent right