Detailed Description of the Embodiments
Hereinafter, the present invention is described in detail with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that the embodiments of the present application, and the features in those embodiments, may be combined with one another provided they do not conflict.
Fig. 1 is a flowchart of the image denoising method provided according to a preferred embodiment of the present invention. As shown in Fig. 1, the image denoising method provided by the preferred embodiment of the present invention comprises steps 101 to 103.
Step 101: a graphics processing unit acquires and stores an image to be denoised;
Step 102: the graphics processing unit uses n parallel threads to perform n denoising flows respectively, the n denoising flows being the denoising flows performed on n pixels in the image to be denoised, where n is an integer greater than 1;
Specifically, the graphics processing unit using n parallel threads to perform n denoising flows respectively comprises: dividing the n pixels into a plurality of groups according to the number of threads contained in each thread block of the graphics processing unit (that is, the number of threads actually used in the thread block), each group corresponding to one thread block, and the denoising flow for one pixel in a group corresponding to one thread in the corresponding thread block; and placing the index of each pixel in one-to-one correspondence with the index of its thread, thereby obtaining the correspondence between the index of each pixel and the index of its thread.
Specifically, the correspondence is determined according to the following formula:
x=blockIDx.x×blockDim.x+threadIDx.x;
y=by×blockDim.y+threadIDx.y;
where x is the index of the pixel in the x dimension of the image to be denoised, y is the index of the pixel in the y dimension of the image to be denoised, blockIDx.x is the index of the thread block in the x dimension, blockDim.x is the number of threads a thread block contains in the x dimension, threadIDx.x is the index of a thread within thread block blockIDx.x, by=blockIDx.y/ntile, blockIDx.y is the index of the thread block in the y dimension, ntile is a coefficient (a fixed constant once the thread block partitioning rule has been determined), blockDim.y is the number of threads a thread block contains in the y dimension, and threadIDx.y is the index of a thread within thread block blockIDx.y.
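The index formula can be sketched on the CPU as follows. This is a minimal Python emulation, not GPU code: the helper names Dim2 and pixel_index are hypothetical, the identifiers in the docstring follow the spelling used in this text (CUDA itself spells the built-in variables blockIdx and threadIdx), and by is passed in already computed because the text derives it separately as blockIDx.y/ntile.

```python
from typing import NamedTuple


class Dim2(NamedTuple):
    """A pair of x/y values, mirroring the .x/.y fields used in the text."""
    x: int
    y: int


def pixel_index(block_idx_x: int, by: int, block_dim: Dim2, thread_idx: Dim2) -> Dim2:
    """Return the (x, y) pixel index handled by one thread.

    x = blockIDx.x * blockDim.x + threadIDx.x
    y = by         * blockDim.y + threadIDx.y
    """
    x = block_idx_x * block_dim.x + thread_idx.x
    y = by * block_dim.y + thread_idx.y
    return Dim2(x, y)


# Hypothetical values: blocks of 4 x 2 threads, block (3, by=5), thread (1, 0).
# x = 3*4 + 1 = 13 and y = 5*2 + 0 = 10.
print(pixel_index(3, 5, Dim2(4, 2), Dim2(1, 0)))
```

Because each thread derives its pixel index from its own block and thread indices alone, no coordination between threads is needed to decide which pixel a thread handles.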
Step 103: the denoising results of the n parallel threads are used to replace the pixel values of the corresponding pixels in the image to be denoised, thereby obtaining the restored image.
Specifically, using the denoising results of the n parallel threads to replace the pixel values of the corresponding pixels in the image to be denoised comprises: when processing ends, finding the pixel corresponding to the finished thread according to the index correspondence, and replacing the pixel value of that pixel with the denoising result of the finished thread.
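The write-back of step 103 can be illustrated with a sequential Python emulation in which nested loops stand in for the thread blocks and threads. The function name run_grid and the denoise_pixel callback are hypothetical. Two choices here are assumptions rather than part of the described method: results are written to a copy of the image so that no emulated thread reads a value already replaced by another, and threads whose index falls outside the image are skipped.

```python
from typing import Callable, List

Image = List[List[int]]  # image[y][x], row-major


def run_grid(image: Image, block_dim_x: int, block_dim_y: int,
             denoise_pixel: Callable[[Image, int, int], int]) -> Image:
    """Emulate one thread per pixel and write each result back by pixel index."""
    height, width = len(image), len(image[0])
    restored = [row[:] for row in image]   # assumption: separate output buffer
    grid_x = -(-width // block_dim_x)      # ceiling division: blocks needed in x
    grid_y = -(-height // block_dim_y)     # ceiling division: blocks needed in y
    for by in range(grid_y):
        for bx in range(grid_x):
            for ty in range(block_dim_y):
                for tx in range(block_dim_x):
                    x = bx * block_dim_x + tx
                    y = by * block_dim_y + ty
                    if x < width and y < height:  # skip threads past the edge
                        restored[y][x] = denoise_pixel(image, x, y)
    return restored


# Stand-in for a real denoising flow: each "thread" returns a tag built from
# its own pixel index, which makes the write-back position easy to inspect.
source = [[0] * 4 for _ in range(3)]
result = run_grid(source, 2, 2, lambda img, x, y: 10 * y + x)
print(result)
```

Each entry of the result holds the tag of its own position, which shows that every emulated thread wrote to the pixel given by the index correspondence.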
In addition, in order to dynamically adjust the number of threads contained in each thread block, the present embodiment uses the following method:
When the n parallel threads finish performing the n denoising flows, the execution time of the n parallel threads is recorded, and the number of threads contained in each thread block is adjusted according to the execution time. When the execution time is greater than the maximum processing time value of the graphics processing unit, the number of threads contained in each thread block is increased.
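The adjustment rule can be sketched as below. The text only states that the thread count is increased when the execution time exceeds the limit, so the doubling step, the cap max_threads_per_block and the names timed and adjust_threads_per_block are all assumptions made for illustration.

```python
import time
from typing import Callable


def timed(run: Callable[[], None]) -> float:
    """Run one denoising pass and return its wall-clock execution time in seconds."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def adjust_threads_per_block(threads_per_block: int, execution_time: float,
                             time_limit: float, max_threads_per_block: int) -> int:
    """Return the thread count per block to use for the next pass."""
    if execution_time > time_limit:
        # Assumption: double the count, never exceeding the hardware limit.
        return min(threads_per_block * 2, max_threads_per_block)
    return threads_per_block


print(adjust_threads_per_block(16, execution_time=2.0, time_limit=1.0,
                               max_threads_per_block=32))
```

With these hypothetical values the pass took longer than the limit, so the next pass would use 32 threads per block instead of 16.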
In the present embodiment, the denoising flow is any one of the following denoising algorithms: median filtering, Wiener filtering, wavelet filtering, Gaussian filtering, or mean filtering.
The technical solution of the present invention is described in detail below, taking the median filtering algorithm as an example:
An image to be denoised (for example, a 1024 × 1024 image) is acquired and stored in the graphics processing unit. The thread structure is shown schematically in Fig. 2: the graphics processing unit has 64 × 64 thread blocks, and each thread block contains 16 × 16 threads. The 1024 × 1024 pixels are divided into 64 parts in the x dimension and 64 parts in the y dimension, forming 64 × 64 groups of pixels, each group being 16 × 16 pixels. Each group corresponds to one thread block, and the denoising flow for one pixel in a group corresponds to one thread in the corresponding thread block, so that the index of each pixel is in one-to-one correspondence with the index of its thread. The correspondence is determined according to Formula one below:
Formula one: x=blockIDx.x×blockDim.x+threadIDx.x; y=by×blockDim.y+threadIDx.y, where by=blockIDx.y/ntile.
Table 1
In the present embodiment, Table 1 shows the distribution of the 1024 × 1024 pixels in the image to be denoised. The image to be denoised is divided into 64 parts in the y dimension, and in this case ntile is 16. Thus, for the thread in the 0th thread block whose index is 1 in the x dimension and 1 in the y dimension, blockIDx.x is 0, blockDim.x is 16, threadIDx.x is 1, by is 0, blockDim.y is 16 and threadIDx.y is 1. Substituting these parameter values into Formula one gives:
x=blockIDx.x×blockDim.x+threadIDx.x=0×16+1=1;
y=by×blockDim.y+threadIDx.y=0×16+1=1.
According to the calculation result, the pixel whose x-dimension index and y-dimension index are both 1 is pixel E; that is, thread (1, 1) in the 0th thread block is assigned to pixel E, whose index is (1, 1).
Similarly, for the thread in the 0th thread block whose index is 2 in the x dimension and 2 in the y dimension, blockIDx.x is 0, blockDim.x is 16, threadIDx.x is 2, by is 0, blockDim.y is 16 and threadIDx.y is 2. Substituting these parameter values into Formula one gives:
x=blockIDx.x×blockDim.x+threadIDx.x=0×16+2=2;
y=by×blockDim.y+threadIDx.y=0×16+2=2;
According to the calculation result, the pixel with index (2, 2) is pixel K; that is, the thread with index (2, 2) is assigned to pixel K, whose index is (2, 2).
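The 64 × 64 grid of 16 × 16 thread blocks can be checked with a short Python sketch. Because x depends only on the x-dimension indices and y only on the y-dimension indices, it is enough to verify one dimension: if the 64 × 16 block and thread combinations produce every index from 0 to 1023 exactly once, the two-dimensional assignment is one-to-one as well. The constant and function names are hypothetical, and by is assumed here to range over 0 to 63 directly.

```python
GRID = 64    # thread blocks per dimension
BLOCK = 16   # threads per thread block per dimension
SIZE = 1024  # image width and height in pixels


def axis_index(block: int, thread: int, block_dim: int = BLOCK) -> int:
    """One dimension of Formula one: block index * block size + thread index."""
    return block * block_dim + thread


# Every (block, thread) combination along one dimension, in sorted order.
covered = sorted(axis_index(b, t) for b in range(GRID) for t in range(BLOCK))

# True when each index 0..1023 is produced exactly once.
print(covered == list(range(SIZE)))

# The two substitutions worked through in the text.
print(axis_index(0, 1), axis_index(0, 2))
```

The first line printed is True, and the second reproduces the indices 1 and 2 obtained for pixels E and K.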
Similarly, for the thread in the 63rd thread block whose index is 16 in the x dimension and 16 in the y dimension, blockIDx.x is 63, blockDim.x is 16, threadIDx.x is 1022, by is 0, blockDim.y is 16 and threadIDx.y is 1. Substituting these parameter values into Formula one gives:
x=blockIDx.x×blockDim.x+threadIDx.x=63×16+1022=2030;
y=by×blockDim.y+threadIDx.y=0×16+1=1;
According to the calculation result, the pixel with index (2030, 1) is pixel L; that is, by the above method the thread with index (2030, 1) is assigned to pixel L, whose index is (2030, 1).
After a thread has been assigned to each pixel, each thread performs the denoising flow for its pixel. When the denoising flow is the median filtering algorithm, the specific processing is as follows:
Suppose the sampling window is set to 3 × 3 pixels. When the center of the sampling window reaches pixel E, the window covers pixels A, B, C, D, E, F, G, H and I, whose pixel values are 200, 1, 108, 10, 203, 11, 6, 14 and 100 respectively. The pixels covered by the sampling window are sorted by pixel value, giving 1, 6, 10, 11, 14, 100, 108, 200, 203; the median is 14, so the pixel value of pixel E is replaced with 14. After all flows have finished, that is, after the above processing has been applied to every pixel, the restored image is obtained.
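The per-pixel median step can be reproduced with the small Python sketch below. The function name median3x3 is hypothetical, and the row-by-row placement of pixels A to I around E is an assumption (the median does not depend on it). Only interior pixels are handled; how the method treats image borders is not stated in the text.

```python
from typing import List


def median3x3(image: List[List[int]], x: int, y: int) -> int:
    """Median of the 3 x 3 window centered on interior pixel (x, y)."""
    window = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sorted(window)[4]  # 5th of 9 sorted values is the median


# Pixels A..I laid out row by row, with E = 203 at the center (1, 1).
patch = [
    [200, 1, 108],   # A B C
    [10, 203, 11],   # D E F
    [6, 14, 100],    # G H I
]
print(median3x3(patch, 1, 1))  # 14
```

The outlier value 203 at pixel E is replaced by 14, a value actually present in the neighborhood, which is why median filtering suppresses isolated noise without averaging in the extreme values.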
For example, suppose the GPU is configured so that each thread block contains at most 32 threads and two thread blocks can run at a time. One thread allocation scheme uses 4 thread blocks with 16 threads in each (each thread block can contain at most 32 threads); two runs are needed (because only two thread blocks can run at a time), and 64 threads in total process the denoising flows of 64 pixels. In this case the resource occupancy is 16/(32 × 2) × 100% = 25%. After processing ends, the execution time t1 is recorded; t1 is greater than the minimum processing time T0 of this GPU. Both the resource occupancy and the execution time show that resource utilization can still be improved, so the number of threads used in each thread block can be adjusted. For example, the allocation scheme is changed to 2 thread blocks with 32 threads in each; one run is needed (because two thread blocks can run at a time), and 64 threads in total again process the denoising flows of the 64 pixels. In this case the resource occupancy is 32/(32 × 1) × 100% = 100% and the execution time is T0. This allocation scheme uses the resources of the GPU to the fullest extent and improves the processing speed.
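The two occupancy figures can be recomputed with the following Python sketch. The expression used, threads used per block divided by the product of the maximum threads per block and the number of runs, is inferred from the two calculations in the text rather than stated as a general definition, and the function names are hypothetical.

```python
def runs_needed(blocks: int, concurrent_blocks: int) -> int:
    """Number of runs when only `concurrent_blocks` thread blocks execute at a time."""
    return -(-blocks // concurrent_blocks)  # ceiling division


def occupancy(threads_used: int, max_threads: int, runs: int) -> float:
    """Resource occupancy as computed in the example: used / (max * runs)."""
    return threads_used / (max_threads * runs)


MAX_THREADS = 32   # maximum threads per thread block
CONCURRENT = 2     # thread blocks that can run at a time

# Scheme 1: 4 blocks x 16 threads; scheme 2: 2 blocks x 32 threads.
for blocks, threads in ((4, 16), (2, 32)):
    runs = runs_needed(blocks, CONCURRENT)
    print(blocks * threads, runs, occupancy(threads, MAX_THREADS, runs))
```

Both schemes use 64 threads in total, but the first needs two runs at 0.25 occupancy while the second needs one run at 1.0 occupancy.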
In general, in actual allocation on the same GPU, the higher the resource occupancy, the shorter the execution time; conversely, the lower the resource occupancy, the longer the execution time. Threads can therefore be allocated according to the resource occupancy or the execution time, so that the number of threads actually used in each thread block is adjusted dynamically.
Fig. 3 is a schematic diagram of the image denoising device provided according to a preferred embodiment of the present invention. As shown in Fig. 3, the image denoising device provided by the preferred embodiment of the present invention comprises a graphics processing unit, wherein the graphics processing unit comprises: an acquisition unit, configured to acquire the image to be denoised and send it to a storage unit; the storage unit, connected to the acquisition unit and configured to receive and store the image to be denoised sent by the acquisition unit; and a processing unit, connected to the storage unit and configured to use n parallel threads to perform n denoising flows respectively on n pixels in the image to be denoised, where n is an integer greater than 1, the processing unit being further configured to replace the pixel values of the corresponding pixels in the image to be denoised with the denoising results of the n parallel threads, thereby obtaining the restored image.
In addition, the specific operation of the above device is the same as described for the above method and is therefore not repeated here.
In summary, in the image denoising method and device provided according to the preferred embodiments of the present invention, a specific allocation rule is used to assign the many parallel threads in the graphics processing unit to the denoising flows of the individual pixels. This achieves a two-dimensional allocation of threads and makes full use of the thousands of threads in the graphics processing unit to execute the denoising flows in parallel. As a result, not only is an image of good quality obtained, but the execution speed is also much higher than that of a CPU, giving good real-time performance. In addition, a graphics processing unit (GPU) is more highly integrated and less expensive than a CPU, which further reduces the cost and size of the device.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For a person skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.