CN106991665A - Parallel computing method based on CUDA image fusion - Google Patents

Parallel computing method based on CUDA image fusion

Info

Publication number
CN106991665A
Authority
CN
China
Prior art keywords
image
cuda
registration
sub-image
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710182686.4A
Other languages
Chinese (zh)
Other versions
CN106991665B (en)
Inventor
江天
彭元喜
彭学锋
宋明辉
舒雷志
张松松
周士杰
李俊
赵健宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201710182686.4A priority Critical patent/CN106991665B/en
Publication of CN106991665A publication Critical patent/CN106991665A/en
Application granted granted Critical
Publication of CN106991665B publication Critical patent/CN106991665B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformation in the plane of the image
    • G06T 3/40 - Scaling the whole image or part thereof
    • G06T 3/4038 - Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images

Abstract

The present invention discloses a parallel computing method for image fusion based on CUDA. The method comprises the following steps: S1. Image segmentation: the source images to be fused are obtained and each is split, yielding multiple sub-images; S2. Parallel fusion: the sub-images of the source images are grouped by corresponding position into multiple groups of sub-images to be fused, and each group is fused in parallel by different CUDA cores, yielding multiple groups of fused sub-images; S3. Image stitching: the fused sub-images of each group obtained in step S2 are stitched together to obtain the final fused image. The invention realizes parallel image fusion based on CUDA and has the advantages of simple implementation, high fusion efficiency and good fusion quality.

Description

Parallel computing method based on CUDA image fusion
Technical field
The present invention relates to the field of image fusion technology, and in particular to a parallel computing method for image fusion based on CUDA.
Background technology
Multi-source image fusion is an information fusion technology that takes images as its research object. It fuses different images of the same target or scene, acquired by different sensors, into a single image. The fused image can reflect the information of the multiple original images; by exploiting multi-source image information so that different forms of information complement one another, a comprehensive, cooperative description of the same target or scene is achieved, making the result more suitable for visual perception or computer processing. Image fusion technology has been widely applied in fields such as target detection, tracking and recognition, and situational awareness; common applications include the fusion of infrared and visible images, remote sensing image fusion, and so on. As shown in Fig. 1, the image fusion process first transforms the original images defined in image space into other spaces through an image transform algorithm, then processes them using the special properties of those spaces, and finally transforms the result back into image space to obtain the required effect; the specific processing applied to the transform-domain coefficients constitutes the fusion rule. As shown in Fig. 2, in NSCT-based image fusion the images are first decomposed, the transformed sub-band coefficients are then fused, and the fused image is finally obtained by reconstruction with the inverse NSCT.
The mainstream image transform algorithms currently used for image fusion include transforms based on wavelet analysis and non-subsampled transforms based on multi-resolution theory. However, because they involve a large number of two-dimensional convolutions and many iterative calculations, they generally suffer from low computational efficiency and a time-consuming fusion process, which has become one of the main factors restricting image fusion in practical use. Multi-resolution analysis methods, represented by the wavelet transform, mainly exploit the fact that the human eye is more sensitive to changes in local contrast: according to a certain fusion rule, the most salient features in the original images, such as edges and line segments, are selected and retained in the final composite image. Although such algorithms have good time-frequency localization properties, they must be implemented by convolution, so their computational complexity is high, and they lack translation invariance, which easily causes obvious Gibbs phenomena in the reconstructed image. Non-subsampled multi-resolution transforms, such as the non-subsampled contourlet transform (NSCT), were proposed to overcome these drawbacks; they divide the image transform into two steps, multi-scale decomposition and directional decomposition, each completed by specific filters. However, a large number of two-dimensional convolutions are introduced into the computation, so the fusion process remains time-consuming. In addition, different fusion rules give different fusion quality; rules such as those based on pulse-coupled neural networks (PCNN) and non-negative matrix factorization (NMF) give good results but involve a large number of iterative operations, which further constrains fusion performance. Typically, fusing two 256*256 images with a framework based on the non-subsampled contourlet transform (NSCT) and non-negative matrix factorization (NMF) takes 188.23 s, which severely restricts the practical use of image fusion technology.
To accelerate the image fusion process, one approach is to reduce the complexity of the algorithm, but this also degrades the fusion quality. Another approach is to implement the algorithm in parallel, for example on dedicated hardware such as DSPs, FPGAs and GPUs. Among these, the GPU possesses outstanding parallel computing power and high memory bandwidth, and its computing capability is far higher than that of the CPU. CUDA (Compute Unified Device Architecture) is a software and hardware architecture, released on the basis of traditional GPGPU, that uses the GPU as a data-parallel device; it can be programmed with general-purpose languages such as C and C++, which greatly lowers the difficulty of GPU development while obtaining considerable acceleration from the GPU's parallel computing power. However, existing GPU-based image fusion frameworks place high requirements on the algorithm itself and generally give poor fusion quality, and they can only accelerate a specific framework, i.e. the acceleration is confined to a single algorithm. It is difficult to form a unified acceleration scheme suitable for all algorithms, which increases development workload and complexity and limits practical use.
Chinese patent application CN105245841A discloses a panoramic video surveillance system implemented with CUDA, in which multiple video streams are stitched into a panoramic video. However, that method is limited to video acquired by sensors of the same type and does not involve the fusion of images from multiple types of sensors, and it targets a stitching algorithm whose computational complexity is much lower than that of image fusion.
The content of the invention
The technical problem to be solved by the present invention is as follows: in view of the problems existing in the prior art, the present invention provides a parallel computing method for image fusion based on CUDA that is simple to implement, highly efficient, and gives good fusion quality.
In order to solve the above technical problems, the technical scheme proposed by the present invention is:
A parallel computing method for image fusion based on CUDA, comprising the steps of:
S1. Image segmentation: obtain the source images to be fused and split each of them, obtaining multiple sub-images;
S2. Parallel fusion: group the sub-images of the source images by corresponding position into multiple groups of sub-images to be fused, and fuse each group in parallel on different CUDA cores, obtaining multiple groups of fused sub-images;
S3. Image stitching: stitch together the fused sub-images of each group obtained in step S2 to obtain the final fused image.
As a further improvement of the present invention: when splitting in step S1, adjacent sub-images are made to overlap by a specified amount of data; before the stitching in step S3, this data overlap is removed.
As a further improvement of the present invention: when splitting in step S1, a source image of size m*n is divided equally into multiple sub-images of size (m/a+1)*(n/a+1), where a is the number of equal parts along each dimension.
As a further improvement of the present invention: before the splitting in step S1, the method also includes a preprocessing step of registering the source images.
As a further improvement of the present invention, the image fusion processing in step S2 specifically comprises the following steps:
S21. store the data of each group of sub-images to be fused in CUDA global memory;
S22. transfer the data of each group of sub-images to be fused from global memory to different CUDA thread blocks;
S23. each thread block, in parallel, performs the image fusion processing on the group of sub-images it receives, including the image transform and coefficient fusion.
As a further improvement of the present invention: during the image transform and coefficient fusion in step S23, specified computing processes are executed in parallel.
As a further improvement of the present invention: during the image transform and coefficient fusion in step S23, one or more of the two-dimensional convolution computation, the image region energy computation and the matrix factorization computation are executed in parallel.
As a further improvement of the present invention: the image transform specifically uses the non-subsampled contourlet transform algorithm; the coefficient fusion specifically uses a non-negative matrix factorization algorithm.
As a further improvement of the present invention: the source images to be fused are acquired by different types of sensors.
Compared with the prior art, the advantages of the invention are:
1) The method of the invention for CUDA-based parallel image fusion exploits characteristics of image fusion such as the low dependence between raw data and the high computational intensity, and gives full play to the parallel computing advantage of CUDA. By first splitting the input source images, the computational complexity of the fusion process can be effectively reduced; combined with the parallel capability of CUDA, the fusion of the individual sub-images after splitting is executed in parallel, and the fused sub-images of each group are then stitched to obtain the final fused image. The implementation is simple, the image fusion computation can be accelerated, and fusion performance is effectively improved;
2) In the method of the invention, splitting the source images first greatly reduces the computational complexity of the subsequent fusion computation, and since the splitting process is independent of the specific fusion algorithm, the requirements on the fusion algorithm are relaxed, so the fusion process is not confined to a single specific fusion framework; the method can be applied to various image transform algorithms and fusion rules and is highly versatile;
3) In the method of the invention, the CUDA parallel computation is completed on the GPU according to its hardware characteristics. The fusion of each group of sub-images to be fused is one kernel function and is processed in a different thread block; different thread blocks execute in parallel, so parallel processing among sub-images is achieved and fusion efficiency is improved. Multi-threading is further used to parallelize the fusion algorithm itself, so fusion performance is further improved through the parallelism within the algorithm.
Brief description of the drawings
Fig. 1 is a schematic diagram of the implementation process of a traditional image fusion method.
Fig. 2 is a schematic diagram of the realization principle of a traditional image fusion method based on the NSCT algorithm.
Fig. 3 is a schematic diagram of the implementation process of the method of this embodiment for CUDA-based parallel image fusion.
Fig. 4 is a schematic diagram of the realization principle of image segmentation in this embodiment.
Fig. 5 is a schematic diagram of the process of splitting one input source image in a specific embodiment of the invention.
Fig. 6 is a schematic diagram of the realization principle of performing image fusion based on CUDA in this embodiment.
Fig. 7 is a schematic diagram of the principle of implementing CUDA parallel computation in this embodiment.
Fig. 8 is a schematic diagram of the realization principle of the NSCT transform algorithm used in a specific embodiment of the invention.
Fig. 9 is a schematic diagram of the result obtained by fusing an infrared image and a visible image in a specific embodiment of the invention.
Embodiment
The invention is further described below in conjunction with the accompanying drawings and specific preferred embodiments, without thereby limiting the scope of protection of the invention.
The method of the invention for CUDA-based parallel image fusion is described below using the fusion of two images as an example; the same principle as in this embodiment applies when more than two images are fused.
As shown in Fig. 3, the method of this embodiment for CUDA-based parallel image fusion comprises the steps:
S1. Image segmentation: obtain the source images to be fused and split each of them, obtaining multiple sub-images;
S2. Parallel fusion: group the sub-images of the source images by corresponding position into multiple groups of sub-images to be fused, and fuse each group in parallel on different CUDA cores, obtaining multiple groups of fused sub-images;
S3. Image stitching: stitch together the fused sub-images of each group obtained in step S2 to obtain the final fused image.
This embodiment exploits characteristics of image fusion such as the low dependence between raw data and the high computational intensity, and gives full play to the parallel computing advantage of CUDA. By first splitting the input source images, the computational complexity of the fusion process can be effectively reduced; combined with the parallel capability of CUDA, the fusion of the individual sub-images after splitting is executed in parallel, and the fused sub-images of each group are then stitched to obtain the final fused image. The implementation is simple, the image fusion computation can be accelerated, and fusion performance is effectively improved. In addition, since the parallel image fusion is realized with CUDA parallel technology, no extra hardware is required and high-level languages such as C/C++ are supported, so the image fusion framework is easy to implement; compared with fusion algorithms written in languages such as MATLAB, the CUDA-based image fusion of this embodiment achieves higher performance.
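For concreteness, the following is a minimal host-side structural sketch of steps S1 to S3 in CUDA C++; all function names (splitWithOverlap, fuseAllGroups, stitchSubimages) are illustrative assumptions rather than identifiers from the patent, and possible realizations of the helpers are sketched later in this description.

    #include <vector>

    // Structural sketch only: S1 split -> S2 parallel fuse on the GPU -> S3 stitch.
    // The three helpers are assumed names; possible realizations are sketched further below.
    std::vector<std::vector<float>> splitWithOverlap(const float* img, int m, int n, int a);                  // S1
    void fuseAllGroups(const std::vector<std::vector<float>>& tilesA,
                       const std::vector<std::vector<float>>& tilesB,
                       std::vector<std::vector<float>>& tilesF, int tileRows, int tileCols);                  // S2
    void stitchSubimages(const std::vector<std::vector<float>>& tilesF, float* fused, int m, int n, int a);   // S3

    void fuseImages(const float* src1, const float* src2, float* fused, int m, int n, int a) {
        auto tilesA = splitWithOverlap(src1, m, n, a);                 // S1: a*a overlapping sub-images per source
        auto tilesB = splitWithOverlap(src2, m, n, a);
        std::vector<std::vector<float>> tilesF;
        fuseAllGroups(tilesA, tilesB, tilesF, m / a + 1, n / a + 1);   // S2: one CUDA kernel launch per group
        stitchSubimages(tilesF, fused, m, n, a);                       // S3: drop the overlap and reassemble
    }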
Because the dependence between the raw data is small, as shown in Fig. 4, this embodiment splits the source images before performing the actual image transform: each input high-dimensional image is divided into several lower-dimensional sub-images (G1, G2, ..., Gi, ..., Gn), where Gi is the i-th sub-image obtained by splitting, and the subsequent parallel computation is executed after the splitting is completed. Traditional image fusion frameworks contain a large number of two-dimensional convolution and iterative computations, and traditional CUDA-accelerated image fusion applies the image transform directly to the source images. By contrast, splitting the source images in this embodiment greatly reduces the computational complexity of the subsequent fusion computation, and since the splitting process is independent of the specific fusion algorithm, compared with traditional CUDA-based acceleration methods it also relaxes the requirements on the fusion algorithm; the fusion process need not be confined to a single specific fusion framework and can be applied to existing image transform algorithms and fusion rules, giving strong versatility.
In this embodiment, when splitting in step S1, adjacent sub-images are further made to overlap by a specified amount of data. As shown in Fig. 4, adjacent sub-images Gi share part of their data: the beginning of each group of data Gi contains several values of the previous group Gi-1 (the shaded part in the figure), and its end contains several values of the next group Gi+1. By ensuring during splitting that the edges of adjacent sub-images have sufficient data overlap, the influence that splitting the image might otherwise have on the fusion quality can be eliminated. The +1 in the sub-image size represents the data overlap at the split edges (here taking a two-dimensional convolution template of size 3*3 as an example).
In this embodiment, when splitting in step S1, a source image of size m*n is divided equally into multiple sub-images of size (m/a+1)*(n/a+1), where a is the number of equal parts along each dimension. Taking the splitting of an m*n source image as an example, assume the template used in the fusion process is of size 3*3. The rows and columns of the m*n source image are each divided into a equal parts, giving a total of a^2 sub-images of identical size; after introducing the overlap at the split edges, each sub-image is a matrix of size (m/a+1)*(n/a+1). The computational complexity of a two-dimensional convolution can be expressed as O(m*n*size), where m and n are the pixel dimensions of the image to be processed and size is the size of the convolution template, typically 3*3. Taking a 3*3 template as an example, by dividing the source image into a^2 sub-images of size (m/a+1)*(n/a+1) and substituting the sub-image pixel counts into the complexity formula, a speed-up of O(a^2) can be obtained relative to the unsplit case when the sub-images are processed in parallel; the +1 in the sub-image size again represents the data overlap at the split edges.
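A minimal sketch of this splitting step as host-side CUDA C++ is given below; the row-major float layout, the border clamping and the function name splitWithOverlap are assumptions chosen for illustration (they also assume m and n are divisible by a, as in the 512*512, a = 2 example), not details taken from the patent.

    #include <vector>
    #include <algorithm>

    // Split an m*n row-major image into a*a sub-images of size (m/a+1)*(n/a+1),
    // where adjacent tiles share one row/column of overlap at their split edges.
    std::vector<std::vector<float>> splitWithOverlap(const float* img, int m, int n, int a) {
        const int tm = m / a + 1;                 // tile rows, +1 for the overlap
        const int tn = n / a + 1;                 // tile cols, +1 for the overlap
        std::vector<std::vector<float>> tiles;
        tiles.reserve(a * a);
        for (int bi = 0; bi < a; ++bi) {
            for (int bj = 0; bj < a; ++bj) {
                std::vector<float> tile(tm * tn, 0.0f);
                const int r0 = bi * (m / a);      // top-left corner of this tile in the source
                const int c0 = bj * (n / a);
                for (int r = 0; r < tm; ++r) {
                    for (int c = 0; c < tn; ++c) {
                        // Clamp at the image border so every tile keeps the (m/a+1)*(n/a+1) size.
                        const int sr = std::min(r0 + r, m - 1);
                        const int sc = std::min(c0 + c, n - 1);
                        tile[r * tn + c] = img[sr * n + sc];
                    }
                }
                tiles.push_back(std::move(tile));
            }
        }
        return tiles;
    }

For a 512*512 image and a = 2 this returns four tiles of 257*257, matching the example of Fig. 5.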
Fig. 5 shows the process of splitting one input source image in a specific embodiment of the invention, where image (a) is the input source image with 512*512 pixels. Halving the rows and columns of the original image with the above splitting method gives 4 sub-images of equal size; after introducing the edge-overlap mechanism, the size of each sub-image is 257*257. These 4 sub-images are subsequently transferred into different thread blocks of the GPU for parallel computation.
In this embodiment, the source images to be fused are acquired by different sensors, and before the splitting in step S1 the method also includes a preprocessing step of registering the source images; the above spatial splitting is then performed on each registered source image, so the fusion of images from multiple types of sensors can be realized.
After each source image has been divided into multiple sub-images in step S1, step S2 processes each sub-image obtained in step S1 in the parallel computing units of the GPU, so that the image fusion computation is executed in parallel. The CUDA parallel computing model can be divided into three levels: the grid, the thread block and the thread. The top level is the grid; each grid is composed of several thread blocks, each thread block is in turn composed of several threads, and the thread is the smallest computing unit. All threads in a thread block can execute simultaneously; each thread has its own private registers, while the thread block provides shared registers (shared memory) used by all threads in the block, and the thread blocks within the same grid can compute in parallel.
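To make the grid/thread-block/thread hierarchy concrete, the following minimal CUDA kernel uses one thread block per group of sub-images and lets the threads of the block stride over the pixels of the tile; the pixel-wise averaging is only a placeholder for the real fusion processing (the fusion of this embodiment uses NSCT and NMF), and the kernel name is an assumption.

    #include <cuda_runtime.h>

    // One thread block per sub-image group (blockIdx.x), threads stride over the tile's pixels.
    // The averaging below is only a placeholder; the actual fusion of this embodiment uses NSCT + NMF.
    __global__ void fuseTilesKernel(const float* tilesA, const float* tilesB, float* tilesF,
                                    int tileRows, int tileCols) {
        const int group  = blockIdx.x;                         // which sub-image group
        const int pixels = tileRows * tileCols;
        const float* a = tilesA + (size_t)group * pixels;      // this block's input tiles
        const float* b = tilesB + (size_t)group * pixels;
        float*       f = tilesF + (size_t)group * pixels;

        for (int i = threadIdx.x; i < pixels; i += blockDim.x) {
            f[i] = 0.5f * (a[i] + b[i]);                       // placeholder per-pixel fusion
        }
    }

    // Example launch: one block per group, e.g. 4 groups after quartering a 512*512 image:
    // fuseTilesKernel<<<4, 256>>>(dTilesA, dTilesB, dTilesF, 257, 257);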
In this embodiment, the image fusion processing in step S2 specifically comprises the following steps:
S21. store the data of each group of sub-images to be fused in CUDA global memory;
S22. transfer the data of each group of sub-images to be fused from global memory to different CUDA thread blocks;
S23. each thread block, in parallel, performs the image fusion processing on the group of sub-images it receives, including the image transform and coefficient fusion.
In this embodiment, all the sub-image data obtained by the splitting of step S1 are stored in the global memory of the GPU; the image data in global memory are then computed on different CUDA cores in the form of multiple kernel functions, and each parallel computing unit executes the image transform algorithm and the processing of the transform coefficients (the realization of the fusion rule), so the parallel computation between different sub-images is realized.
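A host-side sketch of steps S21 to S23 is shown below; it places every group of sub-images in global memory and issues one kernel launch per group. The use of one CUDA stream per group is an illustrative choice to let the per-group copies and kernels overlap, not a requirement stated in the patent, and fuseTilesKernel stands in for the actual per-group fusion kernel.

    #include <cuda_runtime.h>
    #include <vector>

    __global__ void fuseTilesKernel(const float* tilesA, const float* tilesB, float* tilesF,
                                    int tileRows, int tileCols);

    // S21-S23 sketch: copy every group of sub-images into global memory and launch one
    // kernel per group on its own stream (streams are an illustrative choice only).
    void fuseAllGroups(const std::vector<std::vector<float>>& tilesA,
                       const std::vector<std::vector<float>>& tilesB,
                       std::vector<std::vector<float>>& tilesF,
                       int tileRows, int tileCols) {
        const int groups = (int)tilesA.size();
        const size_t bytes = (size_t)tileRows * tileCols * sizeof(float);
        std::vector<cudaStream_t> streams(groups);
        std::vector<float*> dA(groups), dB(groups), dF(groups);
        tilesF.assign(groups, std::vector<float>(tileRows * tileCols));

        for (int g = 0; g < groups; ++g) {
            cudaStreamCreate(&streams[g]);
            cudaMalloc(&dA[g], bytes); cudaMalloc(&dB[g], bytes); cudaMalloc(&dF[g], bytes);     // S21: global memory
            cudaMemcpyAsync(dA[g], tilesA[g].data(), bytes, cudaMemcpyHostToDevice, streams[g]); // S22: transfer
            cudaMemcpyAsync(dB[g], tilesB[g].data(), bytes, cudaMemcpyHostToDevice, streams[g]);
            fuseTilesKernel<<<1, 256, 0, streams[g]>>>(dA[g], dB[g], dF[g], tileRows, tileCols); // S23: per-group fusion
            cudaMemcpyAsync(tilesF[g].data(), dF[g], bytes, cudaMemcpyDeviceToHost, streams[g]);
        }
        for (int g = 0; g < groups; ++g) {
            cudaStreamSynchronize(streams[g]);
            cudaFree(dA[g]); cudaFree(dB[g]); cudaFree(dF[g]);
            cudaStreamDestroy(streams[g]);
        }
    }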
In this embodiment, the image transform specifically uses the NSCT non-subsampled contourlet transform algorithm, and the coefficient fusion is specifically based on NMF non-negative matrix factorization. With this image transform and fusion rule the implementation is simple and the resulting fusion quality is good; of course, other image transform algorithms and fusion rules can also be selected according to actual requirements.
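For reference, since the description does not write out the NMF iteration explicitly, the conventional multiplicative update rules of non-negative matrix factorization (the Lee-Seung form, minimizing the Frobenius error of V ≈ WH with all factors non-negative) are reproduced below; the patent may use a different variant, so this is only the standard formulation:

    V \approx W H, \qquad V \in \mathbb{R}_{\ge 0}^{p \times q}, \; W \in \mathbb{R}_{\ge 0}^{p \times r}, \; H \in \mathbb{R}_{\ge 0}^{r \times q}

    H_{kj} \leftarrow H_{kj} \, \frac{(W^{\top} V)_{kj}}{(W^{\top} W H)_{kj}}, \qquad
    W_{ik} \leftarrow W_{ik} \, \frac{(V H^{\top})_{ik}}{(W H H^{\top})_{ik}}

Each update consists of matrix products and element-wise operations, so the iterations map directly onto the thread-level parallelism described for step S23.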
As shown in Figs. 6 and 7, taking the fusion of two images Image1 and Image2 as an example, this embodiment realizes the parallel image fusion computation with a host device (HOST) and a computing device (GPU), where the CUDA parallel computation is completed on the GPU and the functions running the CUDA parallel computation are called kernel functions. The sub-images obtained from each source image in step 1) are grouped by corresponding position into multiple groups of sub-images to be fused: sub-image G1 of Image1 and sub-image G1 of Image2 form the first group, sub-image G2 of Image1 and sub-image G2 of Image2 form the second group, and so on, with sub-image Gn of Image1 and Image2 forming the n-th group. Each group of sub-images to be fused is sent to a different thread block on the GPU and processed as the parallel computing unit of one kernel function; the data of each group are computed on different CUDA cores in the form of multiple kernel functions, i.e. the kernel function corresponding to each group of sub-images to be fused is processed in a different thread block, different thread blocks execute in parallel, and parallel fusion processing between sub-images is realized. This embodiment fully takes into account the characteristics of the image fusion data themselves and of the CUDA hardware structure; combining image segmentation with acceleration based on CUDA parallel computation greatly reduces the computational complexity while improving fusion efficiency.
In this embodiment, during the image transform and coefficient fusion in step S23, specified computing processes within the image transform algorithm and fusion rule are further executed in parallel, i.e. parallelism within the algorithm is introduced when processing a single sub-image. Inside each thread block, since each thread has private registers and the threads share the block's shared registers, this embodiment further uses multi-threading to parallelize the fusion algorithm; other computations for which parallel execution is feasible, such as the splitting of the image in step S1, can also be supported. By parallelizing specific algorithms inside each parallel computing unit of the GPU, the overall parallel performance of the image fusion can be further improved, and thus the fusion speed is further increased.
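As one concrete illustration of this intra-block parallelism, the kernel below computes the region energy of a coefficient tile cooperatively: every thread of the block accumulates a strided partial sum in a private register, and the partial sums are then combined through a shared-memory tree reduction. Taking the region energy as the sum of squared coefficients, and the kernel name, are assumptions made for illustration.

    #include <cuda_runtime.h>

    // One block per coefficient tile; threads cooperate through private registers and shared memory.
    // "Energy" is taken here as the sum of squared coefficients (an assumed, common definition).
    __global__ void regionEnergyKernel(const float* coeff, int pixels, float* energyOut) {
        extern __shared__ float partial[];                       // one partial sum per thread
        const float* tile = coeff + (size_t)blockIdx.x * pixels; // this block's tile

        float sum = 0.0f;
        for (int i = threadIdx.x; i < pixels; i += blockDim.x) {
            sum += tile[i] * tile[i];                            // private-register accumulation
        }
        partial[threadIdx.x] = sum;
        __syncthreads();

        // Shared-memory tree reduction (blockDim.x assumed to be a power of two).
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (threadIdx.x < stride) {
                partial[threadIdx.x] += partial[threadIdx.x + stride];
            }
            __syncthreads();
        }
        if (threadIdx.x == 0) {
            energyOut[blockIdx.x] = partial[0];                  // one energy value per tile
        }
    }

    // Example launch, one block per tile of a quartered 512*512 image:
    // regionEnergyKernel<<<4, 256, 256 * sizeof(float)>>>(dCoeff, 257 * 257, dEnergy);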
In this embodiment, during the image transform and coefficient fusion in step S23, the two-dimensional convolutions in the NSCT image transform, the computation of the image region energy (RE), and the NMF decomposition in the coefficient fusion algorithm based on non-negative matrix factorization are specifically executed in parallel; of course, other calculations in the image transform algorithm and fusion rule for which parallel execution is feasible may also be executed in parallel.
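The dominant cost named above, the two-dimensional convolution, parallelizes naturally with one thread per output pixel; the sketch below uses the 3*3 template size quoted earlier. Zero padding at the tile borders and the kernel name are assumptions, not details from the patent.

    #include <cuda_runtime.h>

    // 2D convolution of a tileRows*tileCols sub-image with a 3*3 template,
    // one thread per output pixel; borders are zero-padded (an assumed convention).
    __global__ void conv2d3x3Kernel(const float* in, float* out,
                                    const float* tpl,          // 9 template coefficients, row-major
                                    int tileRows, int tileCols) {
        const int c = blockIdx.x * blockDim.x + threadIdx.x;
        const int r = blockIdx.y * blockDim.y + threadIdx.y;
        if (r >= tileRows || c >= tileCols) return;

        float acc = 0.0f;
        for (int dr = -1; dr <= 1; ++dr) {
            for (int dc = -1; dc <= 1; ++dc) {
                const int rr = r + dr, cc = c + dc;
                if (rr >= 0 && rr < tileRows && cc >= 0 && cc < tileCols) {
                    acc += in[rr * tileCols + cc] * tpl[(dr + 1) * 3 + (dc + 1)];
                }
            }
        }
        out[r * tileCols + c] = acc;
    }

    // Example launch for a 257*257 sub-image:
    // dim3 block(16, 16), grid((257 + 15) / 16, (257 + 15) / 16);
    // conv2d3x3Kernel<<<grid, block>>>(dIn, dOut, dTemplate, 257, 257);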
After the above parallel fusion, in step S3 this embodiment stitches together all the fused sub-images of all the source images to finally obtain the complete fused image. When stitching, a certain amount of processing is applied to the split edges, including removing the overlapping data introduced in step S1. Expressed in matrix form, the fused sub-images obtained after the processing of step S2 have (m/a+1)*(n/a+1) pixels; after the overlapping data are removed, each fused sub-image has (m/a)*(n/a) pixels, and the sub-images are then stitched together, finally giving the complete fused image after parallel fusion.
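A matching host-side sketch of the stitching step is given below; it assumes the same row-major tile layout as the splitWithOverlap sketch above and simply discards the +1 overlap row and column of each fused tile before copying it into the output image.

    #include <vector>

    // Reassemble the fused image from a*a fused tiles of size (m/a+1)*(n/a+1),
    // dropping the +1 overlap row/column of each tile (matches the splitWithOverlap sketch above).
    void stitchSubimages(const std::vector<std::vector<float>>& tilesF,
                         float* fused, int m, int n, int a) {
        const int tn    = n / a + 1;               // tile cols including overlap
        const int keepR = m / a;                   // rows kept per tile after removing overlap
        const int keepC = n / a;                   // cols kept per tile
        for (int bi = 0; bi < a; ++bi) {
            for (int bj = 0; bj < a; ++bj) {
                const std::vector<float>& tile = tilesF[bi * a + bj];
                for (int r = 0; r < keepR; ++r) {
                    for (int c = 0; c < keepC; ++c) {
                        fused[(bi * keepR + r) * n + (bj * keepC + c)] = tile[r * tn + c];
                    }
                }
            }
        }
    }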
Fig. 8 shows the implementation of the NSCT transform algorithm used in a specific embodiment of the invention, where panel (a) shows the realization principle of the NSCT transform and panel (b) shows the principle of the directional decomposition algorithm (NSDFB) in NSCT. Analysis of the directional decomposition algorithm shows that its computation contains a large number of two-dimensional convolutions, while the processing of the transform coefficients contains a large number of iterative calculations; the common feature of the two kinds of computation is that the data dependence is small but the computation is very sensitive to the size of the data volume. The NSCT transform of an image can be divided into two steps, multi-scale decomposition and directional decomposition: the multi-scale decomposition is realized with the non-subsampled Laplacian pyramid filter (NLP), and the directional decomposition is realized with a non-subsampled directional filter bank. As shown in panel (b), the directional decomposition algorithm (NSDFB) involves a large number of two-dimensional convolution operations and requires many matrix convolutions; when constructing the two-dimensional NSDFB, an efficient approach is to lift a one-dimensional filter by means of a mapping, using a lifting structure built from the one-dimensional filters P and Q.
Here P and Q are a one-dimensional low-pass filter and a one-dimensional high-pass filter respectively. After the two-dimensional NSDFB has been constructed by this mapping, the directional decomposition of the image data is realized in the form of a two-dimensional convolution of the image matrix A with the filter matrix B.
Here Mr, Mc and Nr, Nc denote the numbers of rows and columns of matrices A and B respectively.
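The convolution formula itself appears in the patent only as an image and is not reproduced in this text; written out in the standard form consistent with the definitions above (terms in which an index of B falls outside its range are taken as zero), the full two-dimensional convolution and its cost are:

    C(i,j) = \sum_{r=1}^{M_r} \sum_{c=1}^{M_c} A(r,c)\, B(i-r+1,\; j-c+1), \qquad 1 \le i \le M_r + N_r - 1, \;\; 1 \le j \le M_c + N_c - 1

    \text{cost} \;\propto\; M_r M_c \, N_r N_c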
It can be seen from the above two-dimensional convolution formula that the amount of computation is proportional to the product of the numbers of rows and columns of the matrices. Using the parallel image fusion computing method of this embodiment, after the source image is split into a equal parts along each dimension, a performance gain of at least O(a^2) can be obtained through parallel computation, i.e. the amount of computation can be effectively reduced, which further demonstrates the feasibility and performance of the above CUDA-based parallel image fusion method of this embodiment. Coefficient-fusion processing rules such as PCNN and NMF enjoy the same acceleration mechanism as described above.
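As a rough worked estimate consistent with the O(m*n*size) expression used above (ignoring the small +1 overlap), when the a^2 sub-images are processed fully in parallel the wall-clock cost per source image becomes:

    T_{\text{split}} \;\approx\; \frac{m}{a} \cdot \frac{n}{a} \cdot \text{size} \;=\; \frac{m\, n\, \text{size}}{a^{2}} \;=\; \frac{T_{\text{unsplit}}}{a^{2}}

so the expected gain is on the order of a^2, i.e. roughly 4x for the quartering (a = 2) and roughly 16x for division into 16 equal parts (a = 4).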
Fig. 9 shows the result obtained in a specific embodiment of the invention by fusing an infrared image and a visible image with the above method, where image (a) is the image captured by the infrared sensor, image (b) is the image captured by the visible-light sensor, image (c) is the fusion result obtained without splitting the source images, image (d) is the fused image finally obtained after quartering the source images, and image (e) is the difference image between (c) and (d). It can be seen from image (e) that splitting the source images has a negligible influence on the fusion quality, i.e. the above method of this embodiment can achieve effective CUDA-based acceleration of the fusion while preserving the fusion quality.
To further verify the effectiveness of the above method of this embodiment, fusion experiments were carried out on a MATLAB-based platform for three cases: fusion without splitting the source images, fusion after quartering the source images, and fusion after dividing the source images into 16 equal parts. The input source images were a registered visible image and infrared image with 512*128 pixels. Table 1 below lists the comparative results (time and entropy) of the same fusion algorithm under the different splitting schemes, including the time taken without splitting, after quartering, and after division into 16 equal parts, together with the image entropy of the fused images obtained by the different processing schemes. The experiments show that, without degrading the fusion quality, the above method of this embodiment can also significantly improve the speed of image fusion.
Table 1: Comparison of different splitting schemes for the same fusion algorithm.
Processing mode | Unsplit | Quartered | 16 equal parts
Time (s)        | 1       | 4.06      | 15.12
Entropy         | 6.5376  | 6.5373    | 6.5373
The above are merely preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above with preferred embodiments, it is not limited thereto. Therefore, any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the scope of protection of the technical solution of the present invention.

Claims (9)

1. A parallel computing method for image fusion based on CUDA, characterized in that it comprises the steps of:
S1. Image segmentation: obtaining the source images to be fused and splitting each of them, obtaining multiple sub-images;
S2. Parallel fusion: grouping the sub-images of the source images by corresponding position into multiple groups of sub-images to be fused, and fusing each group in parallel on different CUDA cores, obtaining multiple groups of fused sub-images;
S3. Image stitching: stitching together the fused sub-images of each group obtained in step S2 to obtain the final fused image.
2. The parallel computing method for image fusion based on CUDA according to claim 1, characterized in that: when splitting in step S1, adjacent sub-images are made to overlap by a specified amount of data; before the stitching in step S3, the data overlap is removed.
3. The parallel computing method for image fusion based on CUDA according to claim 2, characterized in that: when splitting in step S1, a source image of size m*n is divided equally into multiple sub-images of size (m/a+1)*(n/a+1), where a is the number of equal parts along each dimension.
4. The parallel computing method for image fusion based on CUDA according to claim 3, characterized in that: before the splitting in step S1, the method further comprises a preprocessing step of registering the source images.
5. The parallel computing method for image fusion based on CUDA according to any one of claims 1 to 4, characterized in that the image fusion processing in step S2 specifically comprises the steps of:
S21. storing the data of each group of sub-images to be fused in CUDA global memory;
S22. transferring the data of each group of sub-images to be fused from global memory to different CUDA thread blocks;
S23. each thread block, in parallel, performing the image fusion processing, including the image transform and coefficient fusion, on the group of sub-images it receives.
6. The parallel computing method for image fusion based on CUDA according to claim 5, characterized in that: during the image transform and coefficient fusion in step S23, specified computing processes are executed in parallel.
7. The parallel computing method for image fusion based on CUDA according to claim 6, characterized in that: during the image transform and coefficient fusion in step S23, one or more of the two-dimensional convolution computation, the image region energy computation and the matrix factorization computation are executed in parallel.
8. The parallel computing method for image fusion based on CUDA according to claim 6 or 7, characterized in that: the image transform specifically uses the non-subsampled contourlet transform algorithm; the coefficient fusion specifically uses a non-negative matrix factorization algorithm.
9. The parallel computing method for image fusion based on CUDA according to any one of claims 1 to 4, characterized in that: the source images to be fused are acquired by different types of sensors.
CN201710182686.4A 2017-03-24 2017-03-24 Parallel computing method based on CUDA image fusion Active CN106991665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710182686.4A CN106991665B (en) 2017-03-24 2017-03-24 Parallel computing method based on CUDA image fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710182686.4A CN106991665B (en) 2017-03-24 2017-03-24 Parallel computing method based on CUDA image fusion

Publications (2)

Publication Number Publication Date
CN106991665A true CN106991665A (en) 2017-07-28
CN106991665B (en) 2020-03-17

Family

ID=59411824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710182686.4A Active CN106991665B (en) 2017-03-24 2017-03-24 Parallel computing method based on CUDA image fusion

Country Status (1)

Country Link
CN (1) CN106991665B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897616A (en) * 2018-06-04 2018-11-27 四川大学 Non-down sampling contourlet transform optimization method based on concurrent operation
CN109214511A (en) * 2018-08-15 2019-01-15 算丰科技(北京)有限公司 Data processing method, data processing equipment and electronic equipment
CN110146852A (en) * 2019-06-21 2019-08-20 电子科技大学 A kind of radar receiver quadrature demodulation implementation method accelerated based on GPU
CN110335220A (en) * 2019-05-16 2019-10-15 中国林业科学研究院资源信息研究所 A kind of image interfusion method based on parallel algorithms
CN110677577A (en) * 2018-07-03 2020-01-10 杭州海康慧影科技有限公司 Image processing method and device
CN110800284A (en) * 2018-08-22 2020-02-14 深圳市大疆创新科技有限公司 Image processing method, device, equipment and storage medium
CN111815520A (en) * 2020-04-14 2020-10-23 桂林电子科技大学 Remote sensing image rapid processing method based on GPU-CUDA coding model
CN113034366A (en) * 2021-03-25 2021-06-25 中国科学院空天信息创新研究院 Splicing seam-free SAR image segmentation parallel acceleration processing method
CN113362425A (en) * 2021-06-18 2021-09-07 中科三清科技有限公司 Image fusion method and device, electronic equipment and storage medium
CN113706390A (en) * 2021-10-29 2021-11-26 苏州浪潮智能科技有限公司 Image conversion model training method, image conversion method, device and medium
CN114529489A (en) * 2022-03-01 2022-05-24 中国科学院深圳先进技术研究院 Multi-source remote sensing image fusion method, device, equipment and storage medium
WO2022262858A1 (en) * 2021-06-18 2022-12-22 青岛小鸟看看科技有限公司 Image transmission method, image display and processing device, and image transmission system
WO2023020503A1 (en) * 2021-08-19 2023-02-23 Oppo广东移动通信有限公司 Image processing chip, method, application processing chip, and electronic device
US11758108B2 (en) 2021-06-18 2023-09-12 Qingdao Pico Technology Co., Ltd. Image transmission method, image display device, image processing device, image transmission system, and image transmission system with high-transmission efficiency

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102393958A (en) * 2011-07-16 2012-03-28 西安电子科技大学 Multi-focus image fusion method based on compressive sensing
CN103366353A (en) * 2013-05-08 2013-10-23 北京大学深圳研究生院 Infrared image and visible-light image fusion method based on saliency region segmentation
CN103778618A (en) * 2013-11-04 2014-05-07 国家电网公司 Method for fusing visible image and infrared image
CN105245841A (en) * 2015-10-08 2016-01-13 北京工业大学 CUDA (Compute Unified Device Architecture)-based panoramic video monitoring system
CN106530266A (en) * 2016-11-11 2017-03-22 华东理工大学 Infrared and visible light image fusion method based on area sparse representation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102393958A (en) * 2011-07-16 2012-03-28 西安电子科技大学 Multi-focus image fusion method based on compressive sensing
CN103366353A (en) * 2013-05-08 2013-10-23 北京大学深圳研究生院 Infrared image and visible-light image fusion method based on saliency region segmentation
CN103778618A (en) * 2013-11-04 2014-05-07 国家电网公司 Method for fusing visible image and infrared image
CN105245841A (en) * 2015-10-08 2016-01-13 北京工业大学 CUDA (Compute Unified Device Architecture)-based panoramic video monitoring system
CN106530266A (en) * 2016-11-11 2017-03-22 华东理工大学 Infrared and visible light image fusion method based on area sparse representation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jia Huan: "Image fusion algorithms based on multi-scale decomposition and their implementation", China Master's Theses Full-text Database, Information Science and Technology Series *
Yan Junhua et al.: "Fast implementation of visible and infrared image fusion based on GPU", Computer Engineering *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897616B (en) * 2018-06-04 2021-08-24 四川大学 Non-downsampling contourlet transform optimization method based on parallel operation
CN108897616A (en) * 2018-06-04 2018-11-27 四川大学 Non-down sampling contourlet transform optimization method based on concurrent operation
CN110677577A (en) * 2018-07-03 2020-01-10 杭州海康慧影科技有限公司 Image processing method and device
CN109214511A (en) * 2018-08-15 2019-01-15 算丰科技(北京)有限公司 Data processing method, data processing equipment and electronic equipment
CN109214511B (en) * 2018-08-15 2021-08-27 算丰科技(北京)有限公司 Data processing method, data processing device and electronic equipment
CN110800284B (en) * 2018-08-22 2021-08-03 深圳市大疆创新科技有限公司 Image processing method, device, equipment and storage medium
CN110800284A (en) * 2018-08-22 2020-02-14 深圳市大疆创新科技有限公司 Image processing method, device, equipment and storage medium
WO2020037564A1 (en) * 2018-08-22 2020-02-27 深圳市大疆创新科技有限公司 Image processing method, device and apparatus, and storage medium
CN110335220B (en) * 2019-05-16 2021-08-24 中国林业科学研究院资源信息研究所 Image fusion method based on parallel computing algorithm
CN110335220A (en) * 2019-05-16 2019-10-15 中国林业科学研究院资源信息研究所 A kind of image interfusion method based on parallel algorithms
CN110146852A (en) * 2019-06-21 2019-08-20 电子科技大学 A kind of radar receiver quadrature demodulation implementation method accelerated based on GPU
CN111815520A (en) * 2020-04-14 2020-10-23 桂林电子科技大学 Remote sensing image rapid processing method based on GPU-CUDA coding model
CN111815520B (en) * 2020-04-14 2023-11-07 桂林电子科技大学 Remote sensing image rapid processing method based on GPU-CUDA coding model
CN113034366A (en) * 2021-03-25 2021-06-25 中国科学院空天信息创新研究院 Splicing seam-free SAR image segmentation parallel acceleration processing method
CN113362425A (en) * 2021-06-18 2021-09-07 中科三清科技有限公司 Image fusion method and device, electronic equipment and storage medium
WO2022262858A1 (en) * 2021-06-18 2022-12-22 青岛小鸟看看科技有限公司 Image transmission method, image display and processing device, and image transmission system
US11758108B2 (en) 2021-06-18 2023-09-12 Qingdao Pico Technology Co., Ltd. Image transmission method, image display device, image processing device, image transmission system, and image transmission system with high-transmission efficiency
WO2023020503A1 (en) * 2021-08-19 2023-02-23 Oppo广东移动通信有限公司 Image processing chip, method, application processing chip, and electronic device
CN113706390A (en) * 2021-10-29 2021-11-26 苏州浪潮智能科技有限公司 Image conversion model training method, image conversion method, device and medium
CN114529489A (en) * 2022-03-01 2022-05-24 中国科学院深圳先进技术研究院 Multi-source remote sensing image fusion method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106991665B (en) 2020-03-17

Similar Documents

Publication Publication Date Title
CN106991665A (en) Method based on CUDA image co-registration parallel computations
DE102018119225A1 (en) System and method for optimized Winograd convolution accelerator
Yue et al. Deep recursive super resolution network with Laplacian Pyramid for better agricultural pest surveillance and detection
DE102020131050A1 (en) CONFIGURABLE PROCESSOR ELEMENT ARRAYS FOR IMPLEMENTING CONVOLUTIONAL NEURAL NETWORKS
Zhou et al. Self-attention feature fusion network for semantic segmentation
DE102019130889A1 (en) ESTIMATE THE DEPTH OF A VIDEO DATA STREAM TAKEN BY A MONOCULAR RGB CAMERA
DE112021002830T5 (en) GENERATION OF IMAGERY OF VIRTUAL ENVIRONMENTS USING ONE OR MORE NEURAL NETWORKS
CN103455988B (en) The super-resolution image reconstruction method of structure based self-similarity and rarefaction representation
Khagi et al. Pixel-label-based segmentation of cross-sectional brain MRI using simplified SegNet architecture-based CNN
CN106934766A (en) A kind of infrared image super resolution ratio reconstruction method based on rarefaction representation
Liu et al. SGFusion: A saliency guided deep-learning framework for pixel-level image fusion
Deb et al. LS-Net: A convolutional neural network for leaf segmentation of rosette plants
Kong Technique for image fusion based on NSST domain INMF
Li et al. Double attention U-Net for brain tumor MR image segmentation
Sun et al. Structure-decoupled adaptive part alignment network for domain adaptive mitochondria segmentation
Cai et al. Semantic object removal with convolutional neural network feature-based inpainting approach
Wang et al. Msfnet: multistage fusion network for infrared and visible image fusion
Wang et al. Joint depth map super-resolution method via deep hybrid-cross guidance filter
Zhao et al. Spherical U-Net for infant cortical surface parcellation
Sun et al. Image super-resolution using supervised multi-scale feature extraction network
CN116310452B (en) Multi-view clustering method and system
DE112020006070T5 (en) HARDWARE ACCELERATOR WITH RECONFIGURABLE INSTRUCTION SET
Talha et al. ADU-Net: Semantic segmentation of satellite imagery for land cover classification
Kountchev et al. Advances in reasoning-based image processing intelligent systems: Conventional and intelligent paradigms
Wang et al. SparseFormer: Sparse transformer network for point cloud classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant