CN111539997A - Image parallel registration method, system and device based on GPU computing platform - Google Patents

Image parallel registration method, system and device based on GPU computing platform

Info

Publication number
CN111539997A
Authority
CN
China
Prior art keywords
image
data
images
gpu
registered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010326223.2A
Other languages
Chinese (zh)
Other versions
CN111539997B (en)
Inventor
赵美婷
蒿杰
吕志丰
范秋香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute Of Artificial Intelligence And Advanced Computing Institute Of Automation Chinese Academy Of Sciences
Institute of Automation of Chinese Academy of Science
Original Assignee
Guangzhou Institute Of Artificial Intelligence And Advanced Computing Institute Of Automation Chinese Academy Of Sciences
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute Of Artificial Intelligence And Advanced Computing Institute Of Automation Chinese Academy Of Sciences, Institute of Automation of Chinese Academy of Science filed Critical Guangzhou Institute Of Artificial Intelligence And Advanced Computing Institute Of Automation Chinese Academy Of Sciences
Priority to CN202010326223.2A priority Critical patent/CN111539997B/en
Publication of CN111539997A publication Critical patent/CN111539997A/en
Application granted granted Critical
Publication of CN111539997B publication Critical patent/CN111539997B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/37Determination of transform parameters for the alignment of images, i.e. image registration using transform domain methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image registration, and in particular relates to a method, a system and a device for parallel image registration based on a GPU (graphics processing unit) computing platform. It aims to solve the low processing efficiency of Fourier-transform-based image registration algorithms on massive images in the prior art. In the method, image registration is parallelized: massive images are divided into tasks across multiple GPUs, subtasks are divided according to the image resolution and assigned to GPU thread blocks, and the data computation of the Fourier-transform-based registration algorithm is completed in parallel in kernel functions, which accelerates image registration. Every sub-step of the registration algorithm is completed in GPU kernel functions, so that the parallel efficiency within each GPU is maximized. By using asynchronous transfers, the invention pipelines the three processes of data transfer, registration, and return and disk writing, which improves the efficiency of parallel registration of massive images and achieves real-time processing.

Description

Image parallel registration method, system and device based on GPU computing platform
Technical Field
The invention belongs to the technical field of image registration, and particularly relates to a method, a system and a device for parallel image registration based on a GPU computing platform.
Background
Image registration is an important technology in image processing, and mainly refers to a process of aligning two or more images of the same object at spatial positions and mapping one image onto the other image by finding a spatial transformation so that points corresponding to the same spatial position in the two images are in one-to-one correspondence. Image registration is an important step for accurately obtaining image information, and has wide research and application in the fields of remote sensing images, medical images, computer vision, target positioning, even neural research and the like.
Image registration algorithms are classified in different ways according to the method used, and include feature-based registration algorithms, registration algorithms based on frequency domain transformation, and gray-scale-based registration algorithms. Among these, frequency domain registration is widely applied at present, and the most common variant is based on the Fourier transform. The algorithm tolerates image translation and scaling well during registration, but its computational load is very large; particularly when high-resolution images are registered, processing efficiency is low, which limits the efficiency of researchers' work. When massive image data is processed the efficiency drops further, and the long wait for registration has become both a practical difficulty and a research hotspot.
In recent years, graphics processing units (GPUs) have become the preferred accelerator in the field of high-performance parallel computing. An important way to run parallel computations on GPUs is CUDA (Compute Unified Device Architecture). CUDA is a programming model released by NVIDIA in 2007; it is a heterogeneous CPU + GPU programming model. With CUDA, GPU programming became simpler and more capable, and its range of applications widened. Because long registration times and low efficiency on massive data limit research efficiency, parallel acceleration of the algorithm is necessary, and accelerating the algorithm with GPUs has become a problem that urgently needs to be solved in the field.
Disclosure of Invention
In order to solve the above problem in the prior art, namely the low processing efficiency of Fourier-transform-based image registration algorithms on massive images, a first aspect of the present invention provides an image parallel registration method based on a GPU computing platform, where the number of GPUs in the GPU computing platform is X, and the method includes the following steps:
step S100, acquiring a template image, acquiring frequency domain data of the template image through a first registration algorithm to serve as first data, and respectively storing the first data into a video memory of each GPU; the first registration algorithm is a Fourier transform-based registration algorithm;
step S200, segmenting the template image to obtain N images with the same resolution, calculating corresponding frequency domain data through the first registration algorithm to serve as second data, and storing the second data into a video memory of each GPU;
step S300, acquiring a group of images to be registered, dividing the images to be registered in the group of images to be registered, and respectively inputting the images to be registered into X memory buffer areas;
step S400, each GPU reads the images to be registered from its corresponding memory buffer area into video memory, and obtains the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, translation parameters of the image to be registered are obtained through a preset translation parameter calculation method, the image is translated according to these parameters, and the translated image to be registered is taken as a first image;
step S500, segmenting the first image to obtain N second images with the same resolution, and respectively calculating through the first registration algorithm to obtain corresponding frequency domain data as fourth data; and based on the second data and the fourth data, obtaining a translation parameter of the second image by a preset translation parameter calculation method and translating to obtain a registered image.
In some preferred embodiments, the "segmenting the template image" in step S200 and the "segmenting the first image" in step S500 are based on a preset segmentation method, where the preset segmentation method is: the image to be segmented is segmented by a sliding window with preset parameters, yielding N small images with the same resolution, where N is calculated as follows:
[Formula image BDA0002463313640000031 is not reproduced in this text]
wherein W is the width of the image to be segmented, H is the height of the image to be segmented, Sw is the width of the sliding window, Sh is the height of the sliding window, and D is the sliding step length of the sliding window.
In some preferred technical solutions, the preset translation parameter calculation method in step S400 specifically includes the following steps:
step A100, based on the first data and the third data, calculating through a CUDA library function and inverse Fourier transform to obtain time domain data of each image to be registered;
step A200, based on the time domain data, obtaining the translation parameters of each image to be registered through kernel function calculation.
In some preferred technical solutions, the GPU computing platform further includes a CPU, and the method further includes step S600: each GPU transmits its registered images to the CPU memory, and the images are saved to a hard disk.
In some preferred embodiments, the step S100 of obtaining the frequency domain data of the template image through the first registration algorithm is completed in the GPU.
In some preferred technical solutions, "segmenting the template image by a preset segmentation method to obtain N images with the same resolution, and calculating the corresponding frequency domain data through the first registration algorithm as second data" in step S200 is completed in the GPU.
In some preferred technical solutions, in the step S300, "dividing the images to be registered in the image group to be registered" is to divide the images to be registered in the image group to be registered based on the number of the images to be registered and the number of GPUs.
A second aspect of the invention provides an image parallel registration system based on a GPU computing platform, which comprises a CPU module and X identical GPU modules;
the CPU module is configured to transmit the template image to the GPU modules, divide the images to be registered in the group of images to be registered based on the number of images to be registered and the number of GPU modules, and input the divided images to be registered into X memory buffers respectively;
the GPU module is configured to acquire a template image from the CPU module, acquire frequency domain data of the template image through a first registration algorithm, serve as first data, and respectively store the first data into a video memory; the first registration algorithm is a Fourier transform-based registration algorithm;
segmenting the template image by a preset segmentation method to obtain N images with the same resolution, calculating corresponding frequency domain data by the first registration algorithm respectively to serve as second data, and storing the second data into a video memory respectively;
reading the images to be registered in the corresponding memory buffer area into video memory, and obtaining the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtaining translation parameters of the image to be registered through a preset translation parameter calculation method, translating the image according to these parameters, and taking the translated image to be registered as a first image;
segmenting the first image by a preset segmentation method to obtain N second images with the same resolution, and respectively calculating corresponding frequency domain data by the first registration algorithm to serve as fourth data; and based on the second data and the fourth data, obtaining a translation parameter of the second image by a preset translation parameter calculation method, translating to obtain a registered image, and transmitting the registered image to the CPU module.
A third aspect of the present invention provides a storage device in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the image parallel registration method based on the GPU computing platform according to any of the above technical solutions.
A fourth aspect of the present invention provides a processing apparatus, comprising a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to realize the image parallel registration method based on the GPU computing platform in any one of the technical schemes.
The invention has the beneficial effects that:
according to the invention, a GPU computing platform is used, massive images can be processed in real time based on a multi-GPU parallel technology, and each sub-step of a registration algorithm based on Fourier transform is completed in a GPU kernel function, so that the parallel efficiency in each GPU is maximized.
According to the image registration method, image registration is parallelized, multi-GPU task division is carried out on massive images, sub-tasks are divided according to the resolution of the images, the sub-tasks are distributed to thread blocks of a GPU, data calculation is completed in a kernel function in parallel, and therefore image registration is accelerated.
The invention divides the whole processing flow into three stages and uses asynchronous transfers so that data transfer and GPU computation execute in parallel. The three processes of data transfer, registration, and return and disk writing are thereby pipelined, which further improves the efficiency of parallel registration of massive images and achieves real-time processing.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a block flow diagram of an embodiment of a method for parallel image registration based on a GPU computing platform according to the present invention;
FIG. 2 is a thread relationship diagram of an embodiment of an image parallel registration method based on a GPU computing platform according to the present invention;
FIG. 3 is a diagram illustrating the relationship between a memory buffer and a GPU when template data is processed according to an embodiment of the image parallel registration method based on a GPU computing platform;
FIG. 4 is a schematic pipeline diagram of three processing stages in an embodiment of an image parallel registration method based on a GPU computing platform according to the present invention;
FIG. 5 is a diagram illustrating a relationship between a memory buffer and a GPU in image parallel registration according to an embodiment of the image parallel registration method based on a GPU computing platform;
FIG. 6 is a flowchart of a global image parallel registration algorithm of an embodiment of the image parallel registration method based on a GPU computing platform of the present invention;
FIG. 7 is a block diagram of a local image parallel registration flow of an embodiment of an image parallel registration method based on a GPU computing platform according to the present invention.
Detailed Description
In order to make the embodiments, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the image parallel registration method based on the GPU computing platform, the number of GPUs in the GPU computing platform is X, where X is a positive integer in the preferred embodiment of the invention. It should be noted that the method is also applicable to a single GPU; to illustrate the advantages of the invention more fully, this specification takes X ≥ 2, that is, multiple GPUs, as the example. The image parallel registration method based on the GPU computing platform of the invention includes the following steps:
step S100, acquiring a template image, acquiring frequency domain data of the template image through a first registration algorithm to serve as first data, and respectively storing the first data into a video memory of each GPU; the first registration algorithm is a registration algorithm based on Fourier transform;
step S200, segmenting the template image to obtain N images with the same resolution, calculating corresponding frequency domain data through the first registration algorithm to serve as second data, and storing the second data into a video memory of each GPU;
step S300, acquiring a group of images to be registered, dividing the images to be registered in the group of images to be registered, and respectively inputting the images to be registered into X memory buffer areas;
step S400, each GPU reads the images to be registered from its corresponding memory buffer area into video memory, and obtains the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, translation parameters of the image to be registered are obtained through a preset translation parameter calculation method, the image is translated according to these parameters, and the translated image to be registered is taken as a first image;
step S500, segmenting the first image to obtain N second images with the same resolution, and respectively calculating through the first registration algorithm to obtain corresponding frequency domain data as fourth data; and based on the second data and the fourth data, obtaining a translation parameter of the second image by a preset translation parameter calculation method and translating to obtain a registered image.
The registration algorithm runs on a heterogeneous CPU + GPU computing platform, and the registration method can register high-resolution images quickly, accurately and efficiently in real time, even with massive images.
For the purpose of more clearly illustrating the present invention, a preferred embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
The invention provides an image parallel registration method based on a GPU computing platform. In a preferred embodiment of the invention, the programs are written in C/C++ and CUDA. The CUDA programming model makes GPU programming simpler and more powerful: CUDA provides a general-purpose parallel interface, requires no graphics API, and allows GPU programming in the general-purpose language C/C++. The preferred embodiment uses a dual-GPU computing platform, that is, X = 2. The flow of the image parallel registration method based on the GPU computing platform is shown in FIG. 1. When massive images are processed, the GPUs first obtain the template image, the template image is segmented according to the registration algorithm parameters, and the precomputed template data is stored in the video memory of each GPU. The images to be registered are then registered. The overall registration process is as follows: memory buffers are allocated according to the number of GPUs; the images to be registered are read in sequence into the different memory buffers; the image data is transferred in sequence to the video memory of each GPU for registration using the CUDA zero-copy technique; and after registration each registered image is returned to host memory and saved to disk. As a whole, this forms a real-time pipeline in which image data is transferred into video memory, the GPUs register in parallel, and the registration results are returned to memory and written to disk.
Each time, a complete image to be registered is placed in GPU video memory and registered in parallel; when registration is finished, the registration result is returned to host memory. The CPU only reads the original images and saves the registered result images, while all steps of the registration algorithm are completed in the GPU, which greatly improves parallel processing efficiency.
Specifically, a whole image is subjected to parallel registration in each GPU, task division is carried out according to the number of images to be registered and the number of GPUs, and tasks are distributed to different GPUs. And the registration in each GPU is divided into a global registration part and a local registration part, the global registration is firstly carried out, then the image is cut on the global registration result, and the local registration is carried out.
Global registration is performed first. Subtasks are divided according to the image resolution and assigned to GPU kernels (kernel functions). The global geometric transformation of the image relative to the template image is calculated by the registration algorithm based on Fourier transform.
Local registration is then performed. Because global registration acts on the whole image, its accuracy in local regions is low and cannot support precise study of specific parts of the image. Local registration divides the image into N small images, divides subtasks according to N and the resolution of the small images, assigns the subtasks to the GPU processors, and registers each small image against the small image at the same position in the template image using the registration algorithm based on Fourier transform.
In the preferred embodiment of the invention, the template image is a well-known and determined image, and the template image is respectively transmitted into each GPU video memory and divided into tasks which can be executed in parallel. And performing Fourier transform on the whole image of the template image in each GPU according to a registration algorithm based on Fourier transform, storing frequency domain data of the template image in a video memory, wherein the data is global registration template data, namely first data.
Further, template image segmentation is completed in each GPU according to a preset segmentation method, namely sliding-window segmentation: the image to be segmented is segmented by a sliding window with preset parameters, yielding N small images with the same resolution, where N is calculated as follows:
N = Wn * Wm
[Formula images BDA0002463313640000091, BDA0002463313640000092 and BDA0002463313640000093 are not reproduced in this text]
wherein, W is the width of the image to be segmented, H is the height of the image to be segmented, Sw is the width of the sliding window, Sh is the height of the sliding window, and D is the sliding step length of the sliding window.
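The formula images that accompany N = Wn*Wm are not legible in this text, so the exact expressions for Wn and Wm cannot be confirmed. The following is a minimal sketch that assumes the conventional sliding-window count, floor((size - window) / step) + 1 along each axis; the function names `window_counts` and `window_origins` are hypothetical and do not come from the patent.

```python
def window_counts(W, H, Sw, Sh, D):
    """Return (Wn, Wm, N) for an Sw x Sh window sliding with step D over a W x H image.

    Assumption (not confirmed by the source): the count along each axis is
    floor((size - window) / step) + 1, the usual sliding-window formula.
    """
    if Sw > W or Sh > H or D <= 0:
        raise ValueError("window must fit inside the image and the step must be positive")
    Wn = (W - Sw) // D + 1   # windows along the width
    Wm = (H - Sh) // D + 1   # windows along the height
    return Wn, Wm, Wn * Wm


def window_origins(W, H, Sw, Sh, D):
    """Top-left (x, y) corner of every window, listed row by row."""
    Wn, Wm, _ = window_counts(W, H, Sw, Sh, D)
    return [(i * D, j * D) for j in range(Wm) for i in range(Wn)]
```

Under this assumption, a 1024 x 768 image with a 256 x 256 window and step 128 gives Wn = 7, Wm = 5 and N = 35 small images, each of which could map to one local-registration subtask.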
In a preferred embodiment of the present invention, after the template image is segmented, tasks are divided according to the number N of small images and the resolution Sw × Sh of the small images, and the parallel computation is completed in GPU kernels: following the registration algorithm based on Fourier transform, a Fourier transform is applied to each small image and its frequency domain data is stored in video memory. This data is the local registration template data, namely the second data.
In some preferred embodiments, "dividing the images to be registered in the image group to be registered" in step S300 means dividing them based on the number of images to be registered and the number of GPUs, while tasks are divided according to the numbers of GPUs and buffer areas. During processing, the GPUs correspond one to one with the buffer areas, that is, GPU1 processes the images in buffer area 1 and GPU2 processes the images in buffer area 2, so that a large number of images are processed in parallel; see FIG. 5.
The number of memory buffers is the same as the number of GPUs. During task allocation, the number of images P that each buffer stores is:
P = M / X
where M is the number of images to be registered and X is the number of GPUs. It should be noted that the invention is applied to parallel registration of massive images. When P is an integer, the system allocates the same number of tasks to each GPU, which facilitates parallel completion of the registration tasks. When P is not an integer, the system assigns the remaining tasks to GPUs at random, so that the task counts of the GPUs differ only slightly and parallel registration can still be completed. The buffers are organized as circular buffers: a slot is released after each image is processed, which ensures continuous processing of massive image data.
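The division of M images among the GPU buffers described above can be sketched as follows. This is an illustration only: the function name `divide_tasks`, the contiguous assignment of image indices, and the choice to give each leftover image to a different randomly chosen GPU are assumptions, since the text says only that leftover tasks are assigned at random.

```python
import random


def divide_tasks(num_images, num_gpus, seed=None):
    """Assign image indices 0..num_images-1 to num_gpus buffers.

    Every buffer receives floor(M / X) images; any leftover images go to
    randomly chosen, distinct GPUs (an assumption), so the counts differ by
    at most one.
    """
    if num_gpus <= 0:
        raise ValueError("at least one GPU is required")
    rng = random.Random(seed)
    base, extra = divmod(num_images, num_gpus)
    lucky = set(rng.sample(range(num_gpus), extra))  # GPUs that take one extra image
    buffers, start = [], 0
    for gpu in range(num_gpus):
        count = base + (1 if gpu in lucky else 0)
        buffers.append(list(range(start, start + count)))
        start += count
    return buffers
```

With M = 8 and X = 2 each buffer receives four images; with M = 10 and X = 4 two buffers receive three images and two receive two, with the larger shares chosen at random.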
Furthermore, the invention uses asynchronous transfers so that CPU-GPU data transfer and GPU computation run in parallel, which avoids the CPU-GPU data transfer bottleneck that GPU parallel acceleration frequently faces. The registration of each image is divided into three stages: the first stage transfers the image from CPU memory to GPU video memory, the second stage launches the kernel functions to compute the registration, and the third stage transfers the registered image data back to the host and writes it to disk. The three processes of data transfer, registration, and return and disk writing are thus executed as a parallel pipeline.
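The overlap of the transfer, registration and write-back stages can be illustrated on the CPU with one thread per stage connected by queues. This is only an analogy for the scheduling idea, not the patent's implementation: on a GPU the same overlap is typically obtained with asynchronous copies and streams. The names `run_pipeline` and `stage`, and the three callables passed in, are hypothetical.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the image stream


def stage(worker, inbox, outbox):
    """Apply worker to each item from inbox until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _SENTINEL:
            if outbox is not None:
                outbox.put(_SENTINEL)  # propagate shutdown to the next stage
            break
        result = worker(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(images, transfer, register, write_back):
    """Run transfer -> register -> write_back concurrently, preserving order."""
    q_in = queue.Queue()
    q_dev = queue.Queue(maxsize=2)   # small bound, loosely mimicking a circular buffer
    q_out = queue.Queue(maxsize=2)
    results = []
    threads = [
        threading.Thread(target=stage, args=(transfer, q_in, q_dev)),
        threading.Thread(target=stage, args=(register, q_dev, q_out)),
        threading.Thread(target=stage,
                         args=(lambda r: results.append(write_back(r)), q_out, None)),
    ]
    for t in threads:
        t.start()
    for img in images:
        q_in.put(img)
    q_in.put(_SENTINEL)
    for t in threads:
        t.join()
    return results
```

While one image is being registered, the next can already be in transfer and the previous one can be written out, which is the effect the pipeline in FIG. 4 is described as achieving.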
Preferably, step S400 is global registration, in which each sub-step of the registration algorithm based on Fourier transform is assigned to GPU kernels for processing. Using the global image parallel registration algorithm, the tasks of each sub-step are divided according to the image resolution W × H, a suitable thread block size is calculated, and the kernel function is launched for parallel computation. The whole registration process runs in parallel in the GPU, so that the parallel efficiency of the registration algorithm is maximized.
The preset translation parameter calculation method specifically comprises the following steps: step A100, based on the first data and the third data, calculating through a CUDA library function and inverse Fourier transform to obtain time domain data of each image to be registered; step A200, based on the time domain data, obtaining the translation parameters of each image to be registered through kernel function calculation.
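Steps A100 and A200 can be sketched on the CPU as a phase-correlation computation. Several points are assumptions rather than statements from the patent: the frequency data are combined as a normalized cross-power spectrum (image spectrum times the conjugate of the template spectrum), the peak position is converted to a signed shift by wrap-around, and a naive DFT stands in for the CUDA library FFT so that the sketch needs no dependencies. All function names are hypothetical.

```python
import cmath
import random


def dft1(seq, sign):
    """Naive 1-D DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def dft2(img, inverse=False):
    """2-D DFT of a list-of-rows image: transform the rows, then the columns."""
    sign = 1 if inverse else -1
    rows = [dft1(r, sign) for r in img]
    cols = [dft1(list(c), sign) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]
    if inverse:
        scale = len(img) * len(img[0])
        out = [[v / scale for v in r] for r in out]
    return out


def estimate_shift(F_template, F_image):
    """Return (dx, dy) such that image(x, y) ~ template(x - dx, y - dy)."""
    h, w = len(F_image), len(F_image[0])
    cross = []
    for row_t, row_i in zip(F_template, F_image):
        row = []
        for a, b in zip(row_t, row_i):
            p = b * a.conjugate()            # cross-power term (assumed form)
            mag = abs(p)
            row.append(p / mag if mag > 1e-12 else 0j)
        cross.append(row)
    corr = dft2(cross, inverse=True)         # step A100: back to the time domain
    _, py, px = max((corr[y][x].real, y, x) for y in range(h) for x in range(w))
    dy = py - h if py > h // 2 else py       # step A200: peak -> signed shift
    dx = px - w if px > w // 2 else px
    return dx, dy


def circular_shift(img, dx, dy):
    """Shift an image by (dx, dy) with wrap-around."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


rng = random.Random(0)
template = [[rng.random() for _ in range(8)] for _ in range(8)]
shifted = circular_shift(template, dx=2, dy=-1)
first_data = dft2(template)   # plays the role of the "first data"
third_data = dft2(shifted)    # plays the role of the "third data"
print(estimate_shift(first_data, third_data))
```

For the synthetic 8 x 8 example, the estimate recovers the applied shift (2, -1); registering the image then means translating it by the opposite amount.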
Fig. 6 is a global registration parallel processing flow in the GPU, which mainly includes the following algorithm sub-steps:
algorithm substep 1: according to the method of starting GPU kernel function twice, calculating the sum of global image pixel data in parallel;
algorithm substep 2: and dividing proper thread blocks and grid sizes according to the image resolution, starting a GPU (graphics processing unit) kernel function, and carrying out parallel processing on the global image pixel data and the image median in the algorithm substep 1.
Algorithm substep 3: based on the result of substep 2, computing the FFT (fast Fourier transform) with a CUDA library function to obtain the frequency domain data of the image to be registered, namely the third data;
Algorithm substep 4: multiplying, in parallel on the GPU, the third data obtained in algorithm substep 3 by the first data;
Algorithm substep 5: performing an inverse Fourier transform on the result of substep 4 with a CUDA library function to obtain time domain data;
Algorithm substep 6: dividing thread block and grid sizes according to the resolution, and applying a shift transformation to the time domain data with a custom kernel function;
Algorithm substep 7: finding the coordinate of the maximum peak value of the data by launching a GPU kernel function twice, and thereby obtaining the translation parameters of each image to be registered;
Algorithm substep 8: dividing suitable thread block and grid sizes according to the image resolution, and translating the original image in parallel based on the translation parameters from algorithm substep 7 to obtain the registered image data, namely the first image.
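The eight sub-steps above can be traced end to end with a small CPU-side sketch. It rests on several assumptions that the text does not confirm: substep 2 is taken to subtract the image mean, substep 4 is taken to multiply the spectrum of the image to be registered by the complex conjugate of the template spectrum (plain cross-correlation, no normalisation), the shift in substep 6 is taken to move the zero displacement to the image centre, and the translation in substep 8 is done circularly. A naive DFT stands in for the cuFFT calls, and all function names are hypothetical.

```python
import cmath

def dft(seq, inverse=False):
    """Naive 1-D DFT, a stand-in for a cuFFT call; O(n^2), fine for tiny inputs."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(seq[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def dft2(img, inverse=False):
    """2-D DFT: transform every row, then every column."""
    rows = [dft(r, inverse) for r in img]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def remove_mean(img):
    """Substeps 1-2 (assumed): sum all pixels, then subtract the mean per pixel."""
    h, w = len(img), len(img[0])
    mean = sum(map(sum, img)) / (h * w)
    return [[p - mean for p in row] for row in img]

def two_pass_argmax(values, block=16):
    """Substep 7: pass 1 keeps one (value, index) per block, pass 2 reduces them."""
    partial = [max((values[i], i) for i in range(s, min(s + block, len(values))))
               for s in range(0, len(values), block)]
    return max(partial)[1]

def register(template, moving):
    """Return ((dy, dx), registered image) for a circularly shifted `moving`."""
    h, w = len(template), len(template[0])
    f_t = dft2(remove_mean(template))   # "first data" (template spectrum)
    f_m = dft2(remove_mean(moving))     # substep 3: "third data"
    # Substep 4 (assumed form): element-wise product with the conjugate spectrum.
    prod = [[f_m[y][x] * f_t[y][x].conjugate() for x in range(w)] for y in range(h)]
    corr = dft2(prod, inverse=True)     # substep 5: back to the time domain
    # Substep 6: shift so that zero displacement sits at (h // 2, w // 2).
    shifted = [[corr[(y - h // 2) % h][(x - w // 2) % w].real for x in range(w)]
               for y in range(h)]
    peak = two_pass_argmax([v for row in shifted for v in row])
    dy, dx = peak // w - h // 2, peak % w - w // 2
    # Substep 8: undo the displacement (circular translation, an assumption).
    registered = [[moving[(y + dy) % h][(x + dx) % w] for x in range(w)]
                  for y in range(h)]
    return (dy, dx), registered
```

For a `moving` image that is the template displaced by `(dy, dx)` with wrap-around, `register` returns that displacement and an image equal to the template. On a GPU each list comprehension would correspond to one kernel launch over the W × H pixels.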
Step S500 is local registration. In each GPU, the first image is segmented according to the preset segmentation method: thread blocks are divided according to the image resolution, and a GPU kernel function is launched to perform the segmentation in parallel, so that a single kernel launch yields N small images of the same resolution, namely the second images.
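The sliding-window segmentation can be sketched as follows, using the symbols W, H, Sw, Sh and D defined in claim 2. The tile count used here, ((W − Sw) // D + 1) × ((H − Sh) // D + 1), is the conventional sliding-window count and is an assumption of this example; the patent gives its own formula for N in a figure that is not reproduced in this text, and it may differ. The function names are hypothetical.

```python
def tile_origins(W, H, Sw, Sh, D):
    """Top-left corners (x, y) of every Sw x Sh window moved with stride D."""
    xs = range(0, W - Sw + 1, D)
    ys = range(0, H - Sh + 1, D)
    return [(x, y) for y in ys for x in xs]

def tile_count(W, H, Sw, Sh, D):
    """Assumed closed form for N; matches len(tile_origins(...))."""
    return ((W - Sw) // D + 1) * ((H - Sh) // D + 1)

def split(img, Sw, Sh, D):
    """Cut a row-major image (list of rows) into equally sized tiles."""
    H, W = len(img), len(img[0])
    return [[row[x:x + Sw] for row in img[y:y + Sh]]
            for x, y in tile_origins(W, H, Sw, Sh, D)]
```

Under this assumption, a 2048 × 2048 image cut with a 512 × 512 window and a stride of 256 gives 49 overlapping tiles, and a stride of 512 gives 16 non-overlapping tiles. In a kernel, each output pixel of each tile would be copied by one thread.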
Fig. 7 shows the local registration parallel algorithm flow of the present invention. In each GPU, a kernel function segments the globally registered image to obtain N small images at a time, and local registration is then performed according to the global image parallel registration algorithm. First, the FFTs of the N small images are computed in a single call using the batch mode of the CUDA cuFFT library. The parallel local registration algorithm is then applied to the N small images in turn, in a loop on the GPU, to obtain the translation parameters of each local image. Next, subtasks are divided according to the global image resolution, suitable thread block and grid sizes are selected, and a kernel function is launched once to perform the local image adjustment in parallel, producing the final registration result. Finally, the registration result is transmitted asynchronously back to CPU memory using the Zero-Copy method.
That is, following the global image parallel registration algorithm, the parallel local registration algorithm is applied in turn to the N second images in the GPU to obtain their frequency domain data as fourth data. Based on the second data and the fourth data, the translation parameters of each local image are obtained through the preset translation parameter calculation method. Subtasks are then divided according to the global image resolution, suitable thread block and grid sizes are selected, and a kernel function is launched to process the image in parallel and obtain the final registered image, which is transmitted back to CPU memory and stored on the hard disk.
Through the above steps, the image parallel registration method (algorithm) based on the GPU computing platform is realized. The invention processes massive numbers of images using multi-GPU parallel technology and maximizes the parallel efficiency within each GPU by completing every sub-step of the registration algorithm in GPU kernel functions. Image registration is a time-consuming part of image processing. The invention parallelizes it: the massive image set is divided into tasks across multiple GPUs, subtasks are divided according to the image resolution and assigned to GPU thread blocks, and the data computation is completed in parallel in kernel functions, which accelerates image registration. In addition, the whole processing flow is divided into three stages, asynchronous transmission allows data transfer and GPU computation to execute in parallel, and the three processes of data transmission, registration, and transmission back with writing to disk run in a pipeline-parallel manner, which further improves the efficiency of parallel registration of massive images and achieves real-time processing.
To verify the execution efficiency of the method, high-resolution images are used as the original data, some of the images are randomly selected as template reference images, and three experiments are performed while ensuring experimental accuracy. The detailed configuration of the experimental environment is shown in Table 1.
Table 1. Experimental environment configuration
Figure BDA0002463313640000121
Figure BDA0002463313640000131
Experiment 1
In this experiment, 2048 × 2048 and 2048 × 1024 high-resolution images are used as the original data and template images for the registration experiments, and the efficiency of the parallel registration algorithm is verified by comparison with the serial method. The experimental results are shown in Table 2.
Table 2. Comparison of computation times of the serial method and the method of the present invention on high-resolution images
Figure BDA0002463313640000132
As can be seen from Table 2, in the high-resolution image experiment the parallel registration algorithm of the invention executes efficiently and greatly shortens the registration time, with a speed-up ratio of about 183 over the serial registration algorithm on the CPU. Because every step of the parallel registration algorithm is completed on the GPU, the parallel computing performance of the GPU is fully exploited and the running efficiency of the registration algorithm is effectively improved.
Experiment 2
In this experiment, 2048 × 2048 high-resolution images are used as the original data, and the stability of the parallel registration algorithm is verified by comparison with the serial method on a massive set of images. The experimental results are shown in Table 3.
Table 3. Comparison of computation times of the serial method and the method of the present invention on massive image sets
Figure BDA0002463313640000133
Figure BDA0002463313640000141
As can be seen from Table 3, in the massive-image experiment the parallel registration algorithm of the invention executes stably: its running time grows essentially linearly, and its speed-up ratio over the serial method stabilizes at about 155.
Experiment 3
In this experiment, 2048 × 2048 high-resolution images are used as the original data, and the running times of the parallel algorithm on a single GPU and on dual GPUs are compared. The experimental results are shown in Table 4.
Table 4. Comparison of computation times of the parallel registration algorithm of the present invention on a single GPU and on dual GPUs
Figure BDA0002463313640000142
As can be seen from Table 4, the running time of the parallel registration algorithm of the present invention on a single GPU is twice that on dual GPUs. This is because, for massive image sets, the dual GPUs process their tasks in parallel with the images divided equally among the GPUs, so the running time decreases in inverse proportion to the number of GPUs and the speed-up ratio increases linearly.
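The equal division of images among the GPUs can be sketched as a simple index partition. The function name `partition` and the handling of a remainder (the first GPUs receive one extra image each) are assumptions of this example; the text only states that the images are divided equally according to the number of GPUs.

```python
def partition(num_images, num_gpus):
    """Split image indices 0..num_images-1 into contiguous ranges, one per GPU."""
    base, extra = divmod(num_images, num_gpus)
    ranges, start = [], 0
    for gpu in range(num_gpus):
        count = base + (1 if gpu < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + count))     # half-open range [start, end)
        start += count
    return ranges
```

With two GPUs each device receives half of the images, which is consistent with the roughly halved running time reported for the dual-GPU case.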
A second aspect of the preferred embodiment of the present invention provides an image parallel registration system based on a GPU computing platform, which includes a CPU module and X identical GPU modules, where X is a positive integer. It should be noted that the image parallel registration system of the present invention is also applicable to a single GPU; to illustrate the advantages of the present invention more fully, the case X ≥ 2 is described in this specification. The CPU module is configured to transmit the template image to the GPU modules, divide the images in the group of images to be registered based on the number of images to be registered and the number of GPU modules, and input the divided images into X memory buffers respectively. Each GPU module is configured to: acquire the template image from the CPU module, obtain frequency domain data of the template image through a first registration algorithm as first data, and store the first data in video memory, the first registration algorithm being a Fourier-transform-based registration algorithm; segment the template image by a preset segmentation method to obtain N images with the same resolution, compute the corresponding frequency domain data of each through the first registration algorithm as second data, and store the second data in video memory; read the images to be registered from the corresponding memory buffer into video memory, and obtain the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtain the translation parameters of the image to be registered through a preset translation parameter calculation method, translate the image accordingly, and take the translated image to be registered as a first image; segment the first image by the preset segmentation method to obtain N second images with the same resolution, and compute the corresponding frequency domain data of each through the first registration algorithm as fourth data; and, based on the second data and the fourth data, obtain the translation parameters of the second images through the preset translation parameter calculation method, translate them to obtain a registered image, and transmit the registered image to the CPU module.
A third aspect of the preferred embodiments of the present invention provides a storage device, in which a plurality of programs are stored, and the programs are loaded and executed by a processor to implement the above-mentioned image parallel registration method based on the GPU computing platform.
A fourth aspect of the preferred embodiments of the present invention provides a processing apparatus comprising a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above-mentioned image parallel registration method based on the GPU computing platform.
The technical solutions in the embodiments of the present application provide at least the following technical effects and advantages:
The invention uses a GPU computing platform and multi-GPU parallel technology to process massive numbers of images in real time, and completes every sub-step of the Fourier-transform-based registration algorithm in GPU kernel functions, so that the parallel efficiency within each GPU is maximized. The invention parallelizes image registration: the massive image set is divided into tasks across multiple GPUs, subtasks are divided according to the image resolution and assigned to GPU thread blocks, and the data computation is completed in parallel in kernel functions, which accelerates image registration. The invention divides the whole processing flow into three stages and uses asynchronous transmission so that data transfer and GPU computation execute in parallel, which realizes pipeline parallelism across the three processes of data transmission, registration, and transmission back with writing to disk, further improves the efficiency of parallel registration of massive images, and achieves real-time processing.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method examples, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. An image parallel registration method based on a GPU computing platform is characterized in that the number of GPUs in the GPU computing platform is X, and the method comprises the following steps:
step S100, acquiring a template image, acquiring frequency domain data of the template image through a first registration algorithm to serve as first data, and respectively storing the first data into a video memory of each GPU; the first registration algorithm is a Fourier transform-based registration algorithm;
step S200, segmenting the template image to obtain N images with the same resolution, calculating corresponding frequency domain data through the first registration algorithm to serve as second data, and storing the second data into a video memory of each GPU;
step S300, acquiring a group of images to be registered, dividing the images to be registered in the group, and inputting the divided images into X memory buffer areas respectively;
step S400, each GPU reads the images to be registered in its corresponding memory buffer area into video memory, and obtains the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, translation parameters of the image to be registered are obtained through a preset translation parameter calculation method, the image is translated according to the translation parameters, and the translated image to be registered is taken as a first image;
step S500, segmenting the first image to obtain N second images with the same resolution, and computing the corresponding frequency domain data of each through the first registration algorithm as fourth data; and, based on the second data and the fourth data, obtaining translation parameters of the second images through the preset translation parameter calculation method and translating the second images accordingly to obtain a registered image.
2. The image parallel registration method based on the GPU computing platform of claim 1, wherein the "segmenting the template image" in step S200 and the "segmenting the first image" in step S500 are based on a preset segmentation method, and the preset segmentation method is: segmenting an image to be segmented with a sliding window having preset parameters to obtain N small images with the same resolution, wherein N is calculated as follows:
Figure FDA0002463313630000021
wherein, W is the width of the image to be segmented, H is the height of the image to be segmented, Sw is the width of the sliding window, Sh is the height of the sliding window, and D is the sliding step length of the sliding window.
3. The image parallel registration method based on the GPU computing platform of claim 1, wherein the pre-set translation parameter calculating method in step S400 specifically comprises the following steps:
step A100, based on the first data and the third data, calculating through a CUDA library function and inverse Fourier transform to obtain time domain data of each image to be registered;
step A200, based on the time domain data, obtaining the translation parameters of each image to be registered through kernel function calculation.
4. A GPU computing platform based image parallel registration method as claimed in claim 1, wherein the GPU computing platform further comprises a CPU, the method further comprising the steps of:
and step S600, each GPU respectively transmits the registered images to a CPU memory and stores the images to a hard disk.
5. A GPU computing platform based image parallel registration method according to claim 1, wherein the step S100 of obtaining the frequency domain data of the template image by the first registration algorithm is completed in the GPU.
6. The image parallel registration method based on the GPU computing platform of claim 5, wherein in step S200, segmenting the template image by a preset segmentation method to obtain N images with the same resolution, and computing the frequency domain data corresponding to the N images through the first registration algorithm as the second data, are both completed in the GPU.
7. The image parallel registration method based on the GPU computing platform of claim 1, wherein in step S300, the images to be registered in the group of images to be registered are divided based on the number of images to be registered and the number of GPUs.
8. An image parallel registration system based on a GPU computing platform is characterized by comprising a CPU module and X identical GPU modules;
the CPU module is configured to transmit the template image to the GPU modules, divide the images to be registered in the group of images to be registered based on the number of images to be registered and the number of GPU modules, and input the divided images to be registered into X memory buffers respectively;
the GPU module is configured to acquire a template image from the CPU module, acquire frequency domain data of the template image through a first registration algorithm, serve as first data, and respectively store the first data into a video memory; the first registration algorithm is a Fourier transform-based registration algorithm;
segmenting the template image by a preset segmentation method to obtain N images with the same resolution, calculating corresponding frequency domain data by the first registration algorithm respectively to serve as second data, and storing the second data into a video memory respectively;
reading the images to be registered in the corresponding memory buffer area into video memory, and obtaining the frequency domain image of each image to be registered as third data through a kernel function and the first registration algorithm; based on the first data and the third data, obtaining translation parameters of the image to be registered through a preset translation parameter calculation method, translating the image according to the translation parameters, and taking the translated image to be registered as a first image;
segmenting the first image by the preset segmentation method to obtain N second images with the same resolution, and computing the corresponding frequency domain data of each through the first registration algorithm as fourth data; and, based on the second data and the fourth data, obtaining translation parameters of the second images through the preset translation parameter calculation method, translating the second images accordingly to obtain a registered image, and transmitting the registered image to the CPU module.
9. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the GPU computing platform based image parallel registration method of any of claims 1-7.
10. A processing device comprising a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; characterized in that the program is adapted to be loaded and executed by a processor to implement the GPU computing platform based image parallel registration method of any of claims 1-7.
CN202010326223.2A 2020-04-23 2020-04-23 Image parallel registration method, system and device based on GPU computing platform Active CN111539997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010326223.2A CN111539997B (en) 2020-04-23 2020-04-23 Image parallel registration method, system and device based on GPU computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010326223.2A CN111539997B (en) 2020-04-23 2020-04-23 Image parallel registration method, system and device based on GPU computing platform

Publications (2)

Publication Number Publication Date
CN111539997A true CN111539997A (en) 2020-08-14
CN111539997B CN111539997B (en) 2022-06-10

Family

ID=71978906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010326223.2A Active CN111539997B (en) 2020-04-23 2020-04-23 Image parallel registration method, system and device based on GPU computing platform

Country Status (1)

Country Link
CN (1) CN111539997B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050271302A1 (en) * 2004-04-21 2005-12-08 Ali Khamene GPU-based image manipulation method for registration applications
US20160198940A1 (en) * 2013-08-28 2016-07-14 Kabushiki Kaisha Topcon Ophthalmologic apparatus
CN106384350A (en) * 2016-09-28 2017-02-08 中国科学院自动化研究所 Neuron activity image dynamic registration method based on CUDA acceleration and neuron activity image dynamic registration device thereof
CN107451955A (en) * 2017-06-20 2017-12-08 昆明理工大学 A kind of K T algorithms rebuild the parallelization implementation method of spot figure in astronomic graph picture
CN109919987A (en) * 2019-01-04 2019-06-21 浙江工业大学 A kind of 3 d medical images registration similarity calculating method based on GPU

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIE HAO, ET AL.: "parallel implementation of arbitrary-sized discrete fourier transform on FPGA", 《INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS》 *
徐如林: "基于GPU的遥感图像配准并行算法研究及应用系统实现", 《CNKI硕士电子期刊》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932595A (en) * 2020-09-24 2020-11-13 平安科技(深圳)有限公司 Image registration method and device, electronic equipment and storage medium
CN114283046A (en) * 2021-11-19 2022-04-05 广州市城市规划勘测设计研究院 Point cloud file registration method and device based on ICP algorithm and storage medium
CN114416365A (en) * 2022-01-18 2022-04-29 北京拙河科技有限公司 Ultra-clear image quality image data processing method and device based on GPU fusion processing
CN114416365B (en) * 2022-01-18 2022-09-27 北京拙河科技有限公司 Ultra-clear image quality image data processing method and device based on GPU fusion processing
CN117173439A (en) * 2023-11-01 2023-12-05 腾讯科技(深圳)有限公司 Image processing method and device based on GPU, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111539997B (en) 2022-06-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 No. 95 East Zhongguancun Road, Beijing, Haidian District

Applicant after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Applicant after: Guangdong Institute of artificial intelligence and advanced computing

Address before: 100190 No. 95 East Zhongguancun Road, Beijing, Haidian District

Applicant before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Applicant before: Guangzhou Institute of artificial intelligence and advanced computing, Institute of automation, Chinese Academy of Sciences

GR01 Patent grant