CN107194864A - Method and device for accelerating CT three-dimensional image reconstruction based on a heterogeneous platform - Google Patents
- Publication number
- CN107194864A CN201710270520.8A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
Abstract
The present invention relates to a method and device for accelerating CT three-dimensional image reconstruction based on a heterogeneous platform. The heterogeneous platform comprises a host and heterogeneous OpenCL computing devices. The acceleration method comprises: decomposing the FDK reconstruction algorithm into computation grains and analyzing the parallel computation flow of each grain; and accelerating and optimizing each computation grain by means of the host and the heterogeneous OpenCL computing devices in the heterogeneous platform. The invention thoroughly exploits the parallelism of the CT reconstruction algorithm. It adopts a GPU+FPGA heterogeneous computing mode, in which computing units with different instruction sets and architectures form the computing system, so that the algorithm matches the heterogeneous architecture as closely as possible and the performance of the different acceleration components is fully used. It also designs storage and communication schemes suited to efficient execution of the reconstruction algorithm: the system supports PCI-E/Ethernet interconnection, multiple processing boards can be interconnected for efficient multiprocessor parallel processing, and a synchronous or asynchronous coprocessing system is realized, which improves reconstruction speed while keeping the loss of precision as small as possible.
Description
Technical field
The invention belongs to the field of X-ray CT technology, and more particularly relates to a method and device for accelerating CT three-dimensional image reconstruction based on a heterogeneous platform.
Background
X-ray computed tomography (CT) is a technique that recovers the attenuation distribution of an object from its X-ray projections, and it draws on multiple disciplines such as nuclear physics, mathematics, computer science, and precision instrumentation. Because CT can obtain high-precision three-dimensional structural information about the interior of an object under non-contact, non-destructive conditions, it has been widely used in non-destructive testing, medical diagnosis, materials analysis, and other fields since Hounsfield successfully developed the first CT scanner.
In practical applications, high-resolution three-dimensional cone-beam CT reconstruction requires very large computing and storage resources. As the reconstruction scale grows, the storage demand and the amount of computation increase sharply, and in many cases practical requirements are hard to meet. Take back projection, which is common to reconstruction algorithms, as an example: if each dimension of the three-dimensional image to be reconstructed has size N, the computational complexity of the corresponding back projection is as high as O(N^4). Reconstructing a three-dimensional image with a resolution of 1024^3 requires about 1.0995 × 10^12 loop iterations. Completing such a large amount of computation on an ordinary PC is very time-consuming and cannot meet the requirements of practical applications. Accelerating the cone-beam CT reconstruction process is therefore a problem that engineers urgently need to solve. Designing an acceleration platform and acceleration strategy for CT reconstruction algorithms has important practical significance and addresses a difficulty that industrial CT urgently needs to resolve in practice.
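The iteration count quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the number of projection angles equals N, which is what an O(N^4) count for an N^3 volume implies; the function name and the `num_angles` parameter are illustrative and not taken from the patent.

```python
def backprojection_iterations(n, num_angles=None):
    """Voxel updates needed to back-project onto an n x n x n volume.

    Assumption: when num_angles is not given, it is taken equal to n,
    which yields the O(N^4) count discussed in the text.
    """
    if num_angles is None:
        num_angles = n
    return n ** 3 * num_angles


# Growth of the workload with the reconstruction scale.
for n in (256, 512, 1024):
    print(n, backprojection_iterations(n))
```

Doubling N multiplies the count by 16, which is why the workload outgrows a single CPU so quickly.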
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a method and device for accelerating CT three-dimensional image reconstruction based on a heterogeneous platform. Based on the characteristics of CT reconstruction algorithms and the capabilities of general-purpose acceleration devices such as FPGAs and GPUs, the method is implemented on a heterogeneous acceleration platform. It improves reconstruction speed while keeping the loss of precision as small as possible, with stable performance and a good acceleration effect.
According to the design provided by the present invention, in a method for accelerating CT three-dimensional image reconstruction based on a heterogeneous platform, the heterogeneous platform comprises a host and heterogeneous OpenCL computing devices, and the acceleration method comprises: decomposing the FDK reconstruction algorithm into computation grains and analyzing the parallel computation flow of each grain; and accelerating and optimizing each computation grain by means of the host and the heterogeneous OpenCL computing devices in the heterogeneous platform.
In the above, the host is a CPU running the main program, and the OpenCL computing devices comprise a GPU and an FPGA, which are heterogeneous devices that run kernel programs. The CPU, GPU, and FPGA communicate over the PCI-E bus, and the main program manages the execution of the kernel programs by defining a context.
Preferably, decomposing the FDK reconstruction algorithm into computation grains comprises, according to the content of the FDK algorithm, decomposing it into: a projection weighting grain for weighting the projection data; a filtering grain for filtering the weighted projection data; a back projection grain for back-projecting the filtered projection data onto the reconstructed object; and a reduction grain for reducing the back projection results.
Preferably, by splitting and discretizing the integrals in the FDK reconstruction formula, the algorithm is divided into:
The projection weighting grain, expressed by a formula in which p'(θ, u, v) denotes the projection data after weighting at rotation angle θ and the projection data are multiplied by a weight coefficient;
The filtering grain, expressed by a formula in which d_f(θ, u, v) is the filtered data, h(u) is the unit impulse response of the filter operator, and [-u_m, u_m] denotes the 2m samples acquired by each detector row;
The back projection grain, expressed by a formula in which f(x, y, z, θ) denotes the contribution of the projection at rotation angle θ to the point f(x, y, z) of the reconstructed object;
The reduction grain, expressed by a formula in which φ_max is the number of discrete projection angles acquired while the reconstructed object rotates through one full turn.
In the above, accelerating and optimizing each computation grain comprises: processing the projection weighting grain in parallel on the FPGA and transferring the data asynchronously to the GPU, with the filtering grain processed concurrently during the transfer; and, exploiting the data parallelism across voxels in back projection, performing multi-threaded parallel back projection of the back projection grain voxel by voxel on the GPU.
Preferably, based on the projection correspondence in the FDK reconstruction algorithm between each layer of the reconstruction region along the rotation axis and each row of data along the vertical axis of the detector projection data, a block reconstruction strategy is used: the region to be reconstructed is divided into several blocks along the rotation axis, and when one block is reconstructed, the corresponding projection data are fetched from external memory for the reconstruction.
Preferably, processing the projection weighting grain in parallel on the FPGA comprises: dividing global memory into 2 banks and balancing accesses to the random access memory through load distribution; and storing intermediate variables that would otherwise be computed repeatedly in constant memory.
Preferably, performing multi-threaded parallel back projection of the back projection grain voxel by voxel on the GPU comprises: using a voxel-driven approach, in which GPU tasks are partitioned by the output reconstructed volume data; separating and merging the variables in the computation that are independent of the voxel, computing them before back projection and storing them in the GPU's constant memory, and reading them directly from constant memory during back projection; and optimizing the number of back projections handled at one time in the kernel program.
Further, the variables in the computation that are independent of the voxel are separated and merged as follows. Any point (x, y, z) in the volume data projects to a point (u, v) on the detector at projection angle θ, computed as:
u = (x - vCenter) × cos(θ) + (y - vCenter) × sin(θ) + pCenter
dis = (u - pCenter) × a
v = (z - (s0 + θ × h) - γ × h/a) × w + γ × h/a + pCenter
After the variables are separated and merged, this becomes:
u = x × A[0] + y × A[1] + A[2] + pCenter
dis = (u - pCenter) × a
v = (z - A[4] - γ × A[5]) × w + γ × A[5] + pCenter
where vCenter is the center of the volume data, pCenter is the center of the projection data, a is the voxel size, θ is the projection angle, r is the rotation radius of the ray source, h is the pitch, and γ is the angle between the projection of the ray onto the central plane and the central ray.
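The separation of voxel-independent terms can be illustrated with a short sketch. The contents of A are not listed in the text; the values below (A[0] = cos θ, A[1] = sin θ, A[2] = -vCenter × (cos θ + sin θ), A[4] = s0 + θ × h, A[5] = h/a) are inferred by matching the two sets of formulas term by term and should be read as an assumption. A[3] does not appear in the formulas and is left as a placeholder. The symbols s0 and w are not defined in the text, so they are treated here simply as given inputs, as is γ.

```python
import math


def precompute_constants(theta, v_center, s0, h, a):
    """Per-angle terms that do not depend on the voxel (x, y, z).

    On a GPU these would be computed on the host once per angle and
    placed in constant memory. Index 3 is an unused placeholder that
    keeps the A[0..5] indexing used in the text.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [c, s, -v_center * (c + s), 0.0, s0 + theta * h, h / a]


def project_direct(x, y, z, theta, gamma, w, v_center, p_center, s0, h, a):
    """Original form: trigonometric functions evaluated for every voxel."""
    u = (x - v_center) * math.cos(theta) + (y - v_center) * math.sin(theta) + p_center
    v = (z - (s0 + theta * h) - gamma * h / a) * w + gamma * h / a + p_center
    return u, v


def project_with_constants(x, y, z, gamma, w, p_center, A):
    """Merged form: only multiplications and additions per voxel."""
    u = x * A[0] + y * A[1] + A[2] + p_center
    v = (z - A[4] - gamma * A[5]) * w + gamma * A[5] + p_center
    return u, v
```

Both functions return the same (u, v) up to floating-point rounding; the merged form simply moves the trigonometric evaluations out of the per-voxel loop.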
A device for accelerating CT three-dimensional image reconstruction based on a heterogeneous platform, wherein the heterogeneous platform uses PCI-Express as the interconnection bus carrying data and control signals, and uses Ethernet as an additional bus for networking control and data transfer with the outside. The architecture of the heterogeneous platform comprises: an application layer that provides functional modules; a component layer that provides the application layer with the basic component library of each functional module and the interface specifications the reconstruction algorithm requires on the different processors; and a support layer that provides services to the component layer and the application layer. The support layer comprises a CPU that executes the main program and multiple OpenCL computing devices that execute kernel programs. The CPU is communicatively connected to the OpenCL computing devices, which include a GPU and an FPGA. The CT three-dimensional image reconstruction acceleration device built on this heterogeneous platform architecture comprises:
A grain decomposition module, configured to split and discretize the FDK reconstruction algorithm into computation grains according to its content, decomposing it into a projection weighting grain for applying data correction to the projection data, a filtering grain for filtering the weighted projection data, a back projection grain for back-projecting the filtered projection data onto the reconstructed object, and a reduction grain for reducing the back projection results;
An acceleration processing module, configured to receive the projection data transferred over the additional bus to the compute node of the heterogeneous platform, coordinate processing over the interconnection bus according to user settings and an assessment of reconstruction performance, accelerate the grains on the GPU and FPGA acceleration components respectively, and store the reconstruction data in real time and feed them back to the user.
Beneficial effects of the present invention:
The invention thoroughly exploits the parallelism of the CT reconstruction algorithm. It adopts a GPU+FPGA heterogeneous computing mode, in which computing units with different instruction sets and architectures form the computing system, so that the algorithm matches the heterogeneous architecture as closely as possible and the performance of the different acceleration components is fully used. It also designs storage and communication schemes for efficient execution of the reconstruction algorithm, supports PCI-E/Ethernet interconnection, allows multiple processing boards to be interconnected for efficient multiprocessor parallel processing, and realizes a synchronous or asynchronous coprocessing system. Reconstruction speed is improved while the loss of precision is kept as small as possible, which meets user requirements.
Brief description of the drawings:
Fig. 1 is a schematic flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the geometry of the FDK algorithm;
Fig. 3 is the overall computation grain flow chart of the FDK algorithm;
Fig. 4 is the framework model of the heterogeneous platform;
Fig. 5 is the software block diagram of the heterogeneous platform;
Fig. 6 is a schematic diagram of the device of the invention;
Fig. 7 is a three-dimensional sectional view of the reconstruction result for a petroleum rock core.
Detailed description of the embodiments:
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
As shown in Fig. 1, one embodiment provides a method for accelerating CT three-dimensional image reconstruction based on a heterogeneous platform. In this embodiment the heterogeneous platform comprises a host and heterogeneous OpenCL computing devices. The host is a CPU running the main program, and the OpenCL computing devices comprise a GPU and an FPGA, which are heterogeneous devices that run kernel programs. The CPU, GPU, and FPGA communicate over the PCI-E bus, and the main program manages the execution of the kernel programs by defining a context.
The FPGA uses a logic cell array (LCA) and internally comprises three parts: configurable logic blocks (CLB), input/output blocks (IOB), and interconnect. A field-programmable gate array (FPGA) is a programmable device whose structure differs from that of conventional logic circuits and gate arrays (such as PAL, GAL, and CPLD devices). An FPGA implements combinational logic with small look-up tables (16 × 1 RAM). Each look-up table is connected to the input of a D flip-flop, and the flip-flop in turn drives other logic circuits or I/O, forming a basic logic cell that can implement both combinational and sequential logic. These cells are interconnected, or connected to the I/O blocks, by metal wiring. The logic of an FPGA is realized by loading programming data into internal static memory cells. The values stored in the memory cells determine the logic function of each logic cell and the connections between cells or between cells and I/O, and thus ultimately the function the FPGA implements. An FPGA can be reprogrammed an unlimited number of times.
The GPU architecture consists of an array of highly threaded streaming multiprocessors (SM). Two SMs form a building block, and the number of SMs per building block may differ between generations of CUDA-capable GPUs. Each SM in turn contains multiple streaming processors (SP), which share control logic and an instruction cache. Each GPU carries several gigabytes (GB) of graphics double data rate (GDDR) DRAM, referred to as global memory. The GDDR DRAM on the GPU is entirely different from the system DRAM on the motherboard of the CPU system: it mainly serves as frame buffer memory for graphics processing. In graphics applications it holds video images and texture information for 3D rendering; for computation, it can be used as high-bandwidth off-chip memory. Although its latency is longer than that of typical system memory, massively parallel applications usually compensate for the latency with the high bandwidth.
Transferring large volumes of data is both the difficulty and the focus of reconstruction acceleration. PCI-E is a high-performance general-purpose I/O interconnection bus whose single-lane transfer rate can reach 2.5 Gbit/s. The invention therefore uses the PCI-E interface as the main data transfer bus. Commercial PCI-E drivers already exist for the GPU's communication interface, so the design of high-speed data transfer between the FPGA and the PCI-E bus is the key to the platform design. There are generally 2 schemes for implementing PCI-E transfer on an FPGA: one uses a dedicated bridge chip, and the other uses an FPGA that can itself implement the PCI-E physical interface. Because the number of PCI-E lanes and the transfer rate supported by current bridge chips are relatively low, this project uses the latter scheme. Data transfer can be divided into 2 modes: normal mode and DMA (Direct Memory Access) mode. Normal mode mainly handles command communication between the host and the device. DMA mode is intended for transferring large blocks of data: the transfer does not pass through the CPU, data go directly from the device to memory, the transfer is fast, and the PCI-E bandwidth can be fully used. The FPGA data therefore use DMA transfer mode.
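To get a feel for why the transfer path matters, the following sketch estimates an idealized transfer time from the 2.5 Gbit/s per-lane figure. The lane count, the 8b/10b line-coding efficiency of 0.8 (as used by first-generation PCI-E links), and the projection set size are assumptions chosen for illustration; packet and protocol overhead is ignored, so real transfers would be slower.

```python
def pcie_transfer_seconds(num_bytes, lanes=8, lane_gbit_per_s=2.5,
                          encoding_efficiency=0.8):
    """Idealized time to move num_bytes over a PCI-E link.

    Assumptions: lanes, encoding_efficiency (8b/10b line coding), and the
    absence of protocol overhead are illustrative, not from the patent.
    """
    payload_bits_per_s = lanes * lane_gbit_per_s * 1e9 * encoding_efficiency
    return num_bytes * 8 / payload_bits_per_s


# Hypothetical scan: 360 projections of 1024 x 1024 float32 samples.
projection_bytes = 360 * 1024 * 1024 * 4
print(round(pcie_transfer_seconds(projection_bytes), 3), "s over 8 lanes")
print(round(pcie_transfer_seconds(projection_bytes, lanes=1), 3), "s over 1 lane")
```

Even under these optimistic assumptions the transfer takes a noticeable fraction of a second, which is the motivation for overlapping it with computation rather than waiting for it.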
The implementation of CT image reconstruction in this embodiment is described in detail below:
S1. Decompose the FDK reconstruction algorithm into computation grains and analyze the parallel computation flow of each grain:
CT image reconstruction algorithms fall mainly into two classes: analytic algorithms and iterative algorithms. Compared with iterative algorithms, analytic algorithms have a simple mathematical form, reconstruct quickly, and are easy to implement, so they are the mainstream in practical CT systems. Among the various analytic reconstruction algorithms based on filtered back projection, the FDK algorithm is computationally efficient and easy to implement, and it gives good reconstruction results when the cone angle is small, so it is the most widely used in practice. The geometry of the FDK algorithm is shown in Fig. 2, where S denotes the ray source, f(x, y, z) denotes a point of the reconstructed object, i.e. a voxel, and P(θ, u, v) denotes the projection data of the point f(x, y, z) collected by the detector when the object rotation angle is θ. In principle the FDK algorithm consists of 3 steps: projection weighting, one-dimensional filtering, and back projection. Analysis of the algorithm shows that the weighting of each projection point is independent of the others; the projection data at each angle are filtered row by row, with the same filtering operation for every row; and the back projection of a voxel at each angle depends only on the coordinates to which that voxel maps in the detector coordinate system, so it too is independent. In other words, all three steps of the FDK algorithm have good parallelism and are suited to acceleration by parallel processing.
In one embodiment, as shown in Fig. 3, the FDK algorithm is decomposed according to its content into: a projection weighting grain for weighting the projection data; a filtering grain for filtering the weighted projection data; a back projection grain for back-projecting the filtered projection data onto the reconstructed object; and a reduction grain for reducing the back projection results. By analyzing the storage, computation, and communication volume of each grain, the algorithm can be implemented efficiently on the heterogeneous platform.
In some embodiments of the present invention, by splitting and discretizing the integrals in the FDK reconstruction formula, the algorithm is divided into:
The projection weighting grain, expressed by a formula in which p'(θ, u, v) denotes the projection data after weighting at rotation angle θ and the projection data are multiplied by a weight coefficient;
The filtering grain, expressed by a formula in which d_f(θ, u, v) is the filtered data, h(u) is the unit impulse response of the filter operator, and [-u_m, u_m] denotes the 2m samples acquired by each detector row;
The back projection grain, expressed by a formula in which f(x, y, z, θ) denotes the contribution of the projection at rotation angle θ to the point f(x, y, z) of the reconstructed object;
The reduction grain, expressed by a formula in which φ_max is the number of discrete projection angles acquired while the reconstructed object rotates through one full turn.
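For orientation, the four grains can be written in one commonly used textbook form of the FDK algorithm for a circular trajectory with a flat detector. This is a sketch under assumed notation and is not a reproduction of the patent's own formulas: the source-to-rotation-axis distance D, the virtual detector placed through the rotation axis, the rotated coordinates t and s, and the angle samples θ_k are assumptions, while p', d_f, h, u_m, f(x, y, z, θ), and φ_max follow the definitions in the text.

```latex
\begin{align}
p'(\theta,u,v) &= \frac{D}{\sqrt{D^{2}+u^{2}+v^{2}}}\; p(\theta,u,v)
  && \text{(projection weighting)} \\
d_f(\theta,u,v) &= \sum_{u'=-u_m}^{u_m} p'(\theta,u',v)\, h(u-u')
  && \text{(row-wise filtering)} \\
f(x,y,z,\theta) &= \frac{D^{2}}{(D-s)^{2}}\;
  d_f\!\left(\theta,\ \frac{D\,t}{D-s},\ \frac{D\,z}{D-s}\right)
  && \text{(back projection)} \\
f(x,y,z) &= \frac{\pi}{\varphi_{\max}} \sum_{k=1}^{\varphi_{\max}} f(x,y,z,\theta_k),
  \qquad \theta_k = \frac{2\pi k}{\varphi_{\max}}
  && \text{(reduction)}
\end{align}
% with t = x\cos\theta + y\sin\theta and s = -x\sin\theta + y\cos\theta
```

In this form the weighting touches each detector sample independently, the filtering couples samples only within one row, and the back projection reads the filtered data at a position that depends on a single voxel, which matches the independence properties the text relies on.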
The weight coefficient depends only on the position of the projection ray on the detector plane and on the distance from the ray source to the origin, so the weighting of the projection data is independent for each projection point. According to the frequency-domain filtering formula for the projection data, the projection data at each angle must be Fourier transformed before filtering; the Fourier transform at each angle depends only on the abscissa Y of the projection data on the flat-panel detector, and the frequency-domain filter function is the same throughout, so the frequency-domain filtering of the projection data is independent for each row. The back projection of a voxel at each angle depends only on the coordinates to which that voxel maps in the detector coordinate system, so the back projection is independent for each voxel.
In summary, all three steps of the FDK algorithm have good parallelism and are suited to acceleration by parallel processing. Under fine-grained decomposition, the overall computation grain flow of the FDK algorithm is as shown in Fig. 3.
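The independence properties described above can be made concrete with a small CPU sketch of the first two grains. It assumes the standard cosine weight D / sqrt(D^2 + u^2 + v^2) and a spatial-domain band-limited ramp (Ram-Lak) kernel in place of the frequency-domain filtering mentioned in the text; D, du, dv, and all function names are illustrative and not taken from the patent.

```python
import math


def weight_projection(proj, D, du=1.0, dv=1.0):
    """Projection weighting grain: every pixel is scaled on its own."""
    rows, cols = len(proj), len(proj[0])
    cu, cv = (cols - 1) / 2.0, (rows - 1) / 2.0   # detector center (assumed)
    out = []
    for j, row in enumerate(proj):
        v = (j - cv) * dv
        out.append([p * D / math.sqrt(D * D + ((i - cu) * du) ** 2 + v * v)
                    for i, p in enumerate(row)])
    return out


def ramp_kernel(m, du=1.0):
    """Band-limited ramp (Ram-Lak) impulse response h(n) for n = -m..m."""
    h = []
    for n in range(-m, m + 1):
        if n == 0:
            h.append(1.0 / (4.0 * du * du))
        elif n % 2 == 0:
            h.append(0.0)
        else:
            h.append(-1.0 / (math.pi * n * du) ** 2)
    return h


def filter_row(row, h):
    """Filtering grain: 1-D convolution of a single detector row with h."""
    m = (len(h) - 1) // 2
    n = len(row)
    return [sum(row[k] * h[(i - k) + m] for k in range(n) if abs(i - k) <= m)
            for i in range(n)]


def weight_and_filter(proj, D, du=1.0):
    """Apply both grains to one projection; rows never interact."""
    weighted = weight_projection(proj, D, du, du)
    h = ramp_kernel(len(proj[0]) - 1, du)
    return [filter_row(r, h) for r in weighted]
```

Because no pixel depends on another during weighting and no row depends on another during filtering, both loops can be unrolled into pipelines or distributed across threads without synchronization.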
S2. Accelerate and optimize each computation grain by means of the host and the heterogeneous OpenCL computing devices in the heterogeneous platform:
The framework model of the heterogeneous platform is shown in Fig. 4. Given the data transfer rates that CT reconstruction requires, and in order to distribute and schedule reconstruction tasks efficiently across the heterogeneous processors, the heterogeneous platform uses PCI-Express as the interconnection bus carrying data and control signals. In addition, the heterogeneous platform acts as one compute node of the overall platform and uses Ethernet as an additional bus for networking control or data transfer with the outside. Based on the granularity, load distribution, and parallelization characteristics of the CT reconstruction algorithm, the invention uses the GPU and FPGA as the main acceleration components. The projection data acquired by the CT system are transferred over the Ethernet bus to the acceleration platform node. According to user settings and an assessment of reconstruction performance, the system coordinates processing over the PCI-E bus, accelerates the corresponding grains on the GPU and FPGA acceleration components respectively, and stores the reconstruction data in real time and feeds them back to the user.
In one embodiment, the projection weighting grain is processed in parallel on the FPGA and the data are transferred asynchronously to the GPU, with the filtering grain processed concurrently during the transfer; exploiting the data parallelism across voxels in back projection, multi-threaded parallel back projection of the back projection grain is performed voxel by voxel on the GPU.
Further, the projection data of the CT image require data correction for the geometry, and by the principle of the FDK algorithm the processing divides into 3 steps: weighting, filtering, and back projection. The resources consumed by the three steps therefore need to be allocated sensibly to reach the best acceleration. FPGA devices are reconfigurable and flexible and are well suited to parallel pipelined processing, while GPU devices have a large number of high-performance stream processors and suit data-parallel operation; the design should make full use of the strengths of each accelerator. The 3 grains of data weighting correction, filtering, and back projection are each optimized separately. Data weighting correction is preprocessing of the projection data. Because this grain only performs pipelined operations on the rows of the projection, it is suited to pipelined processing inside the FPGA. The weighting correction is therefore processed in parallel on the FPGA with asynchronous transfer, and pipelined weighting and filtering are computed concurrently during the transfer, which reduces processing latency and exploits the FPGA's strength in pipelined processing. Weighted filtering and back projection involve a large amount of data computation and storage. Exploiting the data parallelism across voxels in back projection, the implementation performs multi-threaded parallel back projection voxel by voxel on the GPU, which improves resource utilization and reconstruction speed.
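The overlap between preprocessing and back projection described above can be sketched as a two-stage producer/consumer pipeline. In this sketch ordinary Python threads stand in for the FPGA and GPU, and a bounded queue stands in for the asynchronous transfer buffer; `run_pipeline`, `preprocess`, `backproject`, and `depth` are hypothetical names, not part of the patent or of any OpenCL API.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the projection stream


def run_pipeline(projections, preprocess, backproject, depth=2):
    """Overlap preprocessing of projection k+1 with back projection of k."""
    q = queue.Queue(maxsize=depth)  # bounded buffer models the async transfer
    results = []

    def producer():
        # Stands in for the FPGA stage (weighting and filtering).
        for p in projections:
            q.put(preprocess(p))
        q.put(_DONE)

    def consumer():
        # Stands in for the GPU stage (back projection).
        while True:
            item = q.get()
            if item is _DONE:
                break
            results.append(backproject(item))

    t_fpga = threading.Thread(target=producer)
    t_gpu = threading.Thread(target=consumer)
    t_fpga.start()
    t_gpu.start()
    t_fpga.join()
    t_gpu.join()
    return results


# Toy usage with placeholder stages.
print(run_pipeline([1, 2, 3], lambda p: p * 2, lambda d: d + 1))
```

The bounded queue keeps the first stage at most `depth` projections ahead of the second, which mirrors how a fixed number of transfer buffers limits how far preprocessing can run ahead of back projection.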
Further, based on the projection correspondence in the FDK reconstruction algorithm between each layer of the reconstruction region along the rotation axis and each row of data along the vertical axis of the detector projection data, a block reconstruction strategy is used: the region to be reconstructed is divided into several blocks along the rotation axis, and when one block is reconstructed, the corresponding projection data are fetched from external memory for the reconstruction.
When the reconstruction is very large, the memory on the GPU/FPGA boards may be insufficient to reconstruct the whole volume in one pass, so a block reconstruction strategy is needed. According to the FDK formula, each layer of the reconstruction region along the rotation axis has a strict projection correspondence with the rows of the detector projection data along the vertical axis. The region to be reconstructed can therefore be divided into several blocks along the rotation axis, and reconstructing one block only requires fetching the corresponding projection data from external memory.
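The correspondence between a slab of the volume and a band of detector rows can be sketched as follows. The geometry is an assumption made for illustration: a circular cone-beam scan, source-to-axis distance D, a virtual detector through the rotation axis with row pitch dv, and a cylindrical field of view of radius R < D, so that a voxel at height z maps to detector height D × z / (D - s) with s between -R and R. None of these symbols come from the patent.

```python
import math


def detector_rows_for_slab(z0, z1, D, R, dv, n_rows):
    """Range of detector rows needed to reconstruct the slab z0 <= z <= z1.

    Assumed geometry: the magnification of a voxel lies between
    D / (D + R) and D / (D - R), so the extreme detector heights occur
    at the corners of the (z, magnification) rectangle.
    """
    mags = (D / (D + R), D / (D - R))
    vs = [z * m for z in (z0, z1) for m in mags]
    v_min, v_max = min(vs), max(vs)
    center = (n_rows - 1) / 2.0
    # One extra row on each side leaves room for interpolation.
    lo = max(0, math.floor(v_min / dv + center) - 1)
    hi = min(n_rows - 1, math.ceil(v_max / dv + center) + 1)
    return lo, hi


def split_into_slabs(n_slices, slab_thickness):
    """Partition slice indices 0..n_slices-1 into consecutive blocks."""
    return [(start, min(start + slab_thickness, n_slices) - 1)
            for start in range(0, n_slices, slab_thickness)]


print(split_into_slabs(10, 4))
print(detector_rows_for_slab(0.0, 10.0, 1000.0, 100.0, 1.0, 1001))
```

Only the returned row band of each projection has to be loaded from external memory for a given slab, which bounds the on-board memory needed per block.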
Further, taking the hardware characteristics of the FPGA into account, the projection weighting grain is processed in parallel on the FPGA. Global memory is divided into 2 banks, and load distribution balances accesses across the 2 DDR2 SDRAM devices, which raises memory access bandwidth. Intermediate variables that would otherwise be computed repeatedly are stored in constant memory, which saves computing resources. The number of back projections handled at one time in the kernel function is tuned to raise the access bandwidth to the projection data storage while reducing accesses to the reconstruction data storage, so that global memory access is optimized.
Further, multi-threaded parallel back projection of the back projection grain is performed voxel by voxel on the GPU using a voxel-driven approach, in which GPU tasks are partitioned by the output reconstructed volume data. The variables in the computation that are independent of the voxel are separated and merged, computed before back projection, and stored in the GPU's constant memory; during back projection they are read directly from constant memory. The number of back projections handled at one time in the kernel program is optimized.
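A voxel-driven back projection can be sketched on the CPU as a gather loop in which each output voxel is written by exactly one loop iteration, the analogue of one GPU thread. This is a deliberately simplified single-slice sketch: it reuses the u mapping given in the text, omits cone-beam magnification and distance weighting, and uses linear interpolation, all of which are simplifying assumptions. The function name and argument layout are illustrative.

```python
import math


def backproject_voxel_driven(volume, filtered_row, theta, v_center, p_center):
    """Accumulate one angle's contribution into a 2-D slice.

    Each (x, y) iteration reads from the filtered projection and writes
    only its own voxel, so parallel execution needs no atomic updates.
    """
    c, s = math.cos(theta), math.sin(theta)
    n = len(volume)
    for y in range(n):
        for x in range(n):
            u = (x - v_center) * c + (y - v_center) * s + p_center
            i = math.floor(u)
            if 0 <= i < len(filtered_row) - 1:
                frac = u - i
                volume[y][x] += ((1.0 - frac) * filtered_row[i]
                                 + frac * filtered_row[i + 1])
    return volume


slice_ = [[0.0] * 3 for _ in range(3)]
backproject_voxel_driven(slice_, [10.0, 20.0, 30.0, 40.0], 0.0, 1.0, 1.0)
print(slice_)
```

Calling the function once per projection angle and letting the contributions accumulate in `volume` plays the role of the reduction grain.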
GPU tasks can be partitioned by input or by output. For the back projection algorithm, the input is the projection data and the output is the reconstructed volume data. The two GPU task allocation schemes correspond to two different ways of implementing back projection: ray-driven and voxel-driven. The ray-driven approach partitions tasks by projection data: one or several threads back-project all the voxels along one ray. During back projection, all voxels traversed by the current ray are computed first, and the projection value of the current ray is then assigned to those voxels. Because threads for different rays, or different threads for the same ray, may write to the same voxel at the same time, this approach suffers from write contention. The voxel-driven approach partitions tasks by volume data: one thread back-projects one or several voxels. During back projection, the forward projection position of the current voxel is computed first, and the projection value at that position is then assigned to the current voxel. Voxel-driven back projection has no write contention and needs no extra reduction step, so this task allocation scheme is better suited to GPU acceleration.
In this embodiment, the voxel-driven approach is used, i.e. tasks are divided by output. An important principle when allocating threads is that SM occupancy must not be too low. SM occupancy is the ratio of the number of active warps on each SM to the maximum number of active warps allowed. A GPU hides long-latency operations (global memory accesses, inter-thread synchronization, and so on) by switching between threads: when the threads of one warp perform a long-latency operation, the threads of another active warp automatically begin computing, which hides part of the latency. This does not mean that higher SM occupancy is always better. The higher the occupancy, the fewer GPU resources each thread can use, i.e. the less computation each thread performs, while the maximum number of active warps per SM is fixed. Even when the number of active warps on an SM reaches the maximum, each thread may have so little work that the threads of all active warps enter long-latency operations at the same time, and the latency cannot be fully hidden. A balance must therefore be struck between SM occupancy and the amount of computation per thread for the GPU to perform at its best. Based on repeated experiments, the following thread allocation can be used: for volume data of size N^3, the block size is fixed at (16, 16, 2), and the grid size varies with N as (N/16, N/16, 1).
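The thread allocation above can be sketched in a few lines of Python. This is a minimal illustration, not the patented kernel: the function names are hypothetical, and the rule that each thread strides along z in steps of the block depth is an assumption, since the text gives only the block and grid sizes. With that assumption, every voxel is owned by exactly one thread, which is why the voxel-driven division has no write conflicts.

```python
import itertools
import numpy as np

BLOCK = (16, 16, 2)  # block size quoted in the text


def launch_config(n, block=BLOCK):
    """Grid size (N/16, N/16, 1) for an N^3 volume; N must divide evenly."""
    if n % block[0] or n % block[1]:
        raise ValueError("n must be a multiple of the block width")
    return (n // block[0], n // block[1], 1)


def owned_voxels(block_idx, thread_idx, n, block=BLOCK):
    """Voxels written by one thread (assumed mapping: stride along z)."""
    x = block_idx[0] * block[0] + thread_idx[0]
    y = block_idx[1] * block[1] + thread_idx[1]
    return [(x, y, z) for z in range(thread_idx[2], n, block[2])]


def ownership_count(n, block=BLOCK):
    """Count how many threads write each voxel under the assumed mapping."""
    grid = launch_config(n, block)
    count = np.zeros((n, n, n), dtype=int)
    for b in itertools.product(*(range(g) for g in grid)):
        for t in itertools.product(*(range(s) for s in block)):
            for x, y, z in owned_voxels(b, t, n, block):
                count[x, y, z] += 1
    return count
```

For N = 512 this gives a grid of (32, 32, 1) and, under the assumed z-striding, 256 voxels per thread; `ownership_count(32)` is 1 everywhere.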
Back projection requires computing, for any point (x, y, z) in the volume data, the point (u, v) onto which it projects on the detector at projection angle θ. This computation repeatedly evaluates trigonometric functions and other intermediate variables that depend only on the geometry and the projection angle. If these variables are evaluated once per voxel, then for volume data of size N^3 the same values are recomputed N^3 times, which wastes a large amount of system resources. To address this, in one embodiment the variables that are independent of the voxel (x, y, z) are separated out and merged, computed before back projection, and stored in the GPU's constant memory; during back projection, the variables are read directly from constant memory and used in the computation.
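Further, separating and merging the voxel-independent variables comprises the following. For any point (x, y, z) in the volume data, the projected point (u, v) on the detector at projection angle θ is computed as:
$$u = (x - \mathrm{vCenter}) \times \cos(\theta) + (y - \mathrm{vCenter}) \times \sin(\theta) + \mathrm{pCenter}$$
$$\mathrm{dis} = (u - \mathrm{pCenter}) \times a$$
$$w = \frac{\sqrt{r^2 - \mathrm{dis}^2}}{\sqrt{r^2 - \mathrm{dis}^2} + a \times \left(-(x - \mathrm{vCenter}) \times \sin(\theta) + (y - \mathrm{vCenter}) \times \cos(\theta)\right)}$$
$$v = (z - (s_0 + \theta \times h) - \gamma \times h/a) \times w + \gamma \times h/a + \mathrm{pCenter}$$
After separating and merging the variables, this becomes:
$$u = x \times A[0] + y \times A[1] + A[2] + \mathrm{pCenter}$$
$$\mathrm{dis} = (u - \mathrm{pCenter}) \times a$$
$$w = \frac{\sqrt{r^2 - \mathrm{dis}^2}}{\sqrt{r^2 - \mathrm{dis}^2} + a \times \left(-x \times A[1] + y \times A[0] + A[3]\right)}$$
$$v = (z - A[4] - \gamma \times A[5]) \times w + \gamma \times A[5] + \mathrm{pCenter}$$
where vCenter is the center of the volume data, pCenter is the center of the projection data, a is the voxel size, θ is the projection angle, r is the rotation radius of the radiation source, h is the pitch, and γ is the angle between the projection of the beam on the central plane and the central beam.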
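The separation of variables can be checked numerically. In the sketch below, the six per-angle constants A[0] to A[5] are inferred by matching terms between the two forms of the equations (the source does not list them explicitly), the weight w follows the formula given in claim 9, and γ is passed in as an input because the source does not say how it is obtained. Function names and the sample geometry are hypothetical.

```python
import math


def angle_constants(theta, v_center, s0, h, a):
    """Per-angle constants, inferred by term matching (an assumption)."""
    c, s = math.cos(theta), math.sin(theta)
    return (
        c,                     # A[0] = cos(theta)
        s,                     # A[1] = sin(theta)
        -v_center * (c + s),   # A[2], offset of u
        v_center * (s - c),    # A[3], offset inside w
        s0 + theta * h,        # A[4]
        h / a,                 # A[5]
    )


def project_direct(x, y, z, theta, gamma, v_center, p_center, s0, h, a, r):
    """Original form: trigonometric functions evaluated for every voxel."""
    c, s = math.cos(theta), math.sin(theta)
    u = (x - v_center) * c + (y - v_center) * s + p_center
    dis = (u - p_center) * a
    root = math.sqrt(r * r - dis * dis)
    w = root / (root + a * (-(x - v_center) * s + (y - v_center) * c))
    v = (z - (s0 + theta * h) - gamma * h / a) * w + gamma * h / a + p_center
    return u, v


def project_precomputed(x, y, z, A, gamma, p_center, a, r):
    """Separated form: only multiply-adds on the precomputed constants."""
    u = x * A[0] + y * A[1] + A[2] + p_center
    dis = (u - p_center) * a
    root = math.sqrt(r * r - dis * dis)
    w = root / (root + a * (-x * A[1] + y * A[0] + A[3]))
    v = (z - A[4] - gamma * A[5]) * w + gamma * A[5] + p_center
    return u, v


# Hypothetical geometry, chosen only so that r^2 > dis^2.
GEOM = dict(v_center=32.0, p_center=32.0, s0=0.0, h=2.0, a=0.5, r=100.0)
```

Calling `project_direct(10, 50, 20, 0.7, 0.1, **GEOM)` and `project_precomputed` with `angle_constants(0.7, 32.0, 0.0, 2.0, 0.5)` should agree to floating-point precision, while the second form evaluates no sine or cosine per voxel.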
After separating and merging, the back projection for each angle yields 6 intermediate variables that are independent of the voxel (x, y, z). Assuming 360 projections, the whole back projection has 2160 variables (single-precision floating point) that must be computed before back projection and stored in GPU constant memory. Constant memory is a read-only memory space specific to the GPU. It is cached, and when threads of the same half-warp access the same data in constant memory and a cache hit occurs, the data can be obtained in a single cycle. Constant memory is generally small, for example only 64 KB on the Tesla C1060, but this is fully sufficient for the present embodiment. During back projection, the final back-projection parameters are obtained simply by reading the variable values from constant memory and applying ordinary multiply-add operations. This acceleration strategy therefore not only avoids evaluating computationally expensive trigonometric functions on the GPU but also avoids repeated computation on the GPU, and it improves back-projection efficiency considerably.
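The size of the constant table follows directly from the numbers quoted above. The short sketch below uses a NumPy array as a stand-in for the GPU constant buffer; the array layout (one row of 6 values per angle) is an assumption made for illustration.

```python
import numpy as np

N_ANGLES = 360                      # projections, as in the text
VARS_PER_ANGLE = 6                  # voxel-independent variables per angle
CONSTANT_MEMORY_BYTES = 64 * 1024   # 64 KB, the Tesla C1060 figure quoted above

# Stand-in for the constant buffer: one row of 6 float32 values per angle.
table = np.zeros((N_ANGLES, VARS_PER_ANGLE), dtype=np.float32)

n_values = table.size               # 360 * 6 values
n_bytes = table.nbytes              # 4 bytes per single-precision value
fraction_used = n_bytes / CONSTANT_MEMORY_BYTES
```

The table holds 2160 values in 8640 bytes, a small fraction of the 64 KB constant space.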
Reconstructed volume data is usually stored in GPU global memory. Global memory accounts for the vast majority of video memory and can hold large-scale data, but it has no cache acceleration; although coalesced access can greatly increase access speed, accesses generally still incur a latency of several hundred cycles. Research shows that the bottleneck in using GPUs for high-performance computing is not computation but memory access. Reducing the time spent accessing global memory is therefore key to GPU acceleration. In this embodiment, an acceleration strategy is designed in which m projection images are back-projected simultaneously in one kernel.
Normally, one kernel processes only 1 projection image, so for 360 projections the whole back-projection process reads/writes global memory 360 × N^3 times. In this embodiment, one kernel completes the back projection of the projection images from m angles. Each kernel must compute the back-projection parameters for the m projection images but reads and writes global memory only N^3 times, so the number of global memory reads/writes is reduced to 1/m of the original. While reducing global memory traffic, this increases the computational load of each thread in the kernel. Simply increasing m inevitably reduces the number of active blocks and active warps on the GPU, which in turn weakens the GPU's ability to hide long-latency operations (global memory accesses). Finding a moderate m is therefore particularly important. Repeated tests showed that when 3 images are back-projected simultaneously in one kernel, the two factors are balanced and the acceleration effect is good.
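The effect of batching m projections per kernel on global memory traffic can be illustrated with a small counting model. This is a sketch under simplifying assumptions: the per-angle contribution of each voxel is taken as a precomputed array (standing in for the interpolated projection value), and one "access" means one read-modify-write of a voxel in global memory. The function name is hypothetical.

```python
import numpy as np


def backproject_batched(contrib, m):
    """Accumulate per-angle contributions, m angles per simulated kernel launch.

    contrib has shape (n_angles, n_voxels). Returns the volume and the number
    of voxel read-modify-write accesses to the (simulated) global memory.
    """
    n_angles, n_voxels = contrib.shape
    volume = np.zeros(n_voxels)
    accesses = 0
    for start in range(0, n_angles, m):       # one kernel launch per batch
        # Sum of the m contributions is kept in a per-thread local variable.
        local = contrib[start:start + m].sum(axis=0)
        volume += local                        # one global access per voxel
        accesses += n_voxels
    return volume, accesses


rng = np.random.default_rng(0)
contrib = rng.random((360, 8 ** 3))            # 360 angles, toy 8^3 volume
vol_1, acc_1 = backproject_batched(contrib, 1)
vol_3, acc_3 = backproject_batched(contrib, 3)
```

Both runs produce the same volume, but with m = 3 the access count drops from 360 × N^3 to 120 × N^3, i.e. to 1/m of the original. The model does not capture the occupancy cost of a larger m, which is what limits m in practice.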
In another embodiment, as shown in Figure 6, a CT three-dimensional reconstruction acceleration device based on a heterogeneous platform is provided. The heterogeneous platform uses PCI-Express as the interconnect bus for data and control signals, and Ethernet as an additional bus for networking control and data transfer with the outside. The architecture of the heterogeneous platform comprises an application layer that provides functional modules; a component layer that provides, for each functional module of the application layer, component libraries based on different processors and the interface specifications required by the reconstruction algorithm; and a support layer that provides services to the component layer and the application layer. The support layer comprises a CPU that executes the main program and multiple OpenCL computing devices that execute kernel programs. The CPU and the OpenCL computing devices are communicatively connected, and the OpenCL computing devices comprise a GPU and an FPGA. The CT three-dimensional reconstruction acceleration device based on this architecture comprises:
A calculation grain decomposition module 201, configured to split and discretize the FDK reconstruction algorithm into calculation grains according to the content of the algorithm, decomposing it into a projection-weighting grain for correcting the projection data, a filtering grain for filtering the weighted projection data, a back-projection grain for back-projecting the filtered projection data onto the reconstructed object, and a reduction grain for reducing the back-projection results;
An acceleration processing module 202, configured to transfer projection data to the compute nodes of the heterogeneous platform over the additional bus, coordinate processing over the interconnect bus according to user settings and a reconstruction performance evaluation, accelerate the respective calculation grains on the GPU and FPGA acceleration components, and store the reconstruction data in real time and feed it back to the user.
It should be noted that some implementation details of the CT three-dimensional reconstruction acceleration device in the embodiments of the invention are the same as those of the CT three-dimensional reconstruction acceleration method; for details, refer to the method embodiments, which are not repeated here.
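The data flow between the four calculation grains (weighting, filtering, back projection, reduction) can be sketched as a chain of small functions. To keep the example short and checkable it uses a 2D parallel-beam geometry with nearest-neighbour, voxel-driven back projection, which is a simplified stand-in and not the cone-beam FDK formulas of the patent; the weights and filter kernel are supplied by the caller, and all function names are hypothetical.

```python
import numpy as np


def weight_grain(proj, weights):
    """Projection weighting: element-wise correction of the projection data."""
    return proj * weights


def filter_grain(proj, h):
    """Filtering: convolve each detector row with the filter response h."""
    return np.array([np.convolve(row, h, mode="same") for row in proj])


def backproject_grain(row, theta, n):
    """Voxel-driven back projection of one filtered row (2D, nearest neighbour)."""
    c = (n - 1) / 2.0                      # image center
    p_center = (row.size - 1) / 2.0        # detector center
    y, x = np.mgrid[0:n, 0:n]
    u = (x - c) * np.cos(theta) + (y - c) * np.sin(theta) + p_center
    idx = np.clip(np.rint(u).astype(int), 0, row.size - 1)
    return row[idx]                        # each pixel reads its own projection value


def reduce_grain(partials):
    """Reduction: sum the per-angle contributions into the final image."""
    return np.sum(partials, axis=0)


def reconstruct(proj, thetas, weights, h, n):
    filtered = filter_grain(weight_grain(proj, weights), h)
    partials = [backproject_grain(r, t, n) for r, t in zip(filtered, thetas)]
    return reduce_grain(partials)


# Toy input: every angle sees a single unit sample at the detector center.
n, n_angles = 33, 60
thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
proj = np.zeros((n_angles, n))
proj[:, n // 2] = 1.0
recon = reconstruct(proj, thetas, np.ones_like(proj), [1.0], n)
```

With unit weights and an identity filter, every angle contributes 1 to the center pixel, so the reduced image peaks there. Because each grain only consumes the output of the previous one, the stages can be placed on different devices, which is the property the decomposition relies on.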
The software design block diagram of the heterogeneous platform is shown in Figure 5. The software architecture has three layers. The application layer consists of the functional modules of the user software: projection data weighting correction, projection data filtering, three-dimensional back-projection reconstruction, and reconstructed image visualization. The component layer is an application-oriented component library based on different processors, consisting mainly of the function code libraries for each calculation grain of the reconstruction algorithm and the corresponding interface specifications. The last layer is the support layer, made up of containers such as the CPU, GPU, and FPGA, which provides services and support to the components and the application software.
The component layer is to be developed with the OpenCL framework. OpenCL stands for Open Computing Language; it was first proposed by Apple and is an application programming interface (API) for computing. The main role of OpenCL is to provide a cross-platform, unified standard language for general-purpose computing. The heterogeneous platforms it supports may consist of multi-core CPUs, GPUs, DSPs, Cell/B.E. processors, or other types of processors. OpenCL provides a non-proprietary, cross-vendor software solution for developing parallel programs, which makes programs more portable; at the same time, a cross-platform heterogeneous framework helps exploit the performance potential of all the devices in a system at once.
The OpenCL platform model consists of a host and one or more OpenCL compute devices connected to it. Each compute device consists of one or more compute units, and each compute unit can be further divided into one or more processing elements, in which all computation is carried out. The host manages all computing resources on the platform. An application sends compute commands from the host to the processing elements of each OpenCL device. All processing elements within a compute unit can execute the same instruction stream. In the parallel programming implementation for the heterogeneous platform, the CPU serves as the host, and the GPU and FPGA serve as OpenCL devices.
The OpenCL programming model has two parts: the host program, which runs on the CPU, and the kernel functions, which run on the devices. The host program manages the execution of kernels on the devices by defining a context. In OpenCL programming, before the host executes a kernel function, an index space must first be created for it. The index space can be one-, two-, or three-dimensional, and the kernel function is executed at every node (work-item) of the index space. The index of a work-item in each dimension is defined as its global ID in that dimension. All work-items execute the same kernel program; each work-item is equivalent to a separate thread of execution, and acceleration is achieved by running a large number of work-items in parallel. In addition to the global index, OpenCL provides a work-group space at a smaller granularity than the whole index space. Each work-group has a unique work-group ID, and the position of a work-item within its work-group is called its local ID. A kernel can be designed either with parallelism among work-items only, or with two-level parallelism: among work-groups in the index space and among work-items within each work-group. This makes full use of the computing resources of the underlying GPU and FPGA hardware and also makes programming more flexible.
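The relationship between global ID, work-group ID, and local ID described above can be written out explicitly. The sketch assumes a zero global offset and a global size that is a multiple of the work-group size; the helper names are hypothetical, and the work-group size (16, 16, 2) is borrowed from the thread allocation discussed earlier.

```python
def decompose(global_id, local_size):
    """Split a global ID into (work-group ID, local ID), per dimension."""
    group_id = tuple(g // s for g, s in zip(global_id, local_size))
    local_id = tuple(g % s for g, s in zip(global_id, local_size))
    return group_id, local_id


def compose(group_id, local_id, local_size):
    """Rebuild the global ID: group_id * local_size + local_id."""
    return tuple(w * s + l for w, l, s in zip(group_id, local_id, local_size))


LOCAL_SIZE = (16, 16, 2)
group_id, local_id = decompose((35, 7, 3), LOCAL_SIZE)
```

Here the work-item with global ID (35, 7, 3) belongs to work-group (2, 0, 1) and has local ID (3, 7, 1); `compose` recovers the global ID from the pair.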
In a multi-processor heterogeneous OpenCL programming environment, the CPU host side is programmed in standard C/C++, while the GPU and FPGA coprocessors are programmed in the description language defined by the OpenCL specification. The abstraction level of the OpenCL programming language is far higher than that of hardware description languages such as VHDL and Verilog. The traditional approach requires describing the low-level hardware units of the FPGA cycle by cycle; for complex algorithms, state machines must be designed to control the data path, and interface constraints and timing synchronization at every level must be handled. Programming is difficult and time-consuming, and maintaining and upgrading the program is complicated, which greatly hinders rapid application in real products. With OpenCL, there is no need to attend to timing-level hardware design: C-like high-level code can be written following the back-projection algorithm, and the OpenCL compiler automatically carries out the steps of converting the OpenCL code into a hardware description language and configuring the processor.
To further verify the effectiveness of the invention, it is explained below through a specific experiment:
As shown in Fig. 7, a petroleum rock core is selected as the test object. 360 projections of the object acquired over the full angular range are reconstructed at a reconstruction size of 500^3. Cross-sections through the middle of the reconstruction result are shown in Fig. 7: a), b), and c) are the sections of the reconstructed three-dimensional image on the central planes, i.e. the planes x=0, y=0, and z=0 in the world coordinate system.
Reconstruction tests were run at different reconstruction sizes in two modes, "CPU" and "GPU+FPGA". Each test was repeated 10 times, and the average reconstruction times are as follows:
Table 1. Reconstruction time test results
The results show that GPU+FPGA acceleration is significantly faster than traditional CPU computation, so users can obtain three-dimensional reconstruction results quickly. The reconstructed images also show that the results clearly reveal the internal three-dimensional structure, meeting the needs of practical three-dimensional reconstruction.
The embodiments above each describe only the implementation of the corresponding steps. Where they do not logically contradict one another, the embodiments can be combined to form new technical solutions, which remain within the disclosed scope of this description.
From the description of the embodiments above, those skilled in the art will clearly understand that the methods of the embodiments can be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. On this understanding, the technical solution of the invention, or the part of it that contributes over the prior art, can be embodied as a software product. The computer software product is stored on a non-volatile computer-readable storage medium (such as ROM, magnetic disk, optical disc, or server cloud storage) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, network device, etc.) to perform the methods described in the embodiments of the invention.
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features is not contradictory, it should be considered within the scope of this specification.
Claims (10)
1. A CT three-dimensional reconstruction acceleration method based on a heterogeneous platform, characterized in that the heterogeneous platform comprises a host and heterogeneous OpenCL computing devices, and the acceleration method comprises: decomposing the FDK reconstruction algorithm into calculation grains and analyzing the parallel computation flow of each calculation grain; and accelerating and optimizing each calculation grain using the host and the heterogeneous OpenCL computing devices of the heterogeneous platform.
2. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 1, characterized in that the host is a CPU that runs the main program, the OpenCL computing devices comprise the heterogeneous containers GPU and FPGA that run kernel programs, the CPU, GPU, and FPGA communicate over a PCI-E bus, and the main program manages the execution of the kernel programs by defining a context.
3. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 2, characterized in that decomposing the FDK reconstruction algorithm into calculation grains comprises decomposing it, according to the content of the FDK algorithm, into: a projection-weighting grain for weighting the projection data, a filtering grain for filtering the weighted projection data, a back-projection grain for back-projecting the filtered projection data onto the reconstructed object, and a reduction grain for reducing the back-projection results.
4. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 3, characterized in that, according to the FDK reconstruction formula:
[formula]
the integrals in the formula are split and discretized into:
the projection-weighting grain, expressed as: [formula], where p'(θ, u, v) is the projection data after weighting at rotation index θ, and [formula] is the weighting coefficient;
the filtering grain, expressed as: [formula], where d_f(θ, u, v) is the filtered data, h(u) is the unit impulse response of the filter operator, and [-u_m, u_m] denotes the 2m data samples acquired by the detector in each row;
the back-projection grain, expressed as: [formula], where f(x, y, z, θ) is the contribution of the projection point to f(x, y, z) of the reconstructed object at rotation index θ;
the reduction grain, expressed as: [formula], where φ_max is the number of discrete projection indices acquired while the reconstructed object rotates through one full turn.
5. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 3, characterized in that accelerating and optimizing each calculation grain comprises: processing the projection-weighting grain in parallel on the FPGA and transmitting the result asynchronously to the GPU, with the filtering grain processed concurrently during transmission; and, exploiting the data-parallel nature of the per-voxel operations in back projection, performing multi-threaded parallel back projection of the back-projection grain voxel by voxel on the GPU.
6. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 5, characterized in that, according to the correspondence in the FDK reconstruction algorithm between each layer of the reconstruction region along the rotation-axis direction and each row of detector projection data along the vertical axis, a block reconstruction strategy is used: the region to be reconstructed is divided into several blocks along the rotation-axis direction, and when one block is reconstructed, the corresponding projection data is fetched from external memory for the reconstruction.
7. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 5, characterized in that processing the projection-weighting grain in parallel on the FPGA comprises: dividing global memory into 2 banks and balancing random-access memory accesses through load distribution; and storing in constant memory the intermediate variables that would otherwise be computed repeatedly.
8. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 5, characterized in that performing multi-threaded parallel back projection of the back-projection grain voxel by voxel on the GPU comprises: using the voxel-driven approach, dividing GPU tasks by the reconstructed volume data output; separating and merging the variables in the computation that are independent of the voxel, computing them before back projection and storing them in GPU constant memory, and reading them directly from constant memory during back projection; and optimizing the number of back projections performed in one kernel program.
9. The CT three-dimensional reconstruction acceleration method based on a heterogeneous platform according to claim 8, characterized in that separating and merging the variables in the computation that are independent of the voxel comprises the following: for any point (x, y, z) in the volume data, the projected point (u, v) on the detector at projection angle θ is computed as:
$$u = (x - \mathrm{vCenter}) \times \cos(\theta) + (y - \mathrm{vCenter}) \times \sin(\theta) + \mathrm{pCenter}$$
$$\mathrm{dis} = (u - \mathrm{pCenter}) \times a$$
$$w = \frac{\sqrt{r^2 - \mathrm{dis}^2}}{\sqrt{r^2 - \mathrm{dis}^2} + a \times \left(-(x - \mathrm{vCenter}) \times \sin(\theta) + (y - \mathrm{vCenter}) \times \cos(\theta)\right)}$$
$$v = (z - (s_0 + \theta \times h) - \gamma \times h/a) \times w + \gamma \times h/a + \mathrm{pCenter}$$
After separating and merging the variables, this becomes:
$$u = x \times A[0] + y \times A[1] + A[2] + \mathrm{pCenter}$$
$$\mathrm{dis} = (u - \mathrm{pCenter}) \times a$$
$$w = \frac{\sqrt{r^2 - \mathrm{dis}^2}}{\sqrt{r^2 - \mathrm{dis}^2} + a \times \left(-x \times A[1] + y \times A[0] + A[3]\right)}$$
$$v = (z - A[4] - \gamma \times A[5]) \times w + \gamma \times A[5] + \mathrm{pCenter}$$
where vCenter is the center of the volume data, pCenter is the center of the projection data, a is the voxel size, θ is the projection angle, r is the rotation radius of the radiation source, h is the pitch, and γ is the angle between the projection of the beam on the central plane and the central beam.
10. A CT three-dimensional reconstruction acceleration device based on a heterogeneous platform, characterized in that the heterogeneous platform uses PCI-Express as the interconnect bus for data and control signals, and Ethernet as an additional bus for networking control and data transfer with the outside; the architecture of the heterogeneous platform comprises an application layer that provides functional modules, a component layer that provides, for each functional module of the application layer, component libraries based on different processors and the interface specifications required by the reconstruction algorithm, and a support layer that provides services to the component layer and the application layer; the support layer comprises a CPU that executes the main program and multiple OpenCL computing devices that execute kernel programs; the CPU and the OpenCL computing devices are communicatively connected, and the OpenCL computing devices comprise a GPU and an FPGA; and the CT three-dimensional reconstruction acceleration device based on this architecture comprises:
a calculation grain decomposition module, configured to split and discretize the FDK reconstruction algorithm into calculation grains according to the content of the algorithm, decomposing it into a projection-weighting grain for correcting the projection data, a filtering grain for filtering the weighted projection data, a back-projection grain for back-projecting the filtered projection data onto the reconstructed object, and a reduction grain for reducing the back-projection results;
an acceleration processing module, configured to transfer projection data to the compute nodes of the heterogeneous platform over the additional bus, coordinate processing over the interconnect bus according to user settings and a reconstruction performance evaluation, accelerate the respective calculation grains on the GPU and FPGA acceleration components, and store the reconstruction data in real time and feed it back to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710270520.8A CN107194864A (en) | 2017-04-24 | 2017-04-24 | CT 3-dimensional reconstructions accelerated method and its device based on heterogeneous platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107194864A true CN107194864A (en) | 2017-09-22 |
Family
ID=59872303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710270520.8A Pending CN107194864A (en) | 2017-04-24 | 2017-04-24 | CT 3-dimensional reconstructions accelerated method and its device based on heterogeneous platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107194864A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110164031A1 (en) * | 2010-01-06 | 2011-07-07 | Kabushiki Kaisha Toshiba | Novel implementation of total variation (tv) minimization iterative reconstruction algorithm suitable for parallel computation |
CN102354392A (en) * | 2011-06-08 | 2012-02-15 | 无锡引速得信息科技有限公司 | Parallel accelerating apparatus used in industrial computerized tomography (CT) image reconstruction |
CN103390285A (en) * | 2013-07-09 | 2013-11-13 | 中国人民解放军信息工程大学 | Cone beam computed tomography (CT) incomplete angle rebuilding method based on edge guide |
CN104142845A (en) * | 2014-07-21 | 2014-11-12 | 中国人民解放军信息工程大学 | CT image reconstruction back projection acceleration method based on OpenCL-To-FPGA |
CN105678820A (en) * | 2016-01-11 | 2016-06-15 | 中国人民解放军信息工程大学 | CUDA-based S-BPF reconstruction algorithm acceleration method |
CN105957028A (en) * | 2016-04-25 | 2016-09-21 | 西安电子科技大学 | GPU acceleration patch-based bilateral filter method based on OpenCL |
Non-Patent Citations (4)
Title |
---|
JIANWEN CHEN 等: "A Hybrid Architecture for Compressive Sensing 3-D CT Reconstruction", 《IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS》 * |
JINGFEI DENG 等: "Parallel no-waiting pipelining accelerating CT image reconstruction based on FPGA", 《2010 3RD INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS》 * |
王超 等: "锥束CT图像重建算法在DSP上的加速方法研究", 《CT理论与应用研究》 * |
韩玉 等: "锥束CT_FDK重建算法的GPU并行实现", 《计算机应用》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921767A (en) * | 2018-06-06 | 2018-11-30 | 国网四川省电力公司信息通信公司 | FPGA-based image digital watermark processing system and processing method |
CN108932212A (en) * | 2018-07-16 | 2018-12-04 | 郑州云海信息技术有限公司 | Data processing method, system, and associated components based on a heterogeneous computing chip |
CN109671056A (en) * | 2018-12-03 | 2019-04-23 | 西安交通大学 | Composite sleeper pore defect detection method based on X-ray image |
CN109671056B (en) * | 2018-12-03 | 2020-10-27 | 西安交通大学 | Composite sleeper pore defect detection method based on X-ray image |
CN109949411A (en) * | 2019-03-22 | 2019-06-28 | 电子科技大学 | Image reconstruction method based on three-dimensional weighted filtering back projection and statistical iteration |
CN109949411B (en) * | 2019-03-22 | 2022-12-27 | 电子科技大学 | Image reconstruction method based on three-dimensional weighted filtering back projection and statistical iteration |
CN110070597A (en) * | 2019-04-02 | 2019-07-30 | 同济大学 | OpenCL-based Unity3D rendering acceleration method |
CN111124920A (en) * | 2019-12-24 | 2020-05-08 | 北京金山安全软件有限公司 | Equipment performance testing method and device and electronic equipment |
CN113935928A (en) * | 2020-07-13 | 2022-01-14 | 四川大学 | Rock core image super-resolution reconstruction based on Raw format |
CN113935928B (en) * | 2020-07-13 | 2023-04-11 | 四川大学 | Rock core image super-resolution reconstruction based on Raw format |
CN113720271A (en) * | 2021-07-26 | 2021-11-30 | 无锡维度投资管理合伙企业(有限合伙) | Three-dimensional measurement acceleration system based on FPGA heterogeneous processing |
CN113720271B (en) * | 2021-07-26 | 2024-05-17 | 无锡维度投资管理合伙企业(有限合伙) | Three-dimensional measurement acceleration system based on FPGA heterogeneous processing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107194864A (en) | | CT 3-dimensional reconstructions accelerated method and its device based on heterogeneous platform |
Tang et al. | | Collision-streams: Fast GPU-based collision detection for deformable models |
CN102567944B (en) | | Computed tomography (CT) image reconstruction hardware accelerating method based on field programmable gate array (FPGA) |
Pratx et al. | | GPU computing in medical physics: A review |
Groen et al. | | Analysing and modelling the performance of the HemeLB lattice-Boltzmann simulation environment |
Stuart et al. | | Multi-GPU volume rendering using MapReduce |
Schroder et al. | | Fast rotation of volume data on parallel architectures |
CN113016009A (en) | | Multi-level image reconstruction using one or more neural networks |
US10691572B2 (en) | | Liveness as a factor to evaluate memory vulnerability to soft errors |
CN102609978A (en) | | Method for accelerating cone-beam CT (computerized tomography) image reconstruction by using GPU (graphics processing unit) based on CUDA (compute unified device architecture) architecture |
Montani et al. | | Parallel volume visualization on a hypercube architecture |
Chen et al. | | A hybrid architecture for compressive sensing 3-D CT reconstruction |
Li | | Design of an FPGA-based computing platform for real-time three-dimensional medical imaging |
US20230104199A1 (en) | | Apparatus and method for ray tracing with shader call graph analysis |
CN113628318B (en) | | Distributed real-time neuron rendering method and system based on ray tracing |
Chen et al. | | Parallel performance optimization of large-scale unstructured data visualization for the earth simulator. |
Tian et al. | | A multi‐GPU finite element computation and hybrid collision handling process framework for brain deformation simulation |
Bruckner | | Efficient volume visualization of large medical datasets |
Gu et al. | | Accurate and efficient GPU ray‐casting algorithm for volume rendering of unstructured grid data |
Shirazian et al. | | Polygonization of Implicit Surfaces on Multi-Core Architectures with SIMD Instructions. |
Xing et al. | | FPGA-accelerated real-time volume rendering for 3D medical image |
Reina et al. | | A decade of particle-based scientific visualization |
Biguri et al. | | Numerically robust tetrahedron-based tomographic forward and backward projectors on parallel architectures |
Cui | | Fast and accurate PET image reconstruction on parallel architectures |
Yalçın et al. | | GPU algorithms for diamond-based multiresolution terrain processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170922 |