CN101324962B

CN101324962B - Parallel processing method for pre-projection ray casting volume rendering

Info

Publication number: CN101324962B
Application number: CN2008101426400A
Authority: CN
Inventors: 黄波; 刘思源; 郑倩; 姜志阳; 文高进; 冯圣中; 樊建平
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Guangdong Carbon Neutralization Research Institute Shaoguan
Priority date: 2008-07-21
Filing date: 2008-07-21
Publication date: 2011-08-17
Anticipated expiration: 2028-07-21
Also published as: CN101324962A

Abstract

The invention discloses a parallel processing method for pre-projection ray casting volume rendering. The volume rendering data processing procedure includes the following steps: calculating the bounding box of a three-dimensional volume model; projecting the bounding box onto screen space; storing the coordinates of the projected region; and processing the volume rendering data in parallel. Because the method combines pre-projection with parallel processing, the computing nodes no longer need to wait until all tasks have been computed before transmitting result data to the master node; instead, each computing node sends its intermediate result data to the master node immediately after a task block has been computed and starts computing the next task block. The master node is initially assigned fewer task blocks. The method provides an optimal estimate of the number of task blocks assigned to each computing node, thereby greatly reducing the amount of data to be processed and improving the data processing speed.

Description

A parallel processing method for pre-projection ray casting volume rendering

Technical field

The present invention relates to visualization in scientific computing, and in particular to a parallel processing method that uses image-space pre-projection in volume rendering.

Background technology

In the prior art, scientific visualization technology, of which volume rendering is a representative, converts abstract data into graphics, an intuitive form that people readily accept and understand. It thereby provides a powerful tool for understanding and discovering the phenomena and laws that arise in scientific computing.

Volume rendering can produce an overall image of a three-dimensional data field that retains a large amount of detail and is of high quality. However, it involves a large amount of data and computation, so rendering takes a long time and is difficult to implement on traditional graphics hardware. Although researchers have done much work on optimizing serial volume rendering algorithms, data sizes and rendering precision have grown sharply as applications have developed, and optimizing the serial algorithm alone can no longer meet the demand for rendering speed.

Visualization converts abstract data into graphics, an intuitive form that people readily accept and understand. According to its emphasis, visualization can be divided into three branches: scientific visualization, data visualization and information visualization. Scientific visualization, also called visualization in scientific computing, refers to the theory, methods and techniques that use computer graphics and image processing to convert the results produced during scientific computing into graphics and images, display them on screen, and process them interactively.

Visualization in scientific computing is a research field that was proposed and developed in the late 1980s and is an important research direction in computer graphics. It combines data mining, graphics generation, image processing and human-computer interaction. Its main function is to analyze and interpret the multidimensional data fed into the computer and to display intermediate processes or analysis results vividly and intuitively as graphics or images, for researchers and other relevant personnel to observe and use.

Visualization in scientific computing has a very wide range of applications, covering almost all natural sciences and engineering fields. Its main application fields include medicine, molecular structure modeling, industrial non-destructive testing, archaeology, geological prospecting, meteorology, computational fluid dynamics and finite element analysis.

In addition, visualization in scientific computing can be applied to space exploration, astrophysics, the art of mathematics and other fields. Take the digitized virtual human of the medical field as an example. Starting from three-dimensional regular data fields such as computed tomography (CT) data, magnetic resonance imaging (MRI) data and two-dimensional slice data of a real human body, digitization of human body structure combined with scientific visualization can reconstruct a three-dimensional digitized virtual human, that is, digitize the structure of the human body. Computer technology and image processing are used to display a simulated human body on the computer screen. Research results on bodily functions are then digitized, converted by information scientists into a form the computer can process, and attached to this human body framework. Combined with virtual reality technology and under the control of an operator, this "virtual human" can imitate a real person and produce various reactions.

By digitizing and visualizing the structure of the human body, the digitized virtual human establishes a mathematical model that a computer can process, which makes quantitative computer analysis and accurate simulation possible. As information acquisition and processing technology advances and data acquisition precision improves, the virtual human will simulate the functions and behavior of the human body more accurately in an ever wider range of fields, providing a basis for applications in multidisciplinary research such as medicine, national defense and the automotive industry.

Visualization in scientific computing can greatly accelerate data processing, so that the huge volumes of data continually being produced can be handled and used effectively. It can express the information implicit in scientific computing data as pictures and images, providing a powerful tool for understanding and discovering the phenomena and laws that arise in scientific computing. In short, visualization in scientific computing will greatly improve the speed and quality of scientific computing, fundamentally change the face of scientific research, and ultimately bring great convenience to society.

Take the medical field as an example. In conventional image-based medical diagnosis, lesions are found mainly by examining a set of two-dimensional CT or MRI slice images. This relies mainly on the extensive film-reading experience of medical staff and yields only a qualitative analysis of the images. With scientific visualization, the two-dimensional slice images can be analyzed and processed in two or three dimensions, for example by segmentation and extraction, three-dimensional reconstruction, and two-dimensional display of human organs, soft tissue and lesions. This helps medical staff analyze lesions and other regions of interest qualitatively and even with quantitative precision, which can greatly improve the accuracy and correctness of medical diagnosis and ultimately bring great economic and social benefits.

The core of visualization in scientific computing is the visualization of three-dimensional data fields. For three-dimensional data there are two different classes of visualization algorithm: surface rendering and volume rendering. Surface rendering first constructs intermediate geometric primitives in the three-dimensional data field and then renders the picture with traditional computer graphics techniques. There are many surface rendering algorithms; they differ in the geometric unit used to approximate the surface or in the scale chosen for that unit. Typical algorithms include the Marching Cubes (MC) method proposed by W. E. Lorensen and H. E. Cline [LORE87], the Marching Tetrahedra (MT) method proposed by A. Doi [DOI91], and the Dividing Cubes method proposed by H. E. Cline and W. E. Lorensen [CLIN98]. Surface rendering can produce fairly clear isosurface images and can use existing graphics hardware for rendering, which speeds up image generation and transformation; it is suitable for tissues and organs with well-defined surface features. However, surface rendering places high demands on data segmentation, cannot retain information about the interior of objects, and cannot reflect the overall picture and details of the whole raw data field.

Volume rendering does not construct intermediate geometric primitives but generates the two-dimensional image on the screen directly from the three-dimensional data field; it is therefore also called direct volume rendering. Volume rendering is a three-dimensional data field visualization method that has developed rapidly in recent years. Compared with other volume rendering algorithms such as the footprint algorithm and the voxel projection algorithm, the ray casting volume rendering algorithm produces images of higher quality that show more internal detail, and it is the most basic and most commonly used volume rendering algorithm.

The ray casting volume rendering algorithm must cast a ray and compute a color for every pixel on the screen. Moreover, when the viewing direction changes, the front-to-back order of the sample points in the data field changes as well, so all pixels must be redrawn. The amount of computation is therefore very large and the memory accesses are highly irregular, with the result that the rendering speed of the ray casting volume rendering algorithm falls short of application requirements.

To address the problem of computing speed, various improvements and acceleration algorithms have been proposed, such as space leaping, which skips the empty regions of the three-dimensional image, and early ray termination, which stops casting a ray once the accumulated opacity approaches a given value. Cluster-based parallel ray casting rendering algorithms have also been developed since the 1990s. The following introduces, in chronological order, the improvements and acceleration techniques for ray casting volume rendering and the cluster-based parallel ray casting volume rendering algorithms.

1. Acceleration techniques for ray casting volume rendering

M. Levoy first proposed the ray casting volume rendering algorithm (Levoy called it "ray tracing"). First, the volume data are suitably preprocessed, for example by denoising and resampling. Next, a lookup table is used to determine the opacity of each voxel, and the Phong illumination model is used to determine its color. Then samples are taken uniformly along each cast ray, and the values at the sample points are obtained by cubic interpolation. Finally, the color and opacity values are composited from back to front to form the final image. To improve image quality, Levoy used oversampling, that is, inserting additional intermediate points between the original data points. This reduces aliasing and improves image quality but increases the computational cost.

M. Levoy also proposed a front-to-back image compositing method. In this method the accumulated opacity necessarily increases step by step. When the opacity approaches 1 (for example, Levoy chose an opacity of 0.95 as the ray termination condition), the pixel is nearly fully opaque; voxels farther back no longer contribute to the pixel and need not be computed. Front-to-back compositing therefore saves part of the color computation and speeds up rendering.
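The front-to-back compositing with early termination described above can be sketched as follows. This is a minimal illustration, not Levoy's implementation: it assumes scalar (gray) colors, a per-sample opacity already looked up, and a hypothetical function name `composite_front_to_back`; only the 0.95 cutoff comes from the text.

```python
def composite_front_to_back(samples, opacity_cutoff=0.95):
    """Composite (color, opacity) samples ordered from nearest to farthest.

    Returns (accumulated_color, accumulated_opacity, samples_used).
    Colors are scalar gray values here, which is a simplifying assumption.
    """
    acc_color = 0.0
    acc_alpha = 0.0
    used = 0
    for color, alpha in samples:
        # The remaining transparency scales the contribution of this sample.
        weight = (1.0 - acc_alpha) * alpha
        acc_color += weight * color
        acc_alpha += weight
        used += 1
        # Early ray termination: later samples can barely change the pixel.
        if acc_alpha >= opacity_cutoff:
            break
    return acc_color, acc_alpha, used


# One ray with four samples; the fourth is never evaluated.
ray = [(1.0, 0.5), (0.5, 0.8), (0.2, 0.9), (0.9, 0.7)]
color, alpha, used = composite_front_to_back(ray)
```

Because the accumulated opacity can only grow, the loop can stop as soon as the cutoff is reached without visibly changing the pixel.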

M. Levoy used an octree to organize the object voxels in the volume data. The volume data are recursively and uniformly subdivided to form a hierarchical octree. When ray casting is used for volume rendering, traversing the octree lets a ray skip the largest empty region that contains the current sample point.

Tang improved the ray tracing volume rendering algorithm. The volume data field is first resampled along the cast rays, and only then are data classification, color assignment, opacity assignment and shading carried out. The improved algorithm yields better image quality; at the same time, because color and opacity values need not be assigned to every data point, it reduces computation and saves storage.

The PARC algorithm uses the bounding box of the object voxels to remove the empty voxels that surround them in the volume data. This method is effective only for object voxels surrounded by empty regions; it cannot effectively handle volume data that contain several object voxel regions, or object voxel regions that themselves contain empty voxels.

Lee et al. proposed an adaptive subdivision algorithm for accelerating volume rendering. The algorithm divides the volume data uniformly into sub-blocks of a given size, each containing several voxels. If all voxels in a sub-block are empty, the block is marked as empty. In the subsequent merging pass the sub-blocks are traversed in order and adjacent empty blocks are merged to form empty blocks that are as large as possible. During volume rendering this data organization allows the empty blocks to be skipped efficiently, which accelerates rendering.

Wald et al. proposed the min/max kd-tree method. Each kd-tree node must store a minimum and a maximum value, which doubles the amount of data. However, by comparing the minimum and maximum of each node, empty space can be skipped quickly and homogeneous regions can be merged, so a fairly good frame rate is achieved.

Knittel was the first to implement an interactive ray casting rendering algorithm in software on a single-CPU system. Although the ULTRAVIS system used by Knittel has only one CPU, it relies on processor extensions such as MMX, and it is programmed in hand-optimized assembly language using the SIMD instruction set.

2. Cluster-based parallel ray casting volume rendering algorithms

J. Nieh and M. Levoy proposed the first parallel ray casting volume rendering algorithm. It was implemented on DASH, a MIMD multiprocessor (which provides a shared memory while each processor also has local memory); the maximum speedup reached 40 (with 48 processors) and the maximum frame rate was 3 Hz. The basic idea is as follows. Before rendering begins, a coarse task assignment is made according to the number of processors: the image space is divided into subregions of equal or nearly equal size, which are assigned statically to the processors. Each processor then subdivides its subregion into sub-blocks of suitable size, and all sub-blocks of a subregion are placed in a queue. Because the subregions are nearly equal in size, the coarse division of the first step already gives some guarantee of load balance for the whole system. Because the actual task scheduling is carried out at sub-block granularity, load balance can be improved further.

Specifically, once the task queue of a processor is empty, the processor can directly take over some of the unfinished subtasks of other processors. The processor whose tasks are taken over need not know about or take part in the takeover (a "task stealing" algorithm). In the ray casting volume rendering algorithm, the positions in physical space at which computation will take place are difficult to determine in advance, so the spatial dependence of the data is extremely strong. When the data a processor needs during computation are not in its local memory, they can be obtained only through data communication. Nieh and Levoy keep one copy of the raw data field in shared memory for all processors to read; among the local memories of the processors, the data are distributed in an interleaved manner and exchanged page by page.
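The task-stealing idea can be illustrated with a small sequential simulation. This is a sketch under stated assumptions rather than the Nieh and Levoy implementation: the function name `run_with_stealing`, the block costs, the rule of stealing from the back of the longest queue, and the use of a simulated clock instead of real shared-memory processors are all hypothetical choices made for illustration.

```python
from collections import deque


def run_with_stealing(queues, cost):
    """Simulate processors that steal a block when their own queue is empty.

    queues: list of lists of block ids, one list per processor.
    cost:   dict mapping block id -> time units needed to render it.
    Returns the finish time of every processor.
    """
    queues = [deque(q) for q in queues]
    clock = [0] * len(queues)
    while any(queues):
        # The processor that becomes free first takes the next block.
        p = min(range(len(queues)), key=lambda i: clock[i])
        if queues[p]:
            block = queues[p].popleft()      # own work: take from the front
        else:
            # Assumed policy: steal from the back of the longest queue.
            victim = max(range(len(queues)), key=lambda i: len(queues[i]))
            block = queues[victim].pop()
        clock[p] += cost[block]
    return clock


# Processor 0 owns four cheap blocks, processor 1 owns four expensive ones.
cost = {b: (1 if b < 4 else 3) for b in range(8)}
finish = run_with_stealing([[0, 1, 2, 3], [4, 5, 6, 7]], cost)
static_finish = [sum(cost[b] for b in q) for q in ([0, 1, 2, 3], [4, 5, 6, 7])]
```

In this example the static assignment would finish at time 12 (the slower processor), whereas with stealing the last processor finishes at time 9.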

C. Montani et al. proposed a hybrid parallel ray casting volume rendering algorithm. It divides the image space into rectangular strip-shaped subregions of equal or nearly equal size; correspondingly, the processor nodes are divided into groups with the same number of nodes in each group. The rendering work is divided by strip, so that one subregion corresponds to one subtask. After the relative computing power of each group has been estimated by a token passing scheme, the subtasks can be distributed among the groups statically; this of course requires every group to hold a complete copy of the data field. By dividing and organizing the computing tasks and processing nodes sensibly, this class of algorithm can effectively reduce data traffic. Because the computing power of each group is estimated in advance, satisfactory load balance can still be obtained even though the task division is static. Its drawback is excessive data redundancy: the storage consumption of the whole system depends not only on the size of the raw data but is also proportional to the number of groups.

Kwan-Liu Ma et al. proposed another parallel ray casting volume rendering algorithm and implemented it on the CM-5 and in a workstation cluster environment. Like all other rendering algorithms, this algorithm comprises two main computation phases: resampling of the volume data and image compositing. The whole algorithm follows a divide-and-conquer strategy: the resampling is divided recursively and finally carried out by the processors in parallel, and the image compositing is then performed during backtracking.

Ren Jicheng et al. proposed a parallel volume rendering algorithm for irregular data fields (in essence a "ray casting surface rendering algorithm"). The algorithm uses static data distribution, which avoids redistribution and communication between computing nodes during rendering. It also proposes an optimized image compositing method in which rendering and compositing run in parallel, which avoids network congestion and improves the parallelism of the algorithm. A load balancing strategy that combines static and dynamic balancing further improves its efficiency.

Deng Junhui et al. proposed a parallel ray casting volume rendering algorithm based on a parallel virtual machine architecture. The algorithm partitions and organizes the volume data in units of two-dimensional slices, which both reduces communication cost and ensures the data locality of each subtask. During task assignment it maintains and uses an availability index database to determine each subtask adaptively, achieving good load balance. It also uses an asynchronous bisection method, which reduces the time needed to merge all partial images. For the parallel implementation of visualization algorithms in a virtual machine environment, the authors designed and implemented their own development platform, PIPVR, based on TCP/IP and the Socket standard.

Yi Faling et al. proposed a parallel ray tracing surface rendering algorithm based on the BSP tree. The algorithm divides pixel space by analyzing the relation between the viewpoint and the scene space, which avoids blind assignment of parallel processing tasks.

The prior-art methods described above usually need to reorganize the data, which incurs a large system overhead; the data processing is complicated and slow. The prior art therefore still needs to be improved and developed.

Summary of the invention

The object of the present invention is to provide a parallel processing method for pre-projection ray casting volume rendering that makes data processing in volume rendering simple and fast.

Technical scheme of the present invention comprises:

A parallel processing method for pre-projection ray casting volume rendering, carried out by at least one general-purpose computer, comprises the following steps in the volume rendering data processing procedure:

A. calculating the bounding box of the three-dimensional volume model;

B. projecting the bounding box onto screen space;

C. storing the coordinates of the projected region;

D. processing the volume rendering data in parallel. When applied in a parallel processing system, step D further comprises the following steps:

D1. the master node distributes data processing tasks to each computing node;

Step D1 further comprises:

D11. Let $M$ be the number of task blocks into which the work on a computing node is divided and $N$ the number of computing nodes ($M \ge 2$; $N \ge 2$). The amount of computation is estimated in advance, where $T_{start\_part}$ is the time the CPU needs to start one communication transfer and $T_{comm\_part}$ is the time the master node needs to collect the result data of the other nodes; then

D12. after the computation, the master node collects the result data of the other computing nodes.

D2. after computing a task block, every computing node immediately sends the intermediate result data to the master node and at the same time begins computing the next task block.

Because the parallel processing method for pre-projection ray casting volume rendering provided by the present invention adopts pre-projection, it reduces the amount of data to be processed. Because it adopts a latency hiding technique that overlaps computation with communication during parallel processing, it also improves the data processing speed.

Description of drawings

Fig. 1 is a schematic diagram of the processing procedure of the parallel processing method for pre-projection ray casting volume rendering of the present invention;

Fig. 2a to Fig. 2d show the original images of slices 50, 100, 150 and 200, respectively, of Photo_Head_1_260, one embodiment of the method of the present invention;

Fig. 2e shows the reference rendering of Photo_Head_1_260;

Fig. 2f shows the overall rendering result of Photo_Head_1_260;

Fig. 2g and Fig. 2h show the rendering results of bone and muscle, respectively;

Fig. 2i and Fig. 2j show two rendering results in which bone and muscle are rendered in combination;

Fig. 3a and Fig. 3b show a comparison with the reference rendering for CT_BostonTeapot, another embodiment of the method of the present invention;

Fig. 3c, Fig. 3d, Fig. 3e and Fig. 3f show renderings of CT_BostonTeapot with different color mappings and from different viewing angles;

Fig. 4a is a schematic diagram of the performance comparison resulting from the "pre-projection" of the method of the present invention;

Fig. 4b is a schematic diagram of the load balance of the prior art, without the "pre-projection" technique;

Fig. 4c is a schematic diagram of the load balance when the method of the present invention uses the "pre-projection" technique;

Fig. 5a shows the influence of the number of task blocks on the overall running time in the method of the present invention when the communication ratio is 0.001;

Fig. 5b shows the influence of the number of task blocks on the overall running time in the method of the present invention when the communication ratio is 0.0001;

Fig. 5c is a schematic diagram of the speedup of the parallel volume rendering method of the present invention.

Embodiment

Preferred embodiments of the present invention are described in more detail below with reference to the accompanying drawings.

The present invention is the first to propose a "pre-projection" acceleration technique, similar in method and idea to "clipping". In a general-purpose computer, or in a parallel processing system composed of several general-purpose computers, a bounding box containing all object models is projected in advance onto the screen image space at a predetermined angle. The part of the screen image space outside the projection needs no ray casting and directly takes the background color, as shown in Fig. 1. The image data are thus partitioned, which reduces the amount of data to be processed and increases the processing speed.

In specific applications such as medical examination and industrial flaw detection, the number of effective object models in object space is small and differences in the object-space background are of no concern. Most rays cast from the screen image space do not intersect any effective object model and directly take the background color. The "pre-projection" technique of the method of the present invention can therefore effectively eliminate useless ray casting computation and increase the processing speed of volume rendering.

The processing procedure of the present invention, carried out by at least one general-purpose computer during volume rendering, comprises the following steps:

1. calculate the bounding box of the three-dimensional volume model;

2. project the bounding box onto the viewing window (screen space);

3. store the coordinates of the projected region;

4. process the volume rendering data.
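The four steps above can be sketched as follows. This is a minimal illustration under assumptions that the text does not specify: the volume is given as a set of occupied voxel coordinates, the projection is orthographic after a rotation about the y axis, and the projected region is stored as an axis-aligned screen rectangle. The function names are hypothetical.

```python
import math
from itertools import product


def bounding_box(voxels):
    """Step 1: axis-aligned bounding box of occupied (x, y, z) voxels."""
    xs, ys, zs = zip(*voxels)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def project_box(box_min, box_max, angle_deg):
    """Steps 2 and 3: project the 8 box corners and return the covering
    screen rectangle (x0, y0, x1, y1), inclusive pixel coordinates.

    Assumed camera: orthographic view after rotating about the y axis.
    """
    a = math.radians(angle_deg)
    us, vs = [], []
    for x, y, z in product(*zip(box_min, box_max)):
        us.append(x * math.cos(a) + z * math.sin(a))  # screen x
        vs.append(y)                                   # screen y
    return (math.floor(min(us)), math.floor(min(vs)),
            math.ceil(max(us)), math.ceil(max(vs)))


def needs_ray(rect, px, py):
    """Step 4 helper: pixels outside the rectangle keep the background color."""
    x0, y0, x1, y1 = rect
    return x0 <= px <= x1 and y0 <= py <= y1


voxels = {(2, 3, 4), (5, 7, 6), (3, 4, 9)}
box_min, box_max = bounding_box(voxels)
rect = project_box(box_min, box_max, angle_deg=0.0)
width, height = 16, 16
rays = sum(needs_ray(rect, px, py) for py in range(height) for px in range(width))
```

In this toy case only the pixels inside the stored rectangle are ray cast, out of 256 pixels on the 16 by 16 screen; the rest are filled with the background color directly.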

In the present invention the volume rendering data are processed in a distributed parallel processing system on a cluster. This implementation is known from the prior art and is therefore not described again here.

The "pre-projection" technique of the method of the present invention distinguishes the image regions that are affected by the volume data from those that are not, which helps balance the load of task assignment in the parallel algorithm. Earlier parallel ray casting volume rendering algorithms usually divide the screen image space directly into tasks and assign them dynamically or statically to execution units. Because of invalid projection regions (regions not affected by the volume data), the execution times of the task units differ greatly, and the load of the execution units is very unbalanced. Task division based on "pre-projection" rejects the regions not affected by the volume data, and the task units into which the valid projection region is divided differ little from one another, which favors load balance. The method of the present invention therefore divides and assigns tasks over the projected region in the screen image space on the basis of the "pre-projection" technique.
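Task division over the projected region can be sketched as below. The text does not say how the region is tiled or how blocks are assigned to nodes, so the square tiles, the round-robin assignment and the function names are assumptions made for illustration only.

```python
def split_into_blocks(rect, block_size):
    """Tile the projected rectangle (x0, y0, x1, y1), inclusive, with blocks.

    Blocks on the right and bottom edges are clipped to the rectangle.
    """
    x0, y0, x1, y1 = rect
    blocks = []
    for by in range(y0, y1 + 1, block_size):
        for bx in range(x0, x1 + 1, block_size):
            blocks.append((bx, by,
                           min(bx + block_size - 1, x1),
                           min(by + block_size - 1, y1)))
    return blocks


def assign_round_robin(blocks, n_nodes):
    """Assumed static assignment: block k goes to node k mod n_nodes."""
    return [blocks[i::n_nodes] for i in range(n_nodes)]


def area(block):
    x0, y0, x1, y1 = block
    return (x1 - x0 + 1) * (y1 - y0 + 1)


blocks = split_into_blocks((2, 3, 9, 10), block_size=4)
per_node = assign_round_robin(blocks, n_nodes=2)
```

Only pixels inside the projected rectangle appear in any block, so no node is handed a task that consists purely of background.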

Earlier parallel ray casting volume rendering algorithms mainly use two task assignment strategies: the equal-division strategy and the master-slave strategy. Each of these strategies determines a corresponding image compositing strategy.

In the equal-division strategy, all computing processes or nodes receive the same amount of work, and the master process or node begins collecting result data only after they have all finished computing. During collection, the other nodes may sit idle, and they may also compete and wait because the resources of the master node are limited.

In the master-slave strategy, processes or nodes are divided by function into computing nodes and a collecting node. The master node is dedicated to collecting result data and is assigned no computing tasks; the slave nodes are dedicated to computing and send their results to the master node at regular intervals. In this approach the resources of the master node are largely wasted.

To make full use of the resources of all nodes, the method of the present invention adopts a static task assignment algorithm based on overlapping computation with communication. In this algorithm a computing node no longer waits until all its tasks have been computed before sending result data to the master node. Instead, as soon as it has computed one task block, it sends the intermediate result data to the master node and at the same time begins computing the next task block. The master node is assigned a smaller share of the computing tasks at the outset so that its computing resources are fully used, and this also shortens the time the other nodes spend in the implicit synchronization wait at the end.
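The overlap of computation and communication can be sketched with threads and a queue standing in for cluster nodes and the network. This is an illustration only: the names `compute_node` and `master`, the `render` callback and the end-of-work marker are hypothetical, and the master's own smaller computing share is not modeled.

```python
import queue
import threading


def compute_node(node_id, blocks, render, outbox):
    """Render blocks one by one and send each result as soon as it is ready."""
    for block in blocks:
        # put() returns immediately, so the next block starts right away.
        outbox.put((node_id, block, render(block)))
    outbox.put((node_id, None, None))  # end-of-work marker (assumed protocol)


def master(assignments, render):
    """Collect intermediate results while the compute nodes keep rendering."""
    outbox = queue.Queue()
    workers = [threading.Thread(target=compute_node,
                                args=(i, blocks, render, outbox))
               for i, blocks in enumerate(assignments)]
    for w in workers:
        w.start()
    image, finished = {}, 0
    while finished < len(workers):
        node_id, block, pixels = outbox.get()
        if block is None:
            finished += 1
        else:
            image[block] = pixels
    for w in workers:
        w.join()
    return image


# Two compute nodes with a stand-in "renderer" that squares the block id.
image = master([[0, 1], [2, 3, 4]], render=lambda block: block * block)
```

The master starts receiving results after the first block is finished rather than after the last one, which is the latency hiding effect the paragraph describes.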

The static task assignment algorithm based on overlapping computation with communication requires the amount of computation in ray casting volume rendering to be estimated in advance. Let the total amount of computation (equated here with computing time) be $T_{serial}$; then $T_{serial} = T_{serial\_part} + T_{parallel\_part}$, where image rendering is the parallelizable part, whose amount of computation is denoted $T_{parallel\_part}$, and the non-parallelizable part is denoted $T_{serial\_part}$.

In earlier parallel ray casting volume rendering algorithms, synchronization is required after computation and before communication. Suppose these parallel algorithms achieve perfect load balance; the amount of computation on each processor is then $T_{serial\_part} + T_{parallel\_part}/N$ (where $N$ is the number of processors, $N \ge 2$). After the computation the master node collects the result data of the other nodes, which takes time $T_{comm\_part}$. The execution time $T_{parallel}$ of the earlier parallel ray casting volume rendering algorithms is then:

$$T_{parallel} = T_{serial\_part} + \frac{T_{parallel\_part}}{N} + T_{comm\_part}$$

The speedup is:

$$Speedup = \frac{T_{serial}}{T_{parallel}} = \frac{T_{serial\_part} + T_{parallel\_part}}{T_{serial\_part} + T_{parallel\_part}/N + T_{comm\_part}}$$

In the static task assignment algorithm based on overlapping computation with communication proposed by the method of the present invention, a computing node no longer waits until all its tasks have been computed before sending result data to the master node; as soon as it has computed one task block, it sends the intermediate result data to the master node. This shortens the time the other nodes spend in the implicit synchronization wait at the end. At the same time, the master node is assigned fewer computing tasks, so that its resources are fully used.

To simplify the analysis, assume that the system has only one node that collects result data, namely the master node.

Keeping the notation above, let M be the number of task blocks into which the work of a computing node is divided, let N be the number of computing nodes ($M \ge 2$; $N \ge 2$), and let $T_{\text{comp}}(i)$ be the amount of computation (computation time) of node i. To balance the tasks among the nodes as far as possible, the following equations are set up:

$$T_{\text{comp}}(1) = \cdots = T_{\text{comp}}(N-1) = T_{\text{comp}}(0) + T_{\text{comm\_part}}$$

$$T_{\text{comp}}(0) + \cdots + T_{\text{comp}}(N-1) = T_{\text{parallel\_part}}$$

Solving these equations gives:

$$T_{\text{comp}}(1) = \cdots = T_{\text{comp}}(N-1) = (T_{\text{comm\_part}} + T_{\text{parallel\_part}})/N$$

$$T_{\text{comp}}(0) = (T_{\text{comm\_part}} + T_{\text{parallel\_part}})/N - T_{\text{comm\_part}}$$
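The solution of the load-balance equations can be sketched as a small helper that returns the computation share of every node, with node 0 taken to be the master node. The function name and the check for a negative master share are assumptions added for illustration; the source does not discuss that case.

```python
def balanced_compute_times(t_parallel_part, t_comm_part, n):
    """Return [T_comp(0), T_comp(1), ..., T_comp(N-1)]; node 0 is the master.

    Every ordinary node gets (T_comm_part + T_parallel_part) / N, and the
    master gets that amount minus T_comm_part, so the shares sum to
    T_parallel_part.
    """
    if n < 2:
        raise ValueError("the model assumes N >= 2 nodes")
    worker = (t_comm_part + t_parallel_part) / n
    master = worker - t_comm_part
    if master < 0:
        # Assumption of this sketch: communication so costly that the
        # master would get a negative share is treated as out of scope.
        raise ValueError("communication time too large for this model")
    return [master] + [worker] * (n - 1)
```

With the assumed values `t_parallel_part=12.0`, `t_comm_part=2.0`, and `n=4`, the master node is assigned 1.5 and each of the three other nodes 3.5, which sums to 12.0.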

Because each computing node transmits its result data in M batches, the time of the final result data transmission is $T_{\text{comm\_part}}/M$. If $T_{\text{start\_part}}$ denotes the time the CPU needs to start one communication transfer, each computing node incurs an additional overhead of about $M \cdot T_{\text{start\_part}}$ (because the time for the CPU to start a communication is much smaller than the data transmission time, it was neglected in the analysis above). The execution time $T'_{\text{parallel}}$ of the ray-casting parallel volume rendering algorithm of the method of the invention is therefore:

$$T'_{\text{parallel}} = T_{\text{serial\_part}} + M \cdot T_{\text{start\_part}} + (T_{\text{comm\_part}} + T_{\text{parallel\_part}})/N + T_{\text{comm\_part}}/M$$

Because $M \cdot T_{\text{start\_part}}$ is much smaller than the other terms, this can be simplified to:

$$T'_{\text{parallel}} = T_{\text{serial\_part}} + (T_{\text{comm\_part}} + T_{\text{parallel\_part}})/N + T_{\text{comm\_part}}/M$$

The speedup is:

$$\text{Speedup}' = \frac{T_{\text{serial}}}{T'_{\text{parallel}}} = \frac{T_{\text{serial\_part}} + T_{\text{parallel\_part}}}{T_{\text{serial\_part}} + (T_{\text{comm\_part}} + T_{\text{parallel\_part}})/N + T_{\text{comm\_part}}/M}$$

The ratio of the speedup of the ray-casting parallel volume rendering algorithm of the method of the invention to that of the previous parallel algorithms is then:

$$\frac{\text{Speedup}'}{\text{Speedup}} = \frac{T_{\text{parallel}}}{T'_{\text{parallel}}}$$

$$= \frac{T_{\text{serial\_part}} + T_{\text{parallel\_part}}/N + T_{\text{comm\_part}}}{T_{\text{serial\_part}} + (T_{\text{comm\_part}} + T_{\text{parallel\_part}})/N + T_{\text{comm\_part}}/M}$$

$$= \frac{T_{\text{serial\_part}} + T_{\text{parallel\_part}}/N + T_{\text{comm\_part}}}{T_{\text{serial\_part}} + T_{\text{parallel\_part}}/N + T_{\text{comm\_part}} \cdot (1/M + 1/N)}$$

Because M and N are positive integers not less than 2, $1 \ge 1/M + 1/N$, so the denominator above does not exceed the numerator. Hence $\text{Speedup}'/\text{Speedup} \ge 1$; that is, the ray-casting parallel volume rendering algorithm of this method has a better theoretical speedup than the previous parallel volume rendering algorithms.
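The comparison of the two time models can be checked numerically. The following sketch is illustrative only; the function names are assumptions, and `t_start_part` defaults to zero to match the simplified expression used in the speedup ratio.

```python
def overlapped_parallel_time(t_serial_part, t_parallel_part, t_comm_part,
                             n, m, t_start_part=0.0):
    """Modeled run time when results are sent in M batches per node."""
    return (t_serial_part
            + m * t_start_part
            + (t_comm_part + t_parallel_part) / n
            + t_comm_part / m)


def speedup_ratio(t_serial_part, t_parallel_part, t_comm_part, n, m):
    """Speedup' / Speedup, which equals T_parallel / T_parallel'."""
    baseline = t_serial_part + t_parallel_part / n + t_comm_part
    overlapped = overlapped_parallel_time(
        t_serial_part, t_parallel_part, t_comm_part, n, m)
    return baseline / overlapped
```

With the assumed values `t_serial_part=1.0`, `t_parallel_part=12.0`, and `t_comm_part=2.0`, the ratio is exactly 1 for `n=2, m=2` (the boundary case where 1/M + 1/N = 1) and rises to 1.2 for `n=4, m=4`.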

The most important parameter of the load-balance model in the method of the invention, namely the number M of task blocks allocated to each computing node, is described in detail below. To simplify the description, assume that the system has only one node that collects result data, namely the master node, and that the number of nodes N is fixed.

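For $T'_{\text{parallel}}$ to attain its minimum, $F(M) = M \cdot T_{\text{start\_part}} + T_{\text{comm\_part}}/M$ must attain its minimum. Setting $dF/dM = T_{\text{start\_part}} - T_{\text{comm\_part}}/M^2 = 0$ shows that $F(M)$ attains its minimum only when

$$M = \sqrt{T_{\text{comm\_part}} / T_{\text{start\_part}}}$$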

On the basis of the pre-projection technique and the task allocation algorithm described above, the ray-casting parallel volume rendering procedure that the method of the invention proposes and implements on the Dawning 4000A cluster system is described below; see the following program code example:

This parallel processing procedure is designed mainly around the performance bottleneck of image rendering and uses coarse-grained parallelism, partitioning the screen image space into blocks, to reduce communication overhead. The master node is also distinguished from the ordinary computing nodes: the effect of communication is taken into account on the master node in order to achieve overall load balance. To reduce the overhead introduced by parallelization, computation and communication are overlapped, which hides most of the communication overhead and synchronization delay.
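The overlap of computation and communication described above can be illustrated with a small simulation. This is not the patent's program: it is a sketch in which threads stand in for computing nodes and a `queue.Queue` stands in for MPI messages, `render_block` is a hypothetical placeholder for ray casting, and, unlike the method described here, the collecting side performs no rendering of its own.

```python
import queue
import threading


def render_block(rows):
    """Hypothetical stand-in for ray casting: one value per image row."""
    return [row * row for row in rows]


def compute_node(node_id, blocks, outbox):
    """Render task blocks in order, handing over each result as soon as it is done."""
    for rows in blocks:
        outbox.put((node_id, rows, render_block(rows)))  # hand off, keep computing
    outbox.put((node_id, None, None))  # marker: this node has no more blocks


def collect(outbox, n_nodes, image):
    """Master side: merge intermediate results while the nodes keep computing."""
    finished = 0
    while finished < n_nodes:
        _node_id, rows, pixels = outbox.get()
        if rows is None:
            finished += 1
            continue
        for row, value in zip(rows, pixels):
            image[row] = value


def render(n_rows=16, n_nodes=2, m_blocks=4):
    """Split rows among nodes, split each share into about m_blocks blocks, and render."""
    image = [None] * n_rows
    outbox = queue.Queue()
    rows_per_node = n_rows // n_nodes
    block_size = max(1, rows_per_node // m_blocks)
    threads = []
    for node_id in range(n_nodes):
        first = node_id * rows_per_node
        last = n_rows if node_id == n_nodes - 1 else first + rows_per_node
        rows = list(range(first, last))
        blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
        thread = threading.Thread(target=compute_node, args=(node_id, blocks, outbox))
        thread.start()
        threads.append(thread)
    collect(outbox, n_nodes, image)
    for thread in threads:
        thread.join()
    return image
```

Calling `render()` fills every row of the image while results arrive block by block. In an MPI implementation, the hand-off in `compute_node` would typically be a non-blocking send so that the next block can be rendered during the transfer.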

The ray-casting volume rendering parallel processing procedure proposed by this method can be implemented on the Dawning 4000A server shared platform. This platform has 20 computing nodes; each computing node has two AMD Opteron(tm) CPUs with a clock frequency of 1.6 GHz, 4 GB of memory, a 1 TB local disk, and a 4 TB network disk, and communicates over Gigabit Ethernet. All computing nodes are connected by a 100 Mbit network, a Gigabit network, and an InfiniBand network, and the computing nodes communicate with one another using MPI. The test data is the axially sampled regular data field of the CT of the U.S. digital human bones, with a data size of 587 × 1878 × 341 bytes.

To verify the correctness of the algorithm of the method of the invention, rendering results of the ray-casting volume rendering algorithm on a PC and on an SMP platform are given below.

This embodiment of the invention uses two main experimental platforms:

PC: Intel Pentium 4 2.4 GHz CPU, 512 MB of memory.

SMP: two AMD Opteron 1.6 GHz processors, 1 GB of memory.

The volume data sets used for testing in this embodiment are mainly:

CT_BostonTeapot: 256 × 256 × 78, 11.1 MB, CT data of a teapot.

Photo_Head_1_260: 587 × 341 × 260, 148 MB, color slice data of a head.

CT_Foot: 256 × 256 × 256, 16 MB, CT data of foot bones.

Photo_Head_1_260 is raw-format volume data composed of 260 color slices of a head. Fig. 2a, Fig. 2b, Fig. 2c, and Fig. 2d show the original images of slices 50, 100, 150, and 200 respectively, in which the different colors represent the head, the muscle, and the manual processing of the original noise data. Fig. 2e gives a reference rendering of Photo_Head_1_260, drawn with 3DMed (volume rendering software based on MITK, developed by the Institute of Automation, Chinese Academy of Sciences). Fig. 2f shows the rendering of Photo_Head_1_260 as a whole, and Fig. 2g and Fig. 2h show the renderings of the bone and of the muscle respectively. Fig. 2i and Fig. 3j show two renderings of the bone and muscle combined (with different color maps or transfer functions).

Fig. 3a and Fig. 3b give reference renderings of CT_BostonTeapot. Fig. 3a is provided by www.volren.org, and Fig. 3b is drawn with 3DMed. Fig. 3c, Fig. 3d, Fig. 3e, and Fig. 3f show renderings of CT_BostonTeapot with different color maps and from different viewing angles.

Fig. 2a shows the effect of the "pre-projection" technique on the rendering performance for the CT data of the U.S. digital human bones on the Dawning 4000A server shared platform, where the window size is 2048 × 2048, the object-space sampling rate is 10, and the image-space sampling rate is 2. Fig. 2b and Fig. 2c show the load balance with and without the "pre-projection" technique respectively.

As shown in Fig. 4a, the pre-projection technique adopted by the method of the invention improves the overall performance of the parallel algorithm. "Pre-projection" separates out the screen image regions that are not affected by the volume data and performs rendering computation only on the projection region that is affected by the volume data. This eliminates a large amount of useless computation and improves the overall performance of the ray-casting parallel volume rendering algorithm.

As shown in Fig. 4b and Fig. 4c, the "pre-projection" technique adopted by the method of the invention improves the load balance of task allocation in the ray-casting parallel volume rendering algorithm. Task division based on "pre-projection" distinguishes the screen image region affected by the volume data and divides tasks only within that projection region, so that the task units differ less from one another, which helps load balance.
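Pre-projection followed by row-wise task division can be sketched as follows. The sketch assumes an orthographic projection along the z axis with a hypothetical scale and offset, which is a simplification; the source does not specify the projection, and a real renderer would apply its full view transform to the eight bounding-box corners. All function and parameter names are illustrative.

```python
import math
from itertools import product


def project_bounding_box(box_min, box_max, width, height,
                         scale=1.0, offset=(0.0, 0.0)):
    """Project the 8 box corners to the screen; return the inclusive pixel
    rectangle (x0, y0, x1, y1) they cover, or None if it is off-screen."""
    xs, ys = [], []
    for x, y, _z in product(*zip(box_min, box_max)):  # the 8 corners
        xs.append(x * scale + offset[0])
        ys.append(y * scale + offset[1])
    x0 = max(0, math.floor(min(xs)))
    y0 = max(0, math.floor(min(ys)))
    x1 = min(width - 1, math.ceil(max(xs)))
    y1 = min(height - 1, math.ceil(max(ys)))
    if x0 > x1 or y0 > y1:
        return None
    return x0, y0, x1, y1


def split_rows(y0, y1, n_tasks):
    """Divide the inclusive rows y0..y1 into contiguous, near-equal ranges."""
    base, extra = divmod(y1 - y0 + 1, n_tasks)
    tasks, start = [], y0
    for i in range(n_tasks):
        size = base + (1 if i < extra else 0)
        if size:
            tasks.append((start, start + size - 1))
        start += size
    return tasks
```

For a box spanning (100, 200, 0) to (355, 455, 77) in a 2048 × 2048 window, only rows 200 to 455 are divided into tasks, so no task consists purely of background pixels.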

In a preferred embodiment of the method of the invention, Gigabit Ethernet is used for communication. The time for the CPU to start one communication transfer is $T_{\text{start\_part}} = 46\ \mu s$; the computation time of the parallelizable part, such as image rendering, is $T_{\text{parallel\_part}} = 12\ s$; and the communication overhead introduced by parallelizing the algorithm is $T_{\text{comm\_part}} = 8000\ \mu s$.
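As an illustration, the overhead term $F(M) = M \cdot T_{\text{start\_part}} + T_{\text{comm\_part}}/M$ from the analysis above can be evaluated with these parameter values. The sketch below uses a brute-force search over integer M; the function names and the search bound `m_max` are assumptions made for the example.

```python
def overhead(m, t_start_part, t_comm_part):
    """F(M): start-up cost of M transfers plus the final transfer of size 1/M."""
    return m * t_start_part + t_comm_part / m


def best_block_count(t_start_part, t_comm_part, m_max=64):
    """Integer M in [2, m_max] that minimizes F(M), found by exhaustive search."""
    return min(range(2, m_max + 1),
               key=lambda m: overhead(m, t_start_part, t_comm_part))


# Parameter values of the preferred embodiment, in microseconds.
T_START_US = 46
T_COMM_US = 8000

m_best = best_block_count(T_START_US, T_COMM_US)
print(m_best)  # prints 13
```

Under this model the integer optimum is M = 13. If M is restricted to powers of two, F(16) is smaller than both F(8) and F(32).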

Fig. 5a and Fig. 5b show, for communication-to-computation ratios $T_{\text{comm}}/T_{\text{comp}}$ of 0.001 and 0.0001 respectively and for 4, 8, and 16 processors, the effect of the number of task blocks M on the overall run time T. The figures show that the overall run time is smallest, that is, the best performance is obtained, when the number of task blocks M of each computing node is 8 and 16 respectively. From the preceding analysis:

This shows that the experimental results are consistent with the theoretical analysis of the method of the invention.

Fig. 5c gives the speedups of the method proposed by the invention and of the previous ray-casting parallel volume rendering algorithms. It shows that the larger the number of processors, the better the speedup of the parallel volume rendering procedure of the invention; as the number of processors increases, the speedup of the parallel volume rendering algorithm of the method of the invention grows faster, and its scalability is better. Because the parallel processing method of the invention introduces additional computation and decision steps, its implementation adds some overhead, which can lead to poorer performance when the number of processors is small. With more than ten processor nodes, however, the performance gain of the parallel processing method offsets the extra load-balancing overhead, so the method has both a better speedup and better scalability.

In summary, the method of the invention proposes a ray-casting parallel volume rendering method suited to the characteristics of cluster architectures. The method first "pre-projects" the volume data onto the screen image space and then divides the projection region into tasks by consecutive rows. Following the static task allocation strategy based on overlapping computation and communication, tasks with different amounts of computation are assigned to the corresponding computing nodes, which compute the pixel color values. Each time a task has been rendered, collection of the intermediate image result begins, so that color computation and collection proceed in parallel; this hides the communication delay well and also achieves good load balance. The method is implemented on the Dawning 4000A server shared platform and achieves good speedup and good scalability.

In the method of the invention, the computing nodes no longer wait until all tasks have been computed before transmitting result data to the master node. Instead, as soon as a computing node finishes one task block, it sends the intermediate result data to the master node and at the same time starts computing the next task block. The master node is initially assigned fewer task blocks. The method of the invention also gives an optimal estimate of how many task blocks each computing node should be assigned, which greatly reduces the amount of data to be processed and improves the data processing speed.

It should be understood that the above description of the preferred embodiments of the invention is relatively specific and should not therefore be regarded as limiting the scope of patent protection of the invention, which is defined by the appended claims.

Claims

1. A parallel processing method for pre-projection ray-casting volume rendering, performed by at least one general-purpose computer, comprising the following steps in the data processing of the volume rendering:

A. calculating the bounding box of the three-dimensional volume model;

B. projecting the bounding box onto screen space;

C. storing the corresponding coordinates of the projection region;

D1. distributing, by the master node, data processing tasks to each computing node; said step D1 further comprising:

D11. estimating the amount of computation in advance: let $T_{\text{start\_part}}$ be the time for the CPU to start one communication transfer; after the computation, the master node collects the result data of the other nodes, which takes time $T_{\text{comm\_part}}$; let M be the number of task blocks into which the work of a computing node is divided; then