HIGH PERFORMANCE COMPUTING FOR THREE DIMENSIONAL PROTON COMPUTED TOMOGRAPHY (HPC-pCT)
GRANT INFORMATION
[0001] Research in this application was supported in part by a grant from the Department of Army USA Medical Research Mat Command (Grant No.
W81XWH-10-1-0170). The Government has certain rights in the invention.
BACKGROUND OF THE INVENTION
1. TECHNICAL FIELD
[0002] The present invention relates to three-dimensional computing. In particular, the present invention relates to proton-computed tomography.
2. BACKGROUND ART
[0003] Generating an internal 3D image of an object with proton computed tomography (pCT) starts by firing many protons through the object. Each proton's path (i.e., trace history) is recorded and the collection of all the trace histories are used as input to a computer program that generates the image. Some posit that images generated through pCT will be more effective in proton-based treatments of cancer than images produced through X-rays.
[0004] The algorithms that produce images from proton trace histories are complex and involve many stages. The computer memory and speed needed to produce a quality 3D image of an object for which proton-based cancer treatment can be applied (e.g. , a human head or pelvis) within clinically meaningful time frames exceeds the capacity of even the most powerful desktop computers available today. Therefore, there is a need for an algorithm to make images from proton trace histories more effectively.
[0005] Detectors for pCT in the prior art employ larger bulky plastic cube shaped tubes (such as in Pemler, et al.) with large and bulky photon sensors
that require much larger volumes that present problems when mounted on a proton gantry with limited space around the patient. Pemler, et al. describes the possibility to perform radiography with protons and used a single X-Y plane in front of the object being analyzed and a single X-Y plane after. Pemler, et al. also uses vacuum photomultipliers, square fibers, and combinatorial readout. Amaldi, et al. describes a Proton Range Radiography system with an imaging area of 10 cm x 10 cm with a stack of thirty thin plastic scintillators. The current pCT detector at Loma Linda uses silicon wafers that are limited in available area (9 cm x 9 cm), thus requiring a mosaic of overlapping tiles to achieve large area (27 x 36 cm). The maximum size now produced of silicon wafers is 10 cm x 10 cm. The tiles are shingle overlapped or placed with edges butting, which create dead space in the detector or double layers that require "mathematical removal" during image calculations.
[0006] Therefore, there also remains a need for a detector for pCT that eliminates dead space and double layers created by the arrangement of tiles.
SUMMARY OF THE INVENTION
[0007] The present invention provides for a proton computed tomography (pCT) detector system, including two tracking (2D) detectors in sequence on a first side of an object to be imaged, two tracking (2D) detectors in sequence on an opposite side of the object to be imaged, and a calorimeter, wherein the tracking detectors include plastic scintillation fibers.
[0008] The present invention also provides for a method of imaging an object by emitting protons from a source through two tracking detectors, through and around the object, and through two opposite tracking detectors, detecting energy of the protons with a calorimeter, and imaging the object.
DESCRIPTION OF THE DRAWINGS
[0009] Other advantages of the present invention are readily appreciated as the same becomes better understood by reference to the following
detailed description when considered in connection with the accompanying drawings wherein:
[00010] FIGURE 1 is a layout of the pCT detector system of the present invention;
[0001 1] FIGURE 2 is a three-dimensional representation of the pCT detector system core idea of the present invention;
[00012] FIGURES 3A-3C are three-dimensional representations of one plane of the tracking detector of the present invention;
[00013] FIGURE 4 is a three-dimensional representation of the system of the present invention;
[00014] FIGURE 5 is flow-chart showing the process of generating images with the scanner system;
[00015] FIGURE 6A is a front view of a silicon strip detector, and FIGURE 6B is a graph of a proton beam detected;
[00016] FIGURE 7 is a perspective view of fibers of the tracking detector; and
[00017] FIGURE 8 is perspective view of a scintillating plate as part of the calorimeter.
DETAILED DESCRIPTION OF THE INVENTION
[00018] The present invention provides for a pCT detector system 10, shown generally in FIGURE 1 , for generating 3D images from proton trace histories. The pCT detector system 10 preferably includes four tracking detectors 12, a calorimeter 14, and a computer cluster (not shown).
[00019] The tracking detectors 12 are used to detect the coordinates in the X and Y plane of the protons. Each detector 12 has two planes, an X plane and a Y plane, with the fibers (strips) oriented in the appropriate direction for X and Y measurements of the proton tracks. Each detector is fabricated by placing one X and one Y plane in close proximity with a thin foam-like material in between.
[00020] There are preferably four tracking detectors 12 arranged around the object 16 to be imaged, two on one side of the object 16 and two on the
opposite side of the object 16. This arrangement is further described below. There can also be more than two tracking detectors 12 on each side of the object 16 in order to create redundancy. The tracking detectors 12 include 1 mm diameter thin circular plastic scintillation fibers 22, closely packed in a row, and covering the full area of the imaging field, preferably 27 cm by 36 cm, or 18 cm x 36 cm. The tracking detectors 12 can be any other suitable size. The core of the fibers is preferably polystyrene with cladding of PMMA (Poly(methyl methacrylate)). The tracking detectors 12 also include appropriate electronics in order to detect the position of the protons and relay this data to a computer cluster.
[00021] The use of scintillation fibers for proton imaging has been reported in the prior art, but the tracking detectors 12 of the present invention offer higher spatial resolution and larger area coverage than previous detectors and eliminate the tile construction of the prior art. Up to six meters can be covered with these plastic scintillating fibers 22 without dead area. The smaller circular fibers 22 produce better resolution to sub-millimeter level over the larger square tubes used in prior art devices. One crucial difference between the present invention and the prior art is the use of SiPM (or solid state photomultiplier) 20 while reading out scintillating fibers 22. Previous devices were based on using bulky vacuum photomultipiiers with poor light sensitivity in green diapazon of the light spectrum. The prior art used thick square scintillating fibers in order to increase light output. The tracking detectors 12 of the present invention use SiPM 20 with as much as two time higher light sensitivity and allows for use of thinner fibers 22.
[00022] The tracking detectors 12 are arranged so that two are placed in sequence on a first side of an object 16 to be imaged and two are placed in sequence on an opposite side of the object 16, as shown in FIGURES 1 and 2. Scanning magnets (or a source) 18 that emit the proton beams of the system 10 are located at a position opposite to the calorimeter 14 on either side of the outer tracking detectors 2. For example, the scanning magnets 18 can be located 240 cm from the center of the object 16, and the calorimeter 14 can be 65 cm from the center of the object 16. The inner tracking detectors 12 can be 15 cm from the center of the object 16 and the
outer tracking detectors 12 can be 30 cm from the center of the object 16. Other distances can be used as appropriate.
[00023] The tracking detectors 12 include the use of low cost Silicon Photomultipliers (SiPM) 20 attached to the plastic scintillating fibers 22 for signal amplification in lieu of phototubes as used in the prior art, as shown in FIGURES 3A-3C. The SiPMs 20 provide continuous coverage and remove the tile construction effect. SiPMs are unaffected by magnetics contrary to phototubes. SiPMs have been historically used in particle detectors. SiPMs are intrinsically fast, of order 100 nanosecond resolving time between events. Therefore, this design allows much higher data acquisition rates than previous detector systems. The plastic scintillating fibers 22 can have a cross-sectional shape of a square (shown in FIGURE 7), circle, or hexagon. The tracking detectors 12 with SiPMs 20 can include two scintillating fibers 22 simultaneously, alternatively, one fiber 22 can be used for one SiPM 20. Preferably, the fiber 22 diameter is in the range of 0.75 mm - 1 mm. The fibers 22 can be polystyrene scintillating fibers that produce light when protons cross their thickness. Two layers of fibers 22 increase detection efficiency in contrast to one layer. Another orthogonal double layer of fibers 22 can be included in order to have a 2D coordinate of the protons in space. As shown in FIGURES 3A-3B, the tracking detectors can include a mechanical support 24 for the SiPM, as well as a Rohacell support 26 that serves as a support for scintillating fiber 22 placing. Any other supports can be used as known in the art in order to create the tracking detectors 12.
[00024] In one example of a tracking detector 12, the area of 27x36 cm2 (tracking part) is covered. If the picked diameter of the scintillating fiber 22 is 1 mm then 270 fibers, 270 SiPMs, and 270 channels of electronics are needed. This is true for a one projection only (X). The 2D plane includes X and Y projections (270 + 360 = 630 channels). The total tracking detector 12 includes approximately 630 X 4= 2520 channels.
[00025] FIGURE 6A shows a side view of the tracking detector 12 of the present invention. In this detector 12, there are two layers of silicon strips. The silicon sensors which connect to the SiPms 20 are 89.5 mm x 89.5 mm with a strip pitch of 238 μΓη and 400 μίτι thickness. The strips of each layer
are individually connected to six ASICs*64 strips. FIGURE 6B shows a view of a proton beam detected by the tracking detector 12 with X and Y coordinates.
[00026] The calorimeter 14 is used to detect proton energy that is emitted from the source 18. SiPM 20 with a diameter of 1.2 mm, can provide a readout (digital or analog) through 1.2 mm wavelength shifter (WLS) fibers 36. The SiPM 20 used in the calorimeter 14 can be the same type as used in the tracking detectors 12. The calorimeter 14 can be arranged in a stack (at least two) of 3 mm scintillator plates 34, as shown in FIGURE 8. In this case, the total number of channels are about 120. This is arrived at by the following. One scintillating plate 34 includes 1 WLS fiber and 1 SiPM. The total number of plates is 120. So, there are 120 plates, 120 WLS fibers, 120 SIPM, and 120 channels of electronics (120 channels). The calorimeter 14 can further include appropriate electronics to relay data to the computer cluster.
[00027] In prior art devices (Amaldi, Premier), different combinations of readout systems were used. There has never before been the use of SiPM and scintillator fibers and SiPM and WLS fibers in combination. Amaldi, et al. used SiPM and WLS fibers for the calorimeter readout, but gas photomultipliers were used as a tracking detector. Premier, et al. used vacuum photomultipliers to readout scintillating fibers and vacuum photomultipliers to readout the calorimeter. In both mentioned cases, the targeted system name was a radiograph whereas the present invention is a 3D tomograph.
[00028] FIGURE 4 shows the system 10 with a rotational stage 28 that holds and rotates an object 16 during proton exposition.
[00029] Unique features of this detector include large area coverage without dead zones or overlapping detectors that can produce artifacts in the reconstructed images. This solution offers better signal to noise than a previous approach that uses silicon strip detectors. Thus, scintillation fibers attached to silicon photo multipliers reduce background noise to produce cleaner images and lower dose to the patient for imaging. This is the first pCT detector to record protons at a rate of 2 million per second and reduce imaging time to less than 10 minutes, an acceptable time for patient imaging.
[00030] The present invention provides for a method of imaging an object by emitting protons from a source through two tracking detectors, through and around the object, and though two opposite tracking detectors, detecting the protons with a calorimeter, and imaging the object.
[00031] More specifically, protons are emitted from a source 18 (scanning magnets) through two tracking detectors 12, through and around the object 16 to be imaged, through additional two tracking detectors 12, and finally pass through the energy detector (i.e., calorimeter 14) (FIGURE 1 ). Protons of sufficient energy can penetrate the human body and can be tracked upon entry and exit of the tracking detectors 12. The X and Y position of the protons are measured and detected on the X and Y planes of each of the tracking detectors 12. Preferably, the system 10 includes 4 detectors or eight planes of these fiber trackers to record the position of each proton track as it passes through each plane. The object 16 being imaged is located such that two detectors 12 are on each side of the image. The detectors 12 rotate about the object 360 degrees during operation.
[00032] Generating 3D proton computed tomography images requires data from a large number of proton histories to be stored in memory. Previous computer programs executed on stand-alone general purpose graphical processing unit (GPGPU) workstations implemented in prior art were constrained by the demand on computer memory. The approach described here is not limited to demands on computer memory in the same manner as the reconstruction is executed on multiple workstations simultaneously.
[00033] The resulting data recorded from the detector and the calorimeter can be fed into an image reconstruction software program on a computer cluster to allow high-resolution images to be formed in under 10 minutes after irradiation.
[00034] The computer program is written in terms of the Message Passing Interface (MP!) standard. Writing the program in terms of the MPI standard enables one to run the program on a cluster of many computers as opposed to a single computer. The program can be run on approximately 768 CPUs (computers) plus 96 GPUs (graphic processing units). In this way, it is
possible to bring to bear the combined memory and computational power of many computers at one time, thus substantially reducing the time it takes to run the program (by orders of magnitude) as well as increasing the problem size (as measured by size of image space and the sharpness of the image) the program can image. Therefore, data acquired from the calorimeter 14 can be analyzed on a cluster of multiple computers with this standard.
[00035] A compact way was also developed in which to store in the computer's memory the information needed to generate the image. This compact design reduces the memory required to solve the problem by orders of magnitude. Previous computer programs implemented in the prior art, which executed on stand-alone workstations, were constrained by the amount of computer memory their programs required.
[00036] By contrast the compact memory representation enables one to deploy the computer program to solve the entire 3D image space at one time. This results in images of higher quality as it allows one to use the information from the trace histories of more protons as compared to the "slice and assemble" technique used in the prior art.
[00037] The new computer program of the present invention is capable of producing high-quality 3D internal images for cancer patients being treated with proton therapy. Most notably, the new computer program, and specifically the two novel techniques of adding MPI and the compact memory representation, is able to produce quality images faster than any other known methods and, in fact, so fast (i.e., less than 10 minutes) that the program can not only be used to augment or replace X-ray generated images used in proton treatment of cancer today, but medical care providers can find new uses for such images to improve the quality of care delivered to the patient (e.g., just before treating the patient with proton therapy generate one more set of images to refine the treatment plan). pCT imaging and this new computer program can become the state of the art, and thus become the de facto standard, for generating images for every cancer patient globally treated with proton therapy.
[00038] The computer program is further described in Example 1 below, and an example of code is shown in EXAMPLE 2. Briefly, a foreman computer
distributes equal amounts of proton histories to multiple worker computers and the foreman computer itself. An Integrated Electron Density (I ED) and Most Likely Path (MLP) are computed by each of the computers. A solution vector is solved for and stored on computer readable memory in each of the computers. Copies of the solution vector from the worker computers are sent to the foreman computer and combined and stored on computer readable media. The combined solution vector is tested by the foreman computer, and if the combined solution vector is determined to be done, an image is then produced of the object. If it is determined that the combined solution vector is not done, the combined solution vector is transmitted to the worker computers, and each of the above steps are repeated until the combined solution vector is determined to be done by the foreman computer, and an image is produced of the object.
[00039] FIGURE 5 shows a flow chart of the general process of the present invention. The proton detection process is shown in the bottom row, and data from the tracking detectors 12 and calorimeter 14 is analyzed from the top using parallel processors and the algorithms described above. Finally, an image is generated, which looks generally like a CT image.
[00040] Thus, the advantages of this approach are that it significantly reduces both the time and memory, each by orders of magnitude, required to generate an image. These dramatic reductions in both time and space not only change how the images are produced, but also how they can be used in cancer therapy.
[00041] The invention is further described in detail by reference to the following experimental examples. These examples are provided for the purpose of illustration only, and are not intended to be limiting unless otherwise specified. Thus, the invention should in no way be construed as being limited to the following examples, but rather, should be construed to encompass any and all variations which become evident as a result of the teaching provided herein.
EXAMPLE 1
Memory and CPU Count Estimate for Block Methods
March 2010
1 The Approach
Our proposed solution to implement string averaging methods is to use MPI with set of worker processes where one worker is designated as the foreman. The foreman reads the proton histories and distributes them evenly across all the workers, saving an equal share for himself. From their portion of the histories each worker computes (one time only) IED and MLP and initializes their solution vector which represents the entire voxel space (i.e., each worker has its own copy of the entire solution).
The program then enters an iteration loop in which:
1. Each worker uses their IED and MLP to modify their copy of the solution. This is an iterative process with its own stopping criterion.
2. Each worker that is not the foreman sends their copy of the solution to the foreman.
3. The foreman collects and combines with his own the solution vectors from all the other workers and tests combined solution vector. If combined solution is "done" , then foreman tells all the other workers to end. If "not done'! the foreman broadcasts the combined solution vector to all the other workers who, in turn and along with the foreman, use that as their starting point for their next iteration.
2 Input Parameters
We define the following input paramters that characterize the imaging space and the computational resources.
Plane detector parameters
Ph, Pw height and width (e.g., in cm), respectively, of the plane detectors
Pr resolution of detector strips (e.g., 238 μ)
Imaging parameters
Ia number of discrete angles (360°) from which imaging protons are fired
Voxel space parameters
¾j Vw, Vd height, width, and depth (e.g., in cm), respectively, of voxel space
VT desired resolution of image (e.g., 1 mm)
Vs MLP image resolution oversample rate (e.g., 2)
Vp protons per voxel; number of imaging protons travelling through each voxel
Computational resource parameters
M memory per process
/ number of bytes to store a single precision float
p number of bytes to store a pointer
3 Memory Contrairit and Process Count
In this section we compute the number of histories each worker can process, which we denote with Hw, as constrained the memory per process M, and from Hw and the total number of histories determine the number processes will be required, denoted by P, to solve the problem.
Indices and Counting. We find a repeated need to compute the minimum number of bytes needed for counting and to also index various discretized spaces. For notational convenience we therefore define these terms
β V; the number of voxels in our voxel space,
• bv; the number of bytes needed to index the linearlized voxel space,
• bph and b^; the number of bytes needed to index dimensions ¾ and Pw, respectively, and
• bja, the number of bytes needed to index the discrete firing angles
Input. Each proton history is characterized by an 11-tuple comprised of four < x, y > tuples (one for each detector plane), input and output energies (_¾ra and i5out) , and a projection angle. These values require ph + bpw bytes of storage for each of the four < x, y > tuples plus three floats for the last three values. Each history is processed to determine the 3-D surface of the phantom (i.e., space carving). In this process the voxel space is represented by an array, one element for each voxel, and the elements of the array are set or cleared if the phantom does nor does not occupy the voxel, respectively. We choose initially to represent each voxel with a byte rather than a bit to reduce the time to access the array. Therefore, the memory (Im) required to store the processed input projection histories for each worker is
I
m = H
w[4 b
ph + b
pw) + 3f] + V
IED. Each proton history is used to compute an Integrated Electron Density (IED) , which is stored as a float. The memory needed to store the IEDs (IEDm) for each worker is
IEDm = Hwf (2)
MLP. Each proton history is used to compute the proton's Most Likely Path (MLP) by identifying those voxels through which the proton passed- Assuming near-linear trajectories, the set of voxels "touched" by a single proton's path is very small compared to the total voxel space and so a sparse representation is appropriate. We choose initially to list each voxel in a proton's path by identifying the voxel by its index in the linearized voxel space (i.e. , requiring bv bytes for each index value) . Proton history data is generated by firing many imaging protons from each of a number of different angles. Again assuming near-
those voxels that were "touched"). Initially, we choose to compute an estimate the chord length, a floating point number, as a function of the voxel cube dimensions (V¾, ^, d) and the firing angle. Without assuming anything regarding how the proton histories will be partitioned over the set of worker processes (e.g., a worker may receive all histories from a single firing angle or from many, or all, firing angles) a simple memory layout that "connects" a chord length and the angle that produced it to the voxel indices of the protons fired from that angle is to maintain a vector of chord lengths and two additional vectors in tandem as follows;
• chord lengths - This is a vector of floats that records the computed chord length for each discrete firing angle (i.e., length = Ia).
• firing angle index - This is a vector of integer indices into the first vector of chord lengths that records the angle from which each proton history was fired (i.e., length = Hw).
• voxel index array - This is a vector of pointers where each proton history points to the vector of voxel indices (estimated length Id) the proton "touched" on its path (i.e., length— Hw).
Under these approximations and simple memory layout scheme, an upper limit approximation of the memory needed to store the MLP voxels and their chord lengths (MLPm) for each worker is
MLPm = Iaf + Hw{bIa + p+ {l^\ + l
1In practice MLP evaluations will start and end with the entry and exit 3-tuples < x, y, z > generated from the space carving pre-processing of the input.
Solution Vector. The solution vector is an array of floats whose length is determined by the number of voxels V . We note that each worker maintains its own copy of the entire solution vector, and so, the memory needed to store the solution vector (Vm)
m each worker
Histories per worker. Each worker, including the foreman, must have enough space to store each of the above memory components, however, we note that once the input has been processed its memory may be re-used for the solution vector.
IEDm + MLPm + MAX(Im, Vm) < M (5)
Re-writing inequality (5) we see that we can compute the maximum number of histories each worker can process, H
w, as constrained by their memory M as follows:
+MAX(HJi{ l^ + fiSeSl) + 3/, + [» , (r¾i¼) < M (,)
Number of processes. The number of voxels V and the number of protons per voxel V
p tells us how many input proton histories there will be in all. That, when combined with maximum number of histories each worker can process as constrained by their memory, H
w , tells us how many worker processes we will need for to solve the problem which we denote P.
Case Study- compute here Hw and P using the following values:
20 cm
la = 180
vh, vm, vd = 20 cm
VT = 1 mm
vs = 2
M = 2 GB
f,P = 4 bytes
Substituting those values into inequality (6) we get2
=
= 2, 781.¾ + 720 + MAX(2SHW + 8xl06, 32xl06) < 2 GB
: I 28¾ + 8x10s > 32xl06: Rewriting inequality (8)
2, 781¾ + 720 + 28¾, + 8xl0
6 < 2 GB
Hw < 761, 652.9
Case II 28ίΤω + 8xl06 < 32xl06: Rewriting inequality (8)
2, 78L¾, + 720 + 32xl0
6 < 2 GB
Hw < 760, 691.5
Since the memory required to store the solution under Case II Vm = 32xl06 is greater than the amount of memory to store the input Im = 28HW + 8xl06 ~ 29xl06 we use the value Hw = 760, 692 computed under Case II. Applying that value to equation (7) we find that we need
- _
106
2Rounded ( f——— ^*^—~1) up from 3 to 4 so that value can be stored in int.
Appendix - for spreadsheet
Prom inequality (6) we solve for HW under the two following Case I Solving for HW in (6) assuming IM > VM-
+14(rifi¾il1 + ri¾tl -VhVwVd
or expressed using more convenient notation
4/ + bIa + p + ( ^ [™αΪ J^ + 1)¾ + 4(¾,Λ + bpw) >^ 0»
Substituting the parameters from section 4 we compute H
W as follows:
2,139,482,928
[16 + (1) + 4 + (692 + 1) (4) + 4(2 + 2)1
761,652.9 > H
U
Case II Solving for H
W in (6) assuming I
m <VM-
v 'r3
xpressed using more convenient notation
M- (Ia + V)f
>-ff (10)
/ + i;.+p + (L¾ + i)¾
Substituting the parameters from section 4 into (10) we again compute H
W as follows
3
(flis^ l) + 4+ (L^ j + i)(ri≥≥¾^l)
2,115,482,928
4+ (1) + 4+ (693) (4)
760,691.5 > HU
3Rounded ( [i^ti∑]21 ) Up from 3 to 4 so that value can be stored in int.
EXAMPLE 2
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi . h>
const int RandomizedRange = 10000;
/* command line configurables */
int Randomized; /* -r <nelem> */
int N;
int parse_command_line_args ( int argc, char **argv, int my_id)
{
int i ;
int error;
/* default values */
Randomized = 0 ;
N = 0;
for (i = 1, error = 0; ! error && i < argc; i ++)
{
if ( !strcmp(argv[i] , "-r"))
{
Randomized = 1 ;
if (i + 1 < argc && (N = atoi ( argv[ i+1 ] ) ) > 0)
i ++;
else
error = 1 ;
}
else
error = 1 ;
} /* endfor */
if (error && !my__id)
{
/* only Master prints usage message */
fprintf (stderr, "\n\tusage: %s {-r <nelem>}\n\n" , argv[0]);
fprintf (stderr, " here\n\n" ) ;
f rintf ( stderr ,
"\t-r <nelem>\t- Randomized input with N=nelem\n\n" ) ;
} /* endif */
return error;
} /* end parse_command_line_args ( ) */
struct stats
{
float mean;
float sse;
unsigned int n;
}; /* end struct stats */
,/*
"Note on a Method for Calculating Corrected Sums of Squares and Products," B.P. Welford, Technometrics , Vol 4, No. 3, Aug 1962, 419-420.
The Art of Computer Programming, D.E. K uth, vol 2, 2nd Edition, p 216 The Art of Computer Programming, D.E. Knuth, vol 2, 3rd Edition, p 232
Reccurrence forumla that keeps a running mean (M_k) and running SSE (S_k)
for some stream of values X i
For x_l
M_l = x_l
S__l = 0
For x_i i>l
M_i = M_(i-1) + [(x_i - M_(i-l))/i]
S__i = S_(i-1) + [ (x__i - M (i-1 ) ) * (x i - M_i) ]
At any time k>l
running mean = M_k
running _sample_ (i.e., not _jpopulation_) variance
"Updating Formulae and a Pairwise Algorithm for Computer Sample Variances," T.F. Chan, G.H. Golub, and R.J. LeVeque,
Technical Report STAN-CS-79-773, Department of Computer Science,
Stanford University, 1979,
ftp: //reports . Stanford. edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
Given two sample sets where the mean and SSE have been computed
for each; <M_k,l S_k,l N_l> and <M_k,2 S_k,2 N_2>
we can compute the combined mean and SSE as follows
N = N_l + N_2
d = M_k, 2 - M_k, 1
combined mean = M_k,l + d*N_2/N
combined SSE = S_k,l + S_k,2 + d"2*N_l*N_2/N
*/
static void push(float *m, float *s, unsigned int *n, float x)
{
*n = *n + 1;
if (*n == 1)
{
*m = x;
*s = 0.0;
}
else
{
float old_mean = *m;
float d = x - old_mean;
*m += (d/(*n));
*s += (d* (x - *m) ) ;
} /* endif */
/* printf( "after pushing %f RunningMean %f RunningSSE %f\n", x, *m, *s); */ return;
} /* end push( ) */
void combine_stats ( void *operand_vec ,
void *operand_result_vec ,
int *len,
MPI_Datatype *dtype)
{
struct stats *op = (struct stats *) operand_vec;
struct stats *op_res = (struct stats *) operand_result_vec;
int i;
unsigned int combined_n;
float delta;
for (i = 0; i < *len; i ++)
{
combined_n = op[i] .n + op_res[i] . n;
delta = op[i] .mean - op_res [ i ] .mean;
op_res [ i] .mean += (delta * op[i].n / combined_n);
op_res [ i] . sse += (op[i].sse +
(delta * delta * op[i].n * op_res[i].n / combined_n) ) ; op_res[i].n = combined_n;
} /* endfor */
return ;
} /* end combine_stats ( ) */
#if 0
void combine_stats ( void *operand_vec ,
void *operand_result_vec ,
int *len,
MPI_Datatype *dtype)
{
struct stats *op = (struct stats *) operand_vec;
struct stats *op_res = (struct stats *) operand_result_vec ;
int i;
for (i = 0; i < *len; i ++)
{
op_res [ i] .mean += op[i].mean;
op_res [ i ] . sse += op [i]. sse;
op_res[i].n += op[i].n;
} /* endfor */
return;
} /* end combine_stats ( ) */
#endif
static float SampleData[ ] = {17.0, 19.0, 24.0};
int main(int argc , char **argv)
{
int numprocs, my_id;
int i ;
struct stats s[2];
MPI_0p op;
float *src_data;
/* for derived struct */
MPI_Datatype Stats_mpi;
int block_lengths [ 3 ] ; /* number of "things" in each element of */
/* our new struct, for us, 1 each */
MPI_Aint displs[3]; /* displacement of each element */
MPI_Datatype types [3]; /* type of each element */
MPi_Aint start_addr;
MPI_Aint addr;
MPI_Init( Sargc , &argv);
MPI_Comm_size (MPI_COMM_WORLD, Snumprocs) ;
MPI_Comm_rank ( MPI_COMM_WORLD , &my_id ) ;
printf ( "hello, world: my_id %d numprocs %d\n" , my_id, numprocs);
if (parse_coinmand_line_args ( argc , argv, my_id) )
{
MPI_Finalize ( ) ;
exit ( 1 ) ;
} /* endif */
if (Randomized)
{
if ( ( src_data = malloc (N*sizeof ( float )) ) == NULL)
{
fprintf ( stderr , "ERROR: failed malloc %n bytes \n " , N*sizeof ( float )) ; exit ( 1 ) ;
} /* endif */
for (i = 0; i < N; i ++)
{
/* about 1/2 positive and 1/2 negative */
src_data[i] = (random) ) % RandomizedRange ) - (RandomizedRange/2 ) ; printf ( "my_id %d: sample_data %f\n" , my_id, src_data[i] ) ;
}
}
else
{
src_data = SampleData;
N = sizeof (SampleData) /sizeof (float) ;
} /* endif */
/**************************************/
/* compute local running mean and SSE */ s [ 0 ] . n = s [ 1 ] . n = 0 ;
for (i = 0; i < N; i ++)
{
/* just push the same sample data into both 'regions' of s[] */ push (&( s [ 0 ] .mean) , S(s[0].sse), &(s[0].n), src_data[i] ) ;
push (&( s [ 1 ] .mean) , & ( s [ 1 ] . sse ) , &(s[l].n), src_data[i] ) ;
} /* endfor */ /* compute global running mean and SSE */ block_lengths [ 0 ] = block_lengths [ 1 ] = block_lengths [ 2 ] = 1;
types [ 01 = MPI_FLOAT;
types[l] = MPI_FLOAT ;
types [ 2 ] = MPIJJNSIGNED ;
MPI_Address ( & ( s [ 0 ] .mean) , &start_addr ) ;
displs[0] = 0;
MPI_Address(&(s[0] .sse) , &addr);
displs[l] = addr - start_addr;
MPI_Address ( & ( s [ 0 ] .n) , &addr);
displs[2] = addr - start_addr;
/* build new derived datatype */
MPI_Type_struct(3, /* nelements */
block_lengths ,
displs ,
types ,
&Stats_mpi ) ;
MPI_Type_commit(&Stats_mpi) ;
MPI_Op_create ( (MPI_JJser_function* ) combine_stats , // my user-defined func
1, // commute flag
Sop); // op handle
MPI_Allreduce ( s , // operand
s , // result
2 , II count
Stats__mpi, // data type
op, // operator
MPI_COMM_WORLD) ;
MPI_0p_free ( &op )
printf("myid %d: after s[0].mean %f s[0].sse %f s[0].n %d\n", my_id, s[0] .mean, s[ 0 ] . sse, s [ 0 ] .n) ;
printf("myid %d: after s[l].mean %f s[l].sse %f s[l].n %d\n", my_id, s[l] .mean, s[ 1 ] . sse , s [ 1 ] .n) ;
if (my_id == 0)
{
/* computing mean and SSE the old-fashioned way for checksums */
float sum
float sse
float mean
for (i = 0; i < N; i ++)
sum += (src_data[i] * numprocs);
mean = sum/ (N*numprocs) ;
for (i = 0; i < N; i ++)
sse += ( ( src_data [ i ] - mean) * ( src_data[ i ] - mean ) *numprocs ) ;
printf("myid %d: 2-pass mean %f sse %f n %d\n" ,
my_id, mean, sse, N*numprocs ) ;
} /* endif */
if (Randomized)
{
free ( src_data ) ;
} /* endif */
MPI_Finalize( )
exit ( 0 ) ;
} /* end main( ) */
#ifndef PCT_IED_H
#define PCT_IED_H
#include "pct.h" // struct entry_exit
#if defined( cplusplus)
extern "C" {
#endif
void compute_IED ( int nhistories, float *input_floats , float *IED);
void adjust_IED ( int nhistories,
unsigned char * history_bitmask,
unsigned short *input_ushorts ,
float *input_floats ,
float * IED,
struct entry_exit_t *entry_exit ) ;
#if defined( cplusplus)
}
#endif
#endif /* PCT_IED_H */
#ifndef mlp m
#define mlp m
/* here we use 'unsigned int's to store voxel indices */
class MLP_m
{
public :
MLP_m( int my_nh ) ;
~MLP_m( ) ;
/
/* proton histories */
/********************/
void set_my_nhistories_for_MLP ( int mnh_for_MLP )
{ my_nhistories_for_MLP = mnh_for_MLP; }
void add_voxel_idx_to_histor ( int h_idx, unsigned int v_idx) ;
/* iterators */
unsigned int *get_first_voxel_idx_from_history( int h_idx);
unsigned int *get_next_voxel_idx_from_history ( int h_idx,
unsigned int *last_voxel_idx__p) ; /* faster iterator - assumes voxel idxs for a given history */ /* stored in a contiguous mem block */ unsigned int get_history_idx_to_ntouched_voxels_idxs ( int h_idx) ;
/***************** /
/* chord lengths */ float get_chord_length(int h_idx);
void register_firing_angle ( int h_idx, float theta) ;
/*************************** /
/* memory management stats */
/
int get_curr_mem__block_idx( ) { return curr_mem_block_idx; } int get_n_mem_blocks ( ) { return n_mem_blocks }
unsigned long get_n_recorded_voxel_idxs ( )
{ return n_recorded_voxel_idxs ; }
int get__my_nhistories_for_MLP ( ) { return my_nhistories_for_ LP; } unsigned long get_n_copies ( ) { return n_copies; }
unsigned long get_n_copied_bytes ( ) { return n_copied_bytes ; } unsigned long get_nbytes_allocd_for_voxel_idxs ( ) ;
int get_n_allocd_blocks ( ) { return curr_mem_block_idx+l ; }
unsigned int get_block_nbytes ( int b_idx);
private :
/********************I
I* proton histories */
/********************/
/* my_nhistories = as many as I was given from input */
/* my_nhistories_for_MLP = number of histories remaining after */ /* throwing out as many as we know did */
/* not touch the phantom ... used only */
/* for initial big block malloc */ int my_nhistories ;
int my_nhistories_for_MLP
unsigned long n_recorded_voxel_idxs ; // for estimating subsequent
// big block malloc 's
unsigned int **history_idx_to_touched_voxels idxs ; // len=my__nhistories unsigned int *history_idx_to_ntouched_voxels_idxs ; // len=my_nhistories int n_mem_blocks ;
unsigned int **mem_block_pointers ; // len = n_mem_blocks
unsigned int *mem_block_nidx_lengths ; // len = n_mem_blocks
/* slot for next voxel idx */
int curr_mem_block_idx;
unsigned int *next_voxel_idx_p;
/* chord lengths */
/* int n_firing_angles ; */
float *chord_lengths; // len = N_FIRING_ANGLES int *history_idx_to_chord_length_idx_map; // len = my_nhistories void compute_chord_lengths ( ) ;
/***************************/
/* memory management stats */
/***************************/
unsigned long n_copies;
unsigned long n_copied_bytes ;
} ;
#endif
#ifndef PCT_PCT_H
#define PCT PCT H
#ifdef MAIN
#define EXTERN
#else
#define EXTERN extern
#endif
#include <iostream>
#include <cmath>
#include <math . h>
#include <sstream>
#include <vector>
#include <iomanip>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/time . h>
#ifndef NO_MPI
#include <mpi.h>
#endif
#include "mlp_m.h"
using namespace std;
/
/* Problem Space */
/***************** /
/*** START DEFINING THE SENSORS ***/
// Data space definitions
/ / Sensor definitions - distances in CM, origin in center of volume
const float TRACK_ Zl -30.0; // sensor plane closest to proton gun
const float TRACK_ ~Z2 = -20.0; // 2nd sensor plane from gun
const float TRACK_ ~Z3 = 20.0; // 3rd sensor plane from gun
const float TRACK_ ~Z4 = 30.0; // 4th and last sensor plane from gun
const float TRACK_ ~_Z 1_ _X = TRACK_ Zl + 0 1 // Slightly different distances const float TRACK_ 'z1 _Y = TRACK_ "zi - 0 1 // x and y sensor planes
const float TRACK_ ~Z2 _x = TRACK_ ~Z2 + 0 1
const float TRACK_ ~Z2 _Y = TRACK_ [Z2 - 0 1
const float TRACK_ "z3~ X = TRACK_ ~Z3 - 0 1
const float TRACK_ "Z3" _Y = TRACK_ ~Z3 + 0 1
const float TRACK_ "Z " _x = TRACK_ ~Z4 - 0 1
const float TRACK_ ~Z4~ _Y = TRACK_ "Z4 + 0 1
// X-axis is HORIZONAL as you stand behind the gun facing it shooting direction // X-axis range values in CM
const float MIN_TRACK__X = -15.00;
const float MAX_TRACK_X = 15.00;
const int MAX_TRACK_X_IDX = 3000;
// Y-axis is VERTICAL as you stand behind the gun facing it shooting direction // Y-axis range values in CM
const float MIN_TRACK_Y = -15.00;
const float MAX_TRAC _Y = 15.00;
const int MAX_TRACK _Y_IDX = 3000;
const float RANGE_TRACK_X = KAX_TRACK_X - MIN_TRACK_X ;
const float RANGE_TRACK_Y = MAX_TRACK_Y - MIN_TRACK_Y ;
// sensor resolution in CM
const float TRACK_DX = ( RANGEJTRACK_X/MAX_TRACK_X_ID ) ;
const float TRACK_DY = (RANGE_TRACK_Y/MAX_TRACK_Y_ID ) ;
/*** END DEFINING THE SENSORS ***/
/*** START DEFINING VOXEL SPACE BETWEEN SENSORS ***/
/* The data reconstruction space has two coordinate systems. */
/* The first is measured in centimeters and has the same XYZ orientation */
/* as the sensor grid at beam rotation angle of 0. The origin is the at */
/* the center of the space between the sensor planes. X is positive to */
/* the right (as seen from the central origin). Y is positive up. Z is */
/* positive in the direction of the beam. */
/* The second coordinate system is based on a plane-major, row major ordering */
/* scheme with planes parallel to the beam rotation plane. The column axis */
/* is parallel to X. Row axis is parallel to Z. Plane axis is parallel to Y. */
/* The reconstruction volume is an array a[MAXJPLANE ] [ AX_ROW] [MAX_COL] */
/* Minimum index values are always 0. Maximum values depend on the */
/* reconstruction bounds and the reconstruction resolution. */
// The desired bounds on the reconstruction volume may not be exactly
// what is used, due to sampling constraints.
// In code, use VOLUME_{MIN,MAX}_{X,Y, Z} . See below.
const float DESIRED_ _V0LUME_ _MIN_ _X = -14 .00;
const float DESIRED VOLUME MAX X = 14 .00;
const float DESIRED_ _VOLUME_ _MIN_ _z = -14 .00;
const float DESIRED _VOLUME_ _MAX_ _z = 14 .00;
// The height
const float DESIRED_ _VOLUME_ _MIN_ _Y = -7. 00;
const float DESIRED_ _VOLUME_ _MAX_ _Y = 10 .00;
// Voxel sampling sizes in cm
const float D_COL = 0.1
const float D_ROW = 0.1
const float D_PLA E = 0.1
const float D_DATA[ 3 ] = {D_COL, D_RO , D_PLANE} ;
// NICK: these (later) need to be determined by specified distances and
// NICK: specified resolutions.
/* const int MAX_R0WS = 300 // Should be even */
/* const int MAX_C0LS = 300 */
/* const int MAX_PLANES = 300 */
const int MAX_C0LS
( int ) ( (DESIRED_VOLUME_MAX_X DESIRED_VOLUME_MIN_X) / D_C0L ) ;
const int MAX_R0WS
(int) ( (DESIRED_VOLUME_MAX_Z DESIRED_VOLUME_MIN_Z ) / D_ROW);
const int MAX_PLANES =
( int ) ( (DESIRED_VOLUME_MAX_Y DESIRED_VOLUME_MIN_Y) / D_PLANE ) ;
const int MAX_DAT [ 3 ] = {MAX_COLS , MAX_ROWS, MAX_PLANES} ;
const float VOLUME_ _MIN_ _X = DESIRED_VOLUME_ _MIN_
const float VOLUME_ _MAX_ X = VOLUME_MIN_X + MAX] COLS * D COL ;
const float OLUME" _MIN_ _Y = DESIRED_VOLUME_ '
const float VOLUME_ _MAX_ _Y = VOLUME_MIN_Y + MAX_ PLANES * D PLANE;
const float VOLUME _MIN_ z = DESIRED_VOLUME_ _MIN_
const float VOLUME_ _MAX_ z = VOLUME_MIN_Z + MAX_ _ROWS * D_ _ROW;
// NICK: VOXEL_CUBE_DIAG_VOXEL_LEN and NVOXELS (later) need to be corrected // NICK: when MAX_{ROWS , COLS,PLANES} are convereted from nvoxels (which is // NICK: what they are now) to cm ... VOXEL_CUBE_DIAG_VOXEL_LEN and
// NICK: must both be measured in nvoxels
const float VOXEL_CUBE_DIAG_VOXEL__LEN = sqrt ( (MAX_ROWS*MAX ROWS)
+ (MAX_COLS*MAX_COLS)
+ (MAX_PLANES*MAX_PLANES ) ) ;
const int NVOXELS = MAX_COLS*MAX_ROWS*MAX_PLANES ;
/*** END DEFINING VOXEL SPACE BETWEEN SENSORS ***/
// for space carving and entry-exit, cut-off between "hit" or "miss" phantom const float ENERGY_THRESHOLD = 190.0;
const float ENERGY_DIFF_THRESHOLD = 2.7;
const float ELECTRON_DENSITY_WATER = 1.0;
const float ELECTRON_DENSITY_AIR = 0.001;
const float INITIAL_SOLUTION_INSIDE_PHANTOM = ELECTRON_DENSITY_WATER;
const float INITIAL SOLUTION OUTSIDE PHANTOM = ELECTRON DENSITY_AIR;
// for MLP
/ / constants in nature
const float RADIATION_LENGTH_WATER = 36.1; //cm
// not used const float RADIATION__LENGTH_AIR = 28818; //units? // distance in CM between planes on one side of phantom
const float SSD_SEPARATION = ( TRACK_Z2-TRACK_Z 1 ) ;
// distance in CM between innermost planes
const float SEPARATION = (TRACK_Z3-TRACK_Z2 ) ;
// no longer used const float DIMENSION = 256;
// something in CM
// no longer used const float RECON_SIZE = 21;
// no longer used const float PIXEL_SIZE = RECON_SIZE/DIMENSION;
// in cm
const float STEP_LENGTH = .05; //originally -> 0.5 * PIXEL_SIZE;
const int NUM_STEPS = (int) ( SEPARATION/STEP_LENGTH) ;
const int N_FIRING_ANGLES = 180; // should be factor or multiple of 3 const float THETA_IDX_PER_DEGREE = (((float) N_FIRING_ANGLES ) /360.0 ) ; inline int degrees_theta_to_idx( float theta)
{ return ( ( int ) ( theta*THETA_IDX_PER_DEGREE + 0.5)); }
inline float idx_to_degrees_theta ( int idx)
{ return (((float) idx) /THETA_IDX_PER_DEGREE ) ; }
/* For Linear Solvers */ const float DEFAULTJRELAXATION = 0.3;
/*************************/
/* For Memory Management */ const int MLP_MEMBLOCK_PTRS_PER_ALLOC = 4;
// const float PORTION HISTORIES FOR MLP = .95; // estimate portion
// of histories remaining // _after_ throwing out // histories we know // don't touch phantom const float PORTION_HISTORIES_FOR_MLP = 0.60;
/* const float PORTION_VOXEL_CUBE_DIAG_FOR_MLP = 0.60; */
/* const float PORTION_VOXEL_CUBEJDIAG_FORJ¾LP = 0.40; */
const float PORTION_VOXEL_CUBE_DIAG_FOR_MLP = 0.30;
/* for 38M, setting PORTION_VOXEL_CUBE_DIAG_FOR_MLP = 0.20, */
/* results in initial alloc .2GB when we need .3GB */
/* const float PORTION_VOXEL CUBE DIAG FOR MLP = 0.20; */
EXTERN char Nowstring [ 100 ] ;
EXTERN char Relaxparamstring [ 100 ] ;
/* mem mgmt counters */
EXTERN unsigned long N_mallocs;
EXTERN unsigned long N_reallocs;
EXTERN unsigned long N__allocated_bytes ;
EXTERN unsigned long Largest_single_mem_request;
EXTERN char Largest_single_mem_request_fname [ 500 ] ;
EXTERN int Largest_single_mem_request_line_no ;
#ifdef DEBUG
EXTERN unsigned long N_frees;
EXTERN unsigned long N_freed_bytes ;
EXTERN unsigned long N_allocated_bytes_high_water_mark;
EXTERN unsigned long N_outstanding_allocated_bytes ;
struct alloc_entry
{
void *p;
unsigned long len;
} ;
const int N_ALLOCJENTRIES = 1000;
EXTERN struct alloc_entry Allocs [ _ALLOC_ENTRIES ] ;
EXTERN int Next_alloc_idx;
#endif
/* theoretically possible (but very unlikely) that the first alloc 'd */
/* block is not big enough to hold all the voxel idxs touched by the */
/* the first history alone. */
/* in that case we need to REALLOC the first block (not malloc another */
/* because all the voxel id ' s from a single history must reside in a */
/* single allocated block (i.e., they cannot span multiple blocks). */
/* we could change that 'no span' design decision if we want to, but */
/* that will at least require us to change the MLP: :get_{ first, next} */
/* interators used by the linear solvers and_ (worst yet) add another */ /* vector len=my_nhistories of idxs that would index mem_block_pointers [ ] */
/* to record the 'starting' block for the iterators. */
/* we also use this value to pad the estimate for remaining block allocs. */
/* ... in short, set this value relatively low (i.e., "10%). */ /* const float PERCENT_PAD_SUBSEQUENT_BLOCK_ALLOCS = 0.10; */
const float PERCENT_PAD_SUBSEQUENT_BLOCK_ALLOCS = 0.20;
/* computed from previous settings */
const int AVERAGE_TOUCHED_VOXELS_PER_HISTORY_EST =
( int ) ( PORTION_VOXEL_CUBE_DIAG_FOR__MLP*VOXEL_CUBE_DIAG_VOXEL_LE ) ;
/**************************/
/* Other Useful Constants */
const int ForemanRank : = 0;
EXTERN int Myid, Nproc
/* MPI Abort codes */
const int ABORT_ I VALID_CMD_LINE_ARGS = i
const int ABORT_ _FAILED_MALLOC = 2;
const int ABORT_ _FAILED_FOPEN = 3;
const int AB0RT_ _TOO_MANY_HISTORIES = 4;
const int AB0RT_ _INSUFFICIENT_INPUT = 5;
const int ABORT _INVALID_IDX = 6;
const int ABORT_ _INVALID_PTR = 7;
const int AB0RT_ _INVALID_ARG = 8;
const int AB0RT_ _TRACE_OUTSIDE_VOLUME = 9;
const int AB0RT_ JENCONSISTENT_INTERNAL_STATE = 10;
const int ABORT_ _STATS_ERROR = 11;
/* MPI Message tags */
const int INPUT_FLOAT_TAG = 1;
const int INPUT_USHORT_TAG = 2; j
I* Input Data Layout */
/ **********************
/* Describing input_ushorts */
const int X1_DISPL = 0;
const int Y1_DISPL = 1;
const int X2_DISPL = 2;
const int Y2_DISPL = 3;
const int X3_DISPL = 4;
const int Y3_DISPL = 5;
const int X4_DISPL = 6;
const int Y4_DISPL = 7;
const int INPUT_USHORT_STRIDE
/* Describing input_floats *
const int EIN_DISPL = 0 ;
const int EOUT_DISPL = 1;
const int THETA_DISPL = 2;
const int INPUT FLOAT STRIDE
/********************************/
/* Utility Functions and Macros */
// A 3-D array in C/C++ defined a[NPLANES ] [NROWS ] [NCOLS ] is stored in memory
// with changes in the NPLANES dimension having the largest stride
// and changes in the NCOLS dimension with the smallest stride.
inline int convert_3D_voxel_to_idx( int p_idx, int r_idx, int c_idx)
{
return ( ( p_idx * MAX_ROWS * MAX_COLS )
+ (r_idx * MAX_COLS)
+ c_idx) ;
} /* end convert_3D_voxel_to_idx ( ) */
inline void convert_sensor_to_voxel ( float dst[3],
float src [ 3 ] ,
float cos_theta,
float sin_theta)
{
// Input is in cm.
// Convert to world coordinates
// Assuming clockwise rotation about y axis as seen from positive y // Correct assumption?
/*
dst[0] = src[0] * cos_theta - src [ 2 ] * sin_theta;
dst[l] = src[0] * sin_theta + src [ 2 ] * cos_theta;
dst[2] = src[l] ;
*/
// No. Rotation is counterclockwise about y axis as seen from positive y dst[0] = src[0] * cos_theta + src[2] * sin_theta;
dst[l] = -src[0] * sin_theta + src[2] * cos_theta;
dst[2] = src[l] ;
// Translate to data origin and scale to data space.
dst[ 0 ] = (dst[0] - VOLUME_MIN_ ) / D_COL ;
dst[ 1 ] = (dst[ 1] - VOLUME_MIN_Z ) / D_RO ;
dst[2] = (dst[2] - VOLUME_MIN_Y ) / D_PLANE;
return ;
} // end convert_sensor_to_voxel ( )
/* input_ushort accessors */
inline unsigned short get_xl (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + X1_DISPL]; }
inline unsigned short get_yl (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + Y1_DISPL] ; }
inline unsigned short get_x2 (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + X2_DISPL] ; }
inline unsigned short get_y2 (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + Y2_DISPL] ; }
inline unsigned short get_x3 (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + X3_DISPL] ; }
inline unsigned short get_ 3 (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + Y3JDISPL] ; }
inline unsigned short get__x4 (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + X4_DISPL]; }
inline unsigned short get_y4 (unsigned short *v, int idx)
{ return v[ ( idx*INPUT_USHORT_STRIDE ) + Y4_DISPL] ; }
/* input_floats accessors */
inline float get_ein( float *v, int idx)
{ return v[ (idx*INPUT_FLOAT_STRIDE) + EIN_DISPL]; }
inline float get_eout ( float *v, int idx)
{ return v[ ( idx*INPUT_FLOAT_STRIDE ) + EOUT_DISPL ] ; }
inline float get_theta ( float *v, int idx)
{ return v[ ( idx*INPUT_FLOAT_STRIDE ) + THETA_DISPL ] ; }
/* Abort stuff */
#ifdef N0_MPI
#define PCT_ABORT ( CODE ) assert(O);
#else
#define PCT_ABORT ( CODE ) MPI_Abort ( MPI_COMM_WORLD , (CODE));
#endif
/* Bitmask stuff */
#define CLEAR (v, b) (v)[(b)/8] &= (~(0xl « (7-((b) 8))) & (Oxff))
#define SET(v, b) (v)[(b)/8] |= (0x1 « (7-((b)%8)))
#define TEST(v, b) ((v)[(b)/8] S (0x1 « ( 7- ( (b) %8 ) ) ) )
/ * Memory Management * /
/* these two I want kept a #define so that LINE and FILE resolve */
/* to the place in the source code where the potential error occurred */
#ifdef NO_MPI
#ifdef DEBUG
/* N0_MPI DEBUG */
/* must use assert(O) instead of MPI_Abort() */
#define PCT_MALLOC ( P , S, T) \
{ \
if ( (S) <= 0) \
{ \
fprintf (stderr, \
"ERROR: %s %d: proc %d: " \
"malloc called for %ld bytes\n", \
FILE , LINE , Myid, \
(long) (S)); \
assert ( 0 ) ; \
} \
if ( ! ( (P) = (T * ) malloc ( (S) ) ) ) \
{ \
fprintf (stderr, \
"ERROR: %s %d: proc %d: " \
"failed malloc %ld bytes\n", \
FILE , LINE , Myid, \
(long) (S)); \
assert ( 0 ) ; \
} \
N_mallocs ++; \
N_allocated__bytes += ( S ) ; \
if (Next_alloc_idx == N_ALLOC_ENTRIES ) \
{ \
fprintf (stderr, \
"ERROR: %s %d: proc %d: " \
"Next_alloc_idx %d (max %d)\n", \
FILE , LINE , Myid, \
Next_alloc_idx, \
N_ALLOC_ENTRIES-l ) ; \
assert ( 0 ) ; \
} \
Allocs [Next_alloc_idx] .p = (P); \
Allocs [Next_alloc_idx] . len = (S); \
Next_alloc_idx ++; \
N_outstanding_allocated_bytes += ( S ) ; \
if (N_outstanding_allocated_bytes \
> N_allocated_bytes_high_water_mark) \
{ \
N_allocated_bytes_high_water_mark = \
N_outstanding_allocated_bytes ; \
} \
if ((unsigned long) (S) \
> Largest_single_mem_request) \
{ \
Largest_single_mem_request = (S); \
strcpy (Largest_single_mem_request_fname, FILE ); \
Largest_single_mem_request_line_no = LINE ; \
} \
}
#else
/* NO_MPI ! DEBUG */
#define PCT_MALLOC ( P , S, T) \
{ \
if (!((P) = (T *) malloc( (S) ) ) ) \
{ \
fprintf (stderr, \
"ERROR: %s %d: proc %d: " \
"failed malloc %ld bytes\n", \
FILE , LINE , Myid, \
(long) (S)); \
assert ( 0 ) ; \
} \
N_mallocs ++; \
N_allocated_bytes += (S); \
if ((unsigned long) (S) \
> Largest_single_mem_request) \
{ \
Largest_single_mem_request = (S); \
strcp (Largest_single_mem_request_fname, FILE ); \
Largest_single_mem_request__line_no = LINE ; \
} \
}
#endif
#else
#ifdef DEBUG
/* iNO_MPI DEBUG */
/* can use MPI_Abort() */
#define PCT_MALLOC(P, S, T) \
{ \
if ((S) <= 0) \
{ \
f rintf ( stderr , \
"ERROR: %s %d: proc d: " \
"malloc called for %ld bytes\n", \
FILE , LINE , Myid, \
(long) (S)); \ MPI_Abort(MPI_COMM_WORLD, ABORT_INVALID_ARG) ; \
} \ if (!((P) = (T. *) malloc ( (S) ) ) ) \
f rintf ( stderr ,
"ERROR: %s %d: proc %d: "
"failed malloc %ld bytes\n",
FILE , LINE , Myid,
(long) (S));
MPI_Abort ( MPI_COMM_WORLD , ABORT_INVALID_IDX ) ;
}
N__mallocs ++;
N_allocated_bytes += (S);
if (Next_alloc_idx == N_ALLOC_ENTRIES )
{
fprintf ( stderr ,
"ERROR: %s %d: proc %d: "
"Next_alloc_idx %d (max %d)\n",
FILE , LINE , Myid,
Next_alloc__idx,
N_ALLOC_ENTRIES-l ) ;
MPI_Abort ( MPI_COMM_WORLD , ABORT_INVALID_IDX ) ;
}
Allocs [Next_alloc_idx] .p = (P);
Allocs [Next_alloc_id ]. len = (S);
Next_alloc__idx ++;
N_outstanding_allocated_bytes += (S);
if (N_outstanding_allocated_bytes
> N_allocated_bytes_high_water_mark)
N_allocated_bytes_high_ ater_mark =
N_outstanding_allocated_bytes ;
}
if ((unsigned long) (S)
> Largest_single_mem_request) \
Largest_single_mem_request = (S); \
strcpy ( Largest_single_mem_request_fname , FILE ) ; Largest_single_mem_request_line_no = _ LINE
} \
}
#else
/* !NO_MPI ! DEBUG */
#define PCT_MALLOC ( P , S, T)
{
if (!((P) - (T *) malloc ( (S) ) ) )
{
f rintf ( stderr ,
"ERROR: %s %d: proc %d: "
"failed malloc %ld bytes\n",
FILE , LINE , Myid,
(long) (S));
MPI_Abort ( MPI_COMM_WORLD , ABORT_FAILED_MALLOC )
}
N_mallocs ++;
N_allocated_bytes += (S);
if ((unsigned long) (S) \
> Largest single mem request) \
{ \
Largest_single_mem_request = (S); \
strcpy (Largest_single_mem_request_fname, FILE );
Largest_single_mem_request_line_no = LINE ;
\
#endif
#endif
#ifdef NO_MPI
#ifdef DEBUG
/* NO_MPI DEBUG */
/* must use assert(O) instead of MPI_Abort() */
#define PCT_REALLOC(P, S, T) \
{ \ int aidx; \ if ((S) <= 0) \
{ \ fprintf (stderr, \
"ERROR: %s %d: proc %d: " \
"realloc called for %ld bytes\n", \ FILE , LINE , Myid, \
(long) (S)); \ assert ( 0 ) ; \
} \ for (aidx = 0; \ aidx < Next_alloc_idx \
SS Allocs [ aid ] .p ·= (P); \ aidx ++) \
\ if (aidx == Next_alloc_idx) \
{ \ fprintf ( stderr , \
"WARNING: %s %d: proc %d: " \
"free() could not find addr %p\n" , \ FILE , LINE , Myid, \
(void *) (P)); \
} \ N_outstanding_allocated_bytes -= Allocs [ aidx] . len; \ N_freed_bytes += Allocs [ aidx] . len; \
N_frees ++; \ if (!((P) = (T *) realloc((P),(S)))) \
{ \ fprintf ( stderr , \
"ERROR: s %d: proc %d: " \
"failed realloc %ld bytes\n", \ FILE , LINE , Myid, \
(long) (S)); \ assert ( 0 ) ; \
} \
Allocs [aidx] .p = (P); \
Allocs [aidx] . len = (S); \
N__reallocs ++; \
N_allocated_bytes += (S) \
N_outstanding_allocated_bytes += ( S ) ; \ if (N_outstanding__allocated_bytes \
> N_allocated_bytes_high_water__mark) \
{ \
N_allocated_bytes_high_water_mark = \
N_outstanding_allocated_bytes ,- \
} \ if ((unsigned long) (S) \
> Largest__single_mem_request ) \
{ \
Largest_single_mem_request = ( S ) ; \
strcpy (Largest_single_mem_jrequest_fname, FILE ); \
Largest_single_mem_request_line_no = LINE ; \
} \
}
#else
/* NO_MPI ! DEBUG */
#define PCT_REALLOC ( P , S, T) \
{ \
if (!((P) = (T *) realloc( (P) , (S) ) ) ) \
{ \
fprintf ( stderr, \
"ERROR: %s %d: proc %d: " \
"failed realloc %ld bytesVn", \
FILE , LINE , Myid, \
(long) (S)); \
asser ( 0 ) ; \
} \
N_reallocs ++; \
N_allocated_bytes += (S); \
if ((unsigned long) (S)
> Largest_single_mem_request ) \
{
Largest_single_mem_request = (S); \
strcpy (Largest_single_mem_request_fname, FILE ); \
Largest_single_mem_request_line__no = LINE ; \
} \
}
#endif
#else
#ifdef DEBUG
/* !NO_MPI DEBUG */
/* can use MPI_Abort() */
#define PCT_REALLOC(P, S, T) \
{ \ int aidx; \ if ((S) <= 0) \
{ \ fprintf ( stderr, \
"ERROR: %s %d: proc %d: " \
"realloc called for %ld bytes\n", \ FILE , LINE , Myid, \
(long) (S)); \ MPI_Abort(MPI_COMM_WORLD, ABORT_INVALID_ARG) ; \
} \ for ( aidx = 0 ; \ aidx < Next_alloc_idx \
&& Allocs [aidx] .p != (P) \ aidx ++) \
\ if (aidx == Next_alloc_idx) \
{ \ fprintf (stderr, \
"WARNING: %s %d: proc %d: " \
"free() could not find addr %p\n", \ FILE , LINE , Myid, \
(void *) (P) ) ; \
} \ N_outstanding_allocated__bytes -= Allocs [ aidx] . len \ N_freed_bytes += Allocs [ aidx] . len; \
N_frees ++; \ if ( ! ( (P) = (T *) realloc( (P) , (S) ) ) ) \
{ _ \ fprintf ( stderr, \
"ERROR: %s %d: proc %d: " \
"failed realloc %ld bytes\n", \ FILE , LINE , Myid, \
(long) (S)); \
MPI_Abort(MPI_COMM_WORLD, ABORT_FAILED_MALLOC ) ; \
} \
Allocs [ aidx] .p = (P); \
Allocs [ aidx] . len = (S); \
N_reallocs ++; \
N_allocated_bytes += ( S ) ; \
N_outstanding_allocated_bytes += (S); \
if (N_outstanding_allocated_bytes
> N_allocated_bytes_high_water_mark)
N_allocated_bytes_high_water_mark =
N_outstanding_allocated_bytes ;
}
if ((unsigned long) (S)
> Largest_single_mem_request ) \
Largest_single_mem_request = (S); \
strcpy (Largest_single_meiti_request_fname, FILE .);
Largest_single_mem_request_line_no = LINE ;
} \
}
#else
/* iNO_MPI ! DEBUG */
#define PCT_REALLOC(P, S, T)
{
if (!((P) = (T *) realloc( (P) , (S) ))
{
fprintf ( stderr ,
"ERROR: %s %d: proc %d: "
"failed realloc.%ld bytes\n" ,
FILE , LINE , Myid,
(long) (S));
MPI_Abort ( MPI_COMM_WORLD , ABORT_FAILED_MALLOC ) ;
}
N_reallocs ++;
N_allocated_bytes += (S);
if ((unsigned long) (S) \
> Largest_single_mem_request ) \
{
Largest_single_mem_request = (S); \
strcpy (Largest_single_mem_request_fname, FILE _)
Largest_single_mem_request_line_no = LINE ;
\
#endif
#endif
#ifdef DEBUG
#define PCT_FREE(P)
{
int aidx;
for ( aidx = 0 ;
aidx < Next_all idx
&& Allocs [aidx] Ί- (P)
aidx ++)
if (aidx == Next_alloc_ic )
{
f rintf ( stderr ,
"WARNING: %s %d: proc %d: "
"free() could not find addr ¾p\n" FILE , LINE , Myid,
(void *) (P));
}
N_outstanding_allocated bytes -= Allocs [ aidx] . len;
N_freed_b tes += Allocs [ aid ] . len ;
Allocs [ aidx] .p = NULL;
N_frees ++;
free( (P));
#else
#define PCT_FREE(P) free((P));
#endif
/ ******************* j
/* Data Structures */
/*******************/
// For space carving
// A high energy trace
#define HIGH_TRA.CE ( 1«0 )
// A low energy trace
#define LOW_TRACE (1«1)
// All energy levels turned on
#define ALL_TRACE (HIGH_TRACE | LOW_TRACE)
struct entry_exit_t
{
float entryjx, entry^jy, entry_z , exit x, exit y, exit_z ;
} ; struct stats
{
float mean?
float sse;
unsigned int n;
#ifdef DEBUG
/* for sanity checking only, not needed for stats */
float min, max;
#endif
} ;
/* partitioning internal exit detector plane into regions
const int REGION_ROWS = 30;
const int REGION COLS = 30;
const int ST T_DELTA_X 0
const int STAT_DELTA_Y 1
const int STAT_DELTA_ENERGY 2
const int S AT_LAST_UNUSED 3
const int STAT_NSTATS = STAT_LAST_UNUSED ;
/* linearized layout - from fastest changing idx -> slowest changing idx /* x,y, energy {0,1,2} > REGION COLS -> > RREEGGIIOONN_RROOWWSS -> N FIRING ANGLES */ const int N STAT CELLS ( STAT_NST TS
* REGION_RO S * REGION_COLS
* N_FIRING_ANGLES ) ;
const int STAT_CELL_IDX_ANGLE_STRIDE = ( S AT_ STATS
* REGION ROWS * REGION COLS ) ; const int STAT_CELL_IDX_ROW_STRIDE ( S AT_NSTATS * REGION_COLS ) ;
const int STAT CELL IDX COL STRIDE STAT NSTATS;
const float MAX SIGMA 3.0;
/* Voxel level debugging */
/* This is to help isolate calculations that */
/* work with a specific voxel in the data space. */
/* The particular voxel index to be examined. The constants below
refer to the plane, row, and column number respectively. */
#ifdef DEBUG_VOXEL
#define DEBUG_VOXEL_IDX (225 * (MAX_ROWS*MAX_COLS ) + 201 * MAX_COLS + 42) #endif
#ifndef NO MPI
/* MPI Profiling */
/* MPI functions we profile */
const int MPI_INIT_IDX = 0 ;
const int MPI_COMM_SIZE_IDX = 1;
const int MPI_COMM_RAN _IDX = 2;
const int MPI_ADDRESS_IDX = 3;
const int MPI_TYPE_STRUCT_IDX = 4 ;
const int PI_TYPE_COMMIT_IDX = 5 ;
const int MPI_OP_CREATE_IDX = 6;
const int MPI_OP_FREE_IDX = 7;
/* put msg passing ones at end for pretty report */
const int MPI_ALLREDUCE_IDX = 8 ;
const int MPI_BCAST_IDX = 9;
const int MPI_SEND_IDX = 10;
const int MPI_RECV_IDX = 11;
const int MPI_LASTUNUSED_IDX = 12;
const int Nfuncs = MPI_LASTUNUSED_IDX;
struct funcjprofile
{
char func_name [50];
unsigned call_count;
double total_time_secs ;
int does_communicate;
unsigned long total_bytes;
}; /* end struct func_profile */
EXTERN struct func_profile func_profiles [Nfuncs ] ;
#endif
/* Input/Output data location */
/* PADS */
/* const char InputDirBase [ ] = " /home/karonis/38M/ Input " ; */
const char InputDirBase [ ] = "/gpfs/pads/pro ects/CI-CCR000048/karonis/Input" ;
/* const char InputDirBase [ ] = "/home/karonis/pCT/" ; */
/* const char OutputDirBase [ ] = "/home/karonis/38M/Output" ; */
const char OutputDirBase [ ] = "/gpfs/pads/projects/CI-CCR000048/karonis" ;
/* const char OutputDirBase [ ] = "/autonfs/home/karonis/pCT/Code/" ; */
/* Mac */
/* const char InputDirBase [ ] = " /Volumes/Software/pCT/ " ; */
/* const char OutputDirBase [ ] == " /Volumes /Software/pCT/Code/ ' */
/* laptop */
/* const char InputDirBase [ ] = "/Users/karonis/pCT/" ; */
/* const char OutputDirBase [ ] == "" //UUsseerrss//kkaarroonniiss//ppCCTT//CCooddee// "" ;; */
/* hopper.cs.niu.edu */
/* const char InputDirBase [ ] = /home/hopper/research/pet/data/SimProjections /Input" ;
/
/* const char OutputDirBase [ ] = "/home/turing/karonis/pCT/Code/" ; */
#endif /* PCT_PCT_H */
#define MAIN
#include "pct.h"
/************************** /
/* LOCAL GLOBAL VARIABLES */
/* for linear solver loop */
const int Niters = 10;
static int Iterations = 0;
static char InputDir [ 256 ] , InputFnamePrefix[50] , InputFnameSuffix[ 50 ] ;
static unsigned TotalHistories ; // relevant for foreman only
const float MB = 1024.0*1024.0;
const float GB = MB*1024.0;
/* Timer Indexes and count */
const int SetupTimeldx = 0 ;
const int IEDTimeldx = 1 ;
const int VolumeBitmaskTimeldx = 2;
const int TouchedPhantomTimeldx = 3;
const int BinnedCleanupIdx = 4;
const int AdjustlEDIdx = 5;
const int MLPTimeldx = 6;
const int LinearSolverTimeldx = 7;
const int LastUnusedldx = 8;
const int Ntimers = LastUnusedldx;
suseconds_t Timers [Ntimers ] = {0};
unsigned long StartNhistories [Ntimers ] ;
const char *TimerNames [Ntimers ] =
{
"Set up",
" IED" ,
"Volume Bitmask" ,
"Connect to Phantom",
"Binned cleanup",
"Adjust IED",
"MLP " ,
"Linear Solver"
} ;
/*******************/
/* LOCAL FUNCTIONS */ static void accumulate_stats_val ( float *m,
float *s,
#ifdef DEBUG
float *min,
float *max,
#endif
unsigned int *n,
float val ) ;
static void compute_and_bcast_vbm( int my_nhistories ,
unsigned char *history_bitmask,
float *input_floats ,
unsigned short *input_ushorts r
int volume_bitmask_len,
char *volume_bitmask) ;
static void compute_nhistories ( int *my_nhistories ,
int *histories_per_worker , // relevant foreman only int n_input_histories ,
int max_histories_per_foreman_or_worker ,
int *must_fill_foreman) ;
static float compute_stat_val ( float *input_floats ,
unsigned short *input_ushorts ,
int h,
int stat_displ);
static int count_histories_in__bitmask( int my_nhistories ,
unsigned char *history_bitmask) ;
static void fill_buffers_from_files ( float *input_floats ,
unsigned short *input_ushorts ,
int *deg,
FILE **fp,
int *histories_read_from_this_file, int max_histories_per_input_file , int nhistories__buffsize,
int *my_nhistories ,
int imist_fill);
static void find_entry_exit_from__vbm_or_remove (int nhistories,
unsigned char *history_bitmask, float *input_floats ,
unsigned short *input_ushorts , struct entry_exit_t *entry_exit, int volume_bitmask_len, char *volume_bitmask) ;
static int get_stat_cell_idx ( float *input_floats ,
unsigned short *input_ushorts ,
int h,
int stat_displ);
static FILE *open_input_file ( int degree);
static void parse_command_line_args ( int argc ,
char *argv[ ] ,
int *n_input_histories ,
int *max_histories_per_foreman_or_worker , int *max_histories_per_input_file , float *linear_solver_relaxation) ;
static void read_and_distribute input ( int max_histories_per_foreman_or_worker , int max_histories_per_input_file, int *my_nhistories ,
int must_fill_foreman,
float *input_floats ,
unsigned short *input_ushorts ) ;
static void remove_based_on_binned_stat ( int nhistories,
unsigned char *history_bitmask, float *input_floats ,
unsigned short *input_ushorts , struct stats s [ ] ,
float sigma_cutoff [ ] ,
int stat_displ ) ;
static void remove_histories_from_binned_stats ( int my_nhistories ,
unsigned char *history_bitmask, float *input_floats ,
unsigned short *input_ushorts ) ; static int test_solution_for_done ( float *solution) ;
static inline void record_start_nhistories (unsigned long nhistories,
const int idx,
unsigned long my_nhistories )
{
StartNhistoriesfidx] = nhistories;
printf("%d: %18s: starting nhistories %ld - %% my N histories %5.1f %%\n", Myid, TimerNames [idx] , StartNhistories [ idx] ,
(my_nhistories
? ((float) nhistories ) *100.0/ (( float) my_nhistories )
: 0.0 ));
} /* end record__starting_nhistories ( ) */
// assumes after. tv_sec >= before . tv_sec
//
// there are only two cases ...
// Case I: afte .tv_usec >= before . tv_usec (no matter how many sees passed)
// elapsed usees = ( af er . v_sec - before . tv_sec ) *1 , 000 , 000
// + ( after . tv_usec - before . tv_usec )
//
// Case II: afte .tv_usec < before . tv_usec (in this case after. tv_sec must be
// greater than before.tv_sec)
// elapsed usees = ( after . v_sec - before . tv_sec ) *1 , 000 , 000
// - (before.tv_usec - after . tv__usec )
// = (after .tv_sec - before . tv_sec ) * 1 , 000 , 000
// + after . tv_usec - before.tv usec
//
// BOTH CASES ARE THE SAME
static inline void record_time ( struct timeval Sbefore,
struct timeval safter,
const int idx)
{
Timers[idx] = ( after . tv_sec - before . tv__sec ) *1000000
+ after .tv_usec - before . tv_usec ;
} /* end record_time( ) */
#ifndef NO_MPI
/* need to write this because MPI_LAND in */
/* MPI_Reduce is _not_ defined for MPI_CHAR */
/* must be externally visable for MPI */
void andem(void *operand_vec ,
void *operand_result_vec,
int *len,
MPI_Datatype *dtype )
{
char *op = (char *) operand_vec;
char *op_res = (char *) operand_result_vec ;
int i;
for (i = 0; i < *len; i ++)
op_res[i] &= opfi];
return;
} /* end andem( ) */
/* operator for MPI_Allreduce ( ) */
/*
"Updating Formulae and a Pairwise Algorithm for Computer Sample Variances," T.F. Chan, G.H. Golub, and R.J. LeVeque,
Technical Report STAN-CS-79-773 , Department of Computer Science,
Stanford University, 1979,
ftp: //reports . stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773 -pdf
Given two sample sets where the mean and SSE have been computed
for each; <M_k,l S_k,l N_l> and <M_k,2 S_k,2 N_2>
we can compute the combined mean and SSE as follows
N = N_l + N_2
d = M_k,2 - M_k,l
combined mean = M_k,l + d*N_2/N
combined SSE = S_k, 1 + S_k, 2 + d~2*N_l*N_2/N
*/
void combine_stats ( oid *operand_vec ,
void *operand_result_vec ,
int *len,
MPI_Datatype *dtype)
{
struct stats *op = (struct stats *) operand_vec;
struct stats *op_res = (struct stats *) operand_result_vec ;
int i;
unsigned int combined_n;
float delta;
for (i = 0; i < *len; i ++)
{
if (op[i].n > 0 && op_res[i].n > 0)
{
combined_n = op[i].n + op_res[i].n;
delta = o [i].mean - op_res [ i] .mean;
op__res[i] .mean += (delta * op[i].n / combined_n) ;
op_res [i ] . sse += (op[i].sse +
(delta * delta * op[i].n * op_res[i].n / combined_n)) op_res[i].n = combined__n;
#ifdef DEBUG
if (op_res[i] .min > op[i].min) op_res [ i ] .min = o [ i ] .min ; if (op_res[i] .max < op[i].max) op_res [ i] .max = op[i].max;
#endif
}
else if (op[i].n > 0 && op_res[i].n == 0)
{
op_res [i].n = op[i].n;
op_res [i] .mean = op[i].mean;
op_res [ i] . sse = op [i]. sse;
#ifdef DEBUG
op_res [ i] .min = o [i].min;
op_res [ i] .max = op[i].max;
#endif
} /* endif */
} /* endfor */
return;
} /* end combine_stats ( ) */
#endif
/**********************/
/* EXTERNAL FUNCTIONS */
/ * * * * * A- * * * * * -A * * A A * * A * A A /
/* IED func */
//extern void compute_IED ( int my_nhistories , float *input_floats , float *IED}; #include "ied.h"
// intersect2. cc
extern bool compute_entry_exit_from_vbm( int h,
float *input_floats ,
unsigned short *input_ushorts ,
#ifdef DEBUG
int volume_bitmask_len,
#endif
char *volume_bitmask,
struct entry_exit_t *entry_exit) ;
extern void compute_initial_vbm( int my_nhistories ,
unsigned char *history_bitmask,
float *input_floats ,
unsigned short *input_ushorts ,
int volume_bitmask_len ,
char *volume__bitmask) ;
extern void morph__clean( int volume_ itmask_len , char *volume_bitmask ) ;
/* MLP func */
extern void compute_mlp( int my_nhistories ,
unsigned char *history__bitmask,
float *input_floats r
unsigned short *input__ushorts ,
struct entry_exit_t *entry_exit,
int volume_bitmask_len,
char *volume_bitmask,
MLP_m *ml ) ;
/******************/
/* Linear solvers */
/* string averaging methods */
extern void carp(MLP__m *, float x[], float b[],
int count [ ] , unsigned char history_bitmask[ ] ,
int n_rows , int n_cols, float relax) ;
extern void sap(MLP_m *, float x[], float b[],
int count[], unsigned char history_bitmask[ ] ,
int n ro s , int n cols, float relax);
typedef void ( * linear_solver_func ) (MLP_m *,
float x[ ] ,
float b[ ] ,
int count [ ] ,
unsigned char history_bitmask[ ] , int n_rows ,
int n_cols,
float relax) ;
const int CARP_IDX = 0;
const int SAP_IDX = 1;
linear_solver_func linear_solvers [ ] = {carp, sap};
int main (int argc, char *argv[])
{
int i ;
float *input_floats ;
unsigned short *input_ushorts ;
int my_nhistories ;
int n_input_histories , max_histories_per_foreman_or_ orker ;
int histories_per_worker ; // relevant for foreman only int max_histories_jper__input_file ; // relevant for foreman only unsigned long linear_solver_iterations = 0 ; / / relevant for foreman only
#ifdef DEBUG
unsigned long nvoxels_inside_phantom = 0 // for foreman only unsigned long nvoxels_outside_phantom = 0 // for foreman only unsigned long nvoxels_inside_phantom_mlp_touched 0; // for foreman only unsigned long nvoxels_outside_phantom_mlp touched 0; // for foreman only unsigned long nvoxels_inside_phantom_not touched 0; // for foreman only unsigned long nvoxels_outside_phantom__not touched 0; // for foreman only
#endif
unsigned long start_n_hist;
int must_fill_foreman;
/* int volume_bitmask_len = MAX_ROWS*MAX COLS*MAX PLANES; */
int volume_bitmask_len = NVOXELS;
char *volume_bitmask
unsigned char *history_bitmask;
int history_bitmask_len;
float *IED;
struct entry_exit_t *entry_exit ;
MLP_m *mlp;
float linear_solver_relaxation;
float * solution;
int *count;
int linear solver idx = CARP IDX;
struct timeval before, after;
time_t total_time_start, total_time_secs ;
total_time_start = time(NULL); // Mark the start time
#ifdef NOJYIPI
Nprocs = 1 ;
Myid = 0;
#else
MPI_Init ( &argc , Sargv) ;
MPI_Comm_size (MPI_COMM_WORLD, SNprocs ) ;
MPI_Comm_rank(MPI_COMM_WORLD, &Myid) ;
#endif
J
I* Set up */
/********** /
record_start_nhistories ( 0 , SetupTimeldx, 0);
gettimeofday ( &before , NULL);
{
N_mallocs = N_reallocs = N_allocated_bytes
= Largest_single_mem_reguest = 0 ;
#ifdef DEBUG
N_frees = Next_alloc_idx = N_freed_bytes =
N_outstanding_allocated_bytes = N_allocated_bytes_high_water_mark = 0 ;
#endif
if (Myid == ForemanRank) printf("%d: nprocs %d\n", Myid, Nprocs);
if (Myid == ForemanRank)
{
/* recording current time to embed in output fnames */
time_t now = time (NULL);
struct tm *tp = localtime ( Snow) ;
sprintf (Nowstring, " %4d-%02d-%02d_%02d: %02d: %02d" ,
1900+tp->tm_jyear ,
l+tp->tm_mon ,
tp->tm_mday ,
l+tp->tm_hour ,
l+tp->tm_min ,
l+tp->tm_sec ) ;
} /* endif */
#ifndef NO_MPI
MPI_Bcast ( SNowstring, // buff
100, // count
MPI_CHAR, // data type
ForemanRank, // root
MPI_COMM_WORLD) ;
#endif
parse_command_line_args ( argc ,
argv,
&n_input_histories ,
£max_histories_per_foreman_or_worker ,
Smax_histories_per_input_file,
&linear_solver_relaxation ) ;
sprintf (Relaxparamstring, "relaxparaiti-%4.2f " , linear_solver_relaxation) ;
compute_nhistories ( &my_nhistories ,
Shistories_per_worker ,
n_input_histories ,
max_his ories_per_foreman_or_worker ,
&must_fill_foreman) ;
/* at this point: */
/* - for all work procs: my_nhistories == histories_per_worker */
/* - for foreman: my_nhistories >= histories_per_worker */
mlp = new MLP_m(my_nhistories ) ;
/* fprintf ( stderr , "%d: after new MLP_m: :MLP_m( ) : curr_mem_block_idx %d n_mem_blocks %d \n", Myid, mlp->get_curr_mem_block_idx( ) , mlp->get_n_mem_blocks ( ) ) ; */ if (Myid == ForemanRank)
printf("%d: Input: %s/%s*%s\n".
Myid, InputDir, InputFnamePrefix, InputFnameSuffix) ;
PCT_MALLOC ( input_floats ,
my_nhistories
*INPUT_FLOAT_STRIDE
*sizeof (float) ,
float)
PCT_ ALLOC ( input_ushorts ,
my_nhis ories
*INPUT_USHORT_STRIDE
*sizeof (unsigned short),
unsigned short)
read_and_distribute_input ( histories per worker,
max_histories_per_input_file ,
&my_nhistories ,
must^fill__foreman,
input_floats ,
input_ushorts ) ;
if (Myid == ForemanRank)
printf("¾d: my_nhistories %d\n", Myid, my_nhistories ) ;
history_bitmask_len = (my_nhistories/8 ) + (my_nhistories % 8 ? 1 : 0);
PCT_MALLOC(history_bitmask, history_bitmask_len, unsigned char)
/* initially claim all histories are used in MLP */
memset ( history_bitmask, Oxff, history_bitmask_len) ;
}
gettimeofday (&after, NULL);
record_time(before, after, SetupTimeldx) ;
/* f rintf ( stderr, "%d: after setup: curr_mem_block_idx %d n_mem_blocks %d\n"r Myid, ml p->get_curr_mem_block_id ( ) , mlp->get_n_mem_blocks ( ) ) ; */
/*********************************** * /
/* gentlemen, start your engines .. * /
/* ... computing starts HERE * /
/A*************************** * * A * * * * * /
/* compute IED for _all_ histories */
record_start_nhistories (my_nhistories , IEDTimeldx, my_nhistories ) ;
gettimeofday ( &before , NULL ) ;
/* if (0) */
{
PCT_MALLOC ( IED , my_nhistories *sizeof ( float ) , float)
compute_IED(my_nhistories , input__floats , IED);
}
gettimeofday ( safter, NULL);
record_time( before, after, IEDTimeldx);
/* f rintf ( stderr , "%d: after IED: curr_mem_block_idx %d n_mem_blocks %d\n" , Myid, mlp- >get_curr_mem_block_idx( ) , mlp->get_n_mem_blocks ( ) ) ; */
/*******************/
start_n_hist = count_histories_in_bitmask(my_nhistories , history__bitmask) ;
record_start_nhistories ( start_n_hist , VolumeBitmaskTimeldx, my_nhistories ) ;
gettimeofday ( &before , NULL ) ;
/* if (0) */
{
PCT_MALLOC(volume__bitmask, volume_bititiask_len , char)
compute_and_bcast_vbm(my_nhistories ,
history_bitmask,
input_floats ,
input_ushorts ,
vo1ume_bitmask_len,
volume_bitmask) ;
}
gettimeofday (&after, NULL);
record_time( before, after, VolumeBitmaskTimeldx) ;
/* fprintf ( stderr, "¾d: after VBM: curr_mem_block_idx %d n_mem_blocks %d\n", Myid, mlp- >get_curr_mem_block_idx( ) , mlp->get_n_mem_blocks ( ) ) ; */
if (0)
{
if (Myid == ForemanRank)
{
FILE *fp;
char fname[300];
sprintf ( fname, " s/volume_bitmask. %s . len . %d" ,
OutputDirBase , No string, volume_bitmask_len ) ;
if ((fp = fopen(fname, "w" ) ) != NULL)
{
for (i = 0; i < volume_bitmask_len; i ++)
fprintf (fp, "%d\n", volume bitmaskfi] ) ;
fclose ( f ) ;
}
else
{
fprintf ( stderr , "ERROR: foreman could not fopen >%s<\n", fname); } /* endif */
/* printf("%d: foreman should print solution here\n", Myid); */
} /* endif */
}
/* remove remaining bad histories before MLP calculation */
/***************************■***************************·***/
PCT_MALLOC ( entry_exit ,
my_nhistories*sizeof (struct entry_exit_t ) ,
struct entry_exit_t ) ;
/* removing histories that don't touch phantom */
start_n_hist = count__histories_in_bitmask (my_nhistories , history_bitmask ) ;
record_start_nhistories ( start__n__hist , TouchedPhantomTimeldx, my_nhi stories ) ;
gettimeofday( Sbefore, NULL);
/* if (0) */
{
/* populates entry__exit */
find_entry_exit_from_vbm_or_remove (my_nhistories ,
history_bitmask,
input floats ,
input_ushorts ,
entry_exit,
volume_bitmask_le ,
volume_bitmask) ;
}
gettimeofday(&after, NULL);
record_time(before, after, TouchedPhantomTimeldx);
/* removing histories that are statistical outliers */
start_n_hist = count_histories_in_bitmask (my_nhistories , history_bitmask ) ;
record_start_nhistories ( start_n_his , BinnedCleanupIdx, my_nhistories ) ;
gettimeofday ( &before , NULL ) ;
/* if (0) */
{
/* printf("%d: before remove_histories_from_binned_stats ( ) \n" , Myid); */ remove_histories_from_binned__stats (my_nhistories ,
history_bitmask,
input_floats ,
input_ushorts ) ;
/* printf("%d: after remove_histories_from_binned_stats ( ) \n" , Myid); */
}
gettimeofday ( safter, NULL);
record_time (before, after, BinnedCleanupIdx);
/* fprintf (stderr, "%d: after remove remaining: curr_mem_block_idx d n_mem_blocks %d\ ", Myid, mlp->get_curr_mem_block_idx( ) , mlp->get_n_mem_blocks ( ) ) ; */
/
/* Adjust IED */
/***************** /
start_n_hist = count_histories_in_bitmask(my_nhistories , history_bitmask) ;
record_start_nhistories ( start_n_hist, AdjustlEDIdx, my_nhistories ) ;
gettimeofday ( &before , NULL ) ;
/* if (0) */
{
adjust_IED (my_nhistories , history_bitmask, input_ushorts , input_floats ,
IED , entry_exit ) ;
}
gettimeofday (&after , NULL);
record_time( before, after, AdjustlEDIdx);
/* fprintf (stderr, "%d: after Adjust IED: curr_mem_block_idx %d n_mem_blocks %d\n" Myid, mlp->get_curr_mem__block_idx( ) , mlp->get_n_mem_blocks ( ) ) ; */
/* MLP - 'A' in Ax = b */
***********************/
start_n_hist = count_histories_in_bitmask(my_nhistories, history_bitmask) ;
record_start_nhistories ( start_n_hist , MLPTimeldx, my_nhistories ) ;
gettimeofday (&before, NULL);
/* if (0) */
{
mlp->set_my__nhistories_for_MLP ( count_histories_in_bitmask(my__nhistories ,
history__bitmask) ) ;
compute_mlp(my_nhistories ,
his ory_bitmask,
input_floats ,
input_ushorts ,
entry_exit ,
volume_bitmask_len ,
volume_bitmask,
mlp) ;
}
gettimeofday(&after, NULL);
record_time( before, after, MLPTimeldx);
/* fprintf ( stderr , "%d: after MLP: curr_mem_block_idx %d n_mem_blocks %d\n" , Myid, mlp >get_curr_mem_block_idx( ) , mlp->get_n_mem_blocks ( ) ) ; */
PCT_FREE(entry_exit) ;
PCT_FREE ( input_ushorts ) ;
PCT_FREE ( input_floats ) ;
I* solve for ' ' in Ax = b */
/***************************/
start_n_hist = count_histories_in_bitmask(my__nhistories, history_bitmask) ;
record_start_nhistories ( start_n_hist , LinearSolverTimeldx, my_nhistories ) ; gettimeofda ( &before , NULL ) ;
/* if (0) */
{
char done = 0 ;
#ifndef NO_MPI
int *dummy_i_recv_buff = (int *) NULL; // needed for bug in MVAPICH2 float *dummy_f_recv_buff = (float *) NULL; // needed for bug in MVAPICH2
#endif
PCT_MALLOC( solution, NVOXELS*sizeof ( float ) , float)
PCT_MALLOC( count, NVOXELS*sizeof ( int ) , int)
/* seed solution with zeros */
/* bzero( solution, NVOXELS*sizeof ( float )) ; */
/* seed solution in phantom with ones and outside phantom 0.001 */ for (i = 0; i < NVOXELS; i ++)
{
if (volume_bitmask[ i ] )
{
solution[ i] = INITIAL_SOLUTION_INSIDE_PHANTOM;
#ifdef DEBUG
if (M id == ForemanRank)
nvoxels_inside_phantom ++;
#endif
}
else
{
SOlution[i] = INITIAL_SOLUTION_OUTSIDE_PHANTOM;
#ifdef DEBUG
if (Myid == ForemanRank)
nvoxels_outside_phantom ++;
#endif
} /* endif */
} /* endfor */
while ( ! done )
{
linear_solver_iterations ++;
/* printf("%d: starting linear solver iteration %ld\n", */
/* Myid, linear_solver_iterations ) ; */
bzerojoount, NVOXELS*sizeof (int) ) ;
/* solve for ' x ' in Ax = b */
linear_solvers [ linear_solver__idx] (mlp, // A
solution, // x
IED, // b
count ,
history_bitmask,
itiy__nhistories ,
NVOXELS ,
linear_solver_relaxation ) ;
#ifndef NO_MPI
MPI_Reduce( (Myid == ForemanRank
? MPI_IN_PLACE // res clobbers oprnd in root
: solution), // operand for non-root
(Myid == ForemanRank
? solution // result, relevant at root only
: dummy_f_recv__buff ) ,
NVOXELS, // count
MPI_FLOA , // data type
MPI_SUM, // operator
ForemanRank, // root
MPI_COMM_WORLD) ;
MPI_Reduce ( (Myid == ForemanRank
? MPI_IN_PLACE // res clobbers oprnd in
: count) , // operand for non-root
(Myid == ForemanRank
? count // result, relevant at root only
: dummy_i_recv_buff ) ,
NVOXELS, // count
MPI_INT, // data type
MPI_SUM, // operator
ForemanRank, // root
MPI_COMM_WORLD) ;
#endif
(Myid == ForemanRank)
/* NICK: not sure about count [i] == 0 here
/* if that's an error or not :-( if ( linear_solver_iterations == 1)
{
/* volume_bitmask[ ] and count[] should be the same */ /* for all iterations, so suffices to do this only */ /* once and for the first iteration */ for (i = 0; i < NVOXELS; i ++)
{
if ( count [i] > 0)
{
/* this voxel touched in MLP by >0 histories */ if (volume_bitmask[ i ] )
{
nvoxels_inside_phantom_mlp_touched ++
}
else
{
nvoxels_outside_phantom_mlp_touched ++;
} /* endif */
}
else
{
/* this voxel not touched in MLP by any histories * if (volume_bitmask[ i ] )
{
nvoxels_inside_phantom_not_touched ++ ;
}
else
{
nvoxels_outside_phantom_not_touched ++;
} /* endif */
} /* endif */
} /* endfor */
} /* endif */
for (i = 0; i < NVOXELS; i ++)
{
if ( count [i] > 0)
solution[i] /= count [i];
else
{
/* count[i] == 0 means that none of the histories, */ /* from any proc, touched this voxel_idx=i ... */ /* ... however, to make the MPI_Reduce sum for */ /* count and solution_vector work easily, each proc */ /* (inside linear solver) set their local copy of */
/* solution vectorfi] = 0 for those voxels=i that */
/* It determined was not touched by any of _its */
/* histories ... the result of all this is that */
/* if _none_ of the histories touched a voxel then */
/* its summed value in the foreman is now 0.0 and */
/* it needs to be set back to the initialization */
/* guess based on whether we think the voxel is in */
/* the phantom or not */ if (volume_bitmask[i] )
{
Solution[i] = INITIAL_SOLUTION_INSIDE_PHANTOM;
}
else
{
solution[i] = INITIAL_SOLUTION_OUTSIDE_PHA TOM
} /* endif */
} /* endif */
endfor */
done = test_solution_for_done { solution) ;
/* if (0) */
{
if ( ! done )
{
FILE *fp;
char fname[300];
sprintf ( fname, " %s/solution_vector . %s.%s.%021d" ,
OutputDirBase ,
Nowstring,
Relaxparamstring ,
linear_solver_iterations ) ;
if ((fp = fopen (fname, "w" ) ) != NULL)
{
fprintf(fp, "%d %d %d\n" ,
MAX_COLS , MAX_ROWS , MAX_PLANES ) ;
for (i = 0; i < NVOXELS; i ++)
fprintf(fp, " f\n", solution[ i] ) ;
fclose ( fp ) ;
}
else
{
f rintf ( stderr,
"ERROR: foreman could not fopen >%s<\n", fname ) ; } /* endif */
} /* endif */
} /* endif */
} /* endif */
#ifndef NO__MPI
// foreman beast done
MPI_Bcast( &done, // buff
1, // count
MPI_CHAR, // data type
ForemanRank, // root
MPI_COMM_WORLD) ;
( ! done )
foreman beast done
I_Bcast ( solution, // buff
NVOXELS , // count
MPI FLOAT, // data type
ForemanRank, // root
MPI_COMM_WORLD) ;
} /* endif */
#endif
} /* endwhile */
}
gettimeofday ( &after , NULL);
record_time( before, after, LinearSolverTimeldx) ;
/* fprintf ( stderr , "%d: after linsolv: curr_mem_block_idx %d n_mem_blocks %d\n". My mlp->get_curr_mem__block_idx( ) , mlp->get_n_mem_blocks ( ) ) ; */
PCT_FREE (volume_bitmask) ;
PCT_FREE ( count ) ;
PCT_FREE(history_bitmask) ;
PCT_FREE ( IED ) ;
/* printing solution */
/* if (0) */
{
if (Myid == ForemanRank)
{
FILE *fp;
char fname [ 300 ] ;
sprintf ( fname , " %s/solution_vector . %s . %s . final" ,
OutputDirBase, Nowstring, Relaxparamstring) ;
if ((fp = fopen (fname, "w" ) ) != NULL)
{
fprintf (fp, "%d %d %d\n",
MAX_COLS, MAX_R0 S, MAX_PLANES);
for (i = 0; i < NVOXELS; i ++)
fprintf (fp, "%f\n", solution[i] ) ;
fclose ( f )
}
else
{
fprintf (stderr, "ERROR: foreman could not fopen >%s<\n", fname ) ; } /* endif */
/* printf("%d: foreman should print solution here\n", Myid); */
} /* endif */
}
PCT_FREE ( solution ) ;
/* print summary stats */
total_time_secs = time (NULL) - total_time_start;
/* Time */
printf ( "\n" ) ;
printf("%d: Performance SummaryXn", Myid);
#ifdef OPT
printf ("%d: Build Version: OptimizedXn" , Myid);
#endif
#ifdef NOOPT
printf ("%d: Build Version: Not optimized\n" , Myid);
#endif
#ifdef DEBUG
printf ("%d: Build Version: DebugXn", Myid);
#endif
printf ("%d: Input: %s/%s*%s\n",
Myid, InputDir, InputFnamePrefix, InputFnameSuffix) ;
if (Myid == ForemanRank)
{
printf ("%d: Output filename timestamp: %s\n", Myid, Nowstring);
if (max_histories_per_input_file
== UNSPECIFIED_MAX_NINPUT_HISTORIES_PER_FILE )
printf("%d: Max Hist per File: unspecified\n" , Myid);
else
printf("%d: Max Hist per File: %d\n",
Myid, max_histories_per_input_file) ;
printf("%d: Total N histories: %d\n", Myid, TotalHistories ) ;
switch ( linear_solver_idx )
{
case CARP_IDX:
printf("%d: Linear Solver Method: CARP\n", Myid);
break;
case SAP_IDX:
printf("%d: Linear Solver Method: SAP\n" , Myid);
break;
default :
printf("%d: Linear Solver Method: unknown :-(\n", Myid); break;
} /* end switch( ) */
printf("%d: Linear Solver Relaxation: %f\n",
Myid, linear_solver_relaxation) ;
printf("%d: Linear Solver Iterations: %ld\n",
Myid, linear_solver_iterations ) ;
} /* endif */
printf("%d: My N histories: %d\n" ,
Myid, my_nhistories ) ;
#ifdef HISTOGRAM
printf("¾d: GENERATED HISTOGRAM DATA\
#endif
/* if (0) */
{
printf("%d: — Timers \n", Myid);
for (i = 0; i < Ntimers; i ++)
{
printf("%d: %18s: usees: %ld (%ld sees) nhistories %ld\n" ,
Myid, TimerNames [ i ] , (long) Timers [i],
(long) Timers[i]/1000000,
StartNhistories [ i ] ) ;
if (StartNhistories [i] != 0)
printf ( " %d : 18s usecs/history %ld\n",
Myid, " ",
( long ) Timers [ i ] /StartNhistories [ i ] ) ;
} /* endfor */
printf ("%d: —\n" , Myid);
printf ("%d: Total execution time (sees): %ld\n" , Myid, total_time_secs ) ; printf (" %d : \n" , Myid);
/* Memory */
/* if (0) */
{
unsigned long nbytes_allocd_for_voxel_idxs =
mlp->get_nbytes_allocd_for_voxel_idxs ( ) ; unsigned long n_histories_for_mlp = mlp->get_my_nhistories_for_MLP ( ) ; unsigned long n_recorded_voxel_idxs = mlp->get_n_recorded__voxel_idxs ( ) ; unsigned long nbytes_for_recorded_voxel_idxs =
sizeof ( unsigned int ) *n_recorded_voxel_idxs ; int n_allocd_blocks = mlp->get_n_allocd_blocks ( ) ;
unsigned int n_bytes_block;
printf ( " %d: Memory Summary\n" , Myid) ;
printf ("%d: — mem for MLP voxel idx' sXn" , Myid);
printf ("%d: N bytes allocd for voxel idxs: "
"%ld (%7.1f MB %4.1f GB ) \n" ,
Myid, nbytes_allocd_for__voxel_idxs ,
( ( float ) nbytes_allocd_for_voxel_idxs ) /MB ,
( ( float ) nbytes_allocd_for_voxel_idxs ) /GB ) ;
printf ("%d: N bytes used for voxel idxs: "
"%ld (%7.1f MB %4.1f GB ) \n" ,
Myid, nbytes_for_recorded_voxel_idxs ,
( ( float) nbytes_for_recorded_voxel_idxs ) /MB,
( ( floa ) nbytes_for_recorded_voxel__idxs ) /GB ) ;
printf ("%d: N bytes unused for voxel idxs: "
"%ld (%7.1f MB %4.1f GB ) \n" ,
Myid, nbytes_allocd_for_voxel_idxs-nbytes_for_recorded_voxel_idxs , ( ( float) (nbytes_allocd_for_voxel_idxs-nbytes_for_recorded_voxel_idxs ) ) /MB, ( ( float) (nbytes_allocd_for_voxel_idxs-nbytes_for_recorded_voxel_idxs ) ) /GB) ; printf ("%d: Percent alloc used for voxel idxs: %4.1f %% \n",
Myid,
(nbytes_allocd_for_voxel_idxs != 0
? 100.0* (( float) nbytes_for_recorded_voxel_idxs )
/( (float) nbytes_allocd_for_voxel_idxs )
: 0.0));
printf("%d: — counts for MLP voxel idx's\n", Myid);
printf("%d: N recorded voxel idxs: %ld\n", Myid, n_recorded_voxel_idxs ) ; printf ("%d: N histories for MLP: %ld\n" , Myid, n_histories_for_mlp ) ; printf ("%d: Avg rec v_idxs/hist: %ld\n" ,
Myid,
(n_histories_for_mlp != 0
? n_recorded_voxel_idxs/n_histories_for_mlp
: 0L));
#ifdef DEBUG
if (Myid == ForemanRank )
{
printf ("%d: — Voxel space\n", Myid);
printf("%d: N voxels inside phantom: %ld (%4.1f %%)\n", Myid, nvoxels_inside_phantom,
(100.0*( (float) nvoxels_inside_phantom)/( (float) NVOXELS))); printf("%d: N voxels outside phantom: %ld (%4.1f %%)\n", Myid, nvoxels_outside_phantom,
(100.0*( (float) nvoxels_outside_phantom)/( (float) NVOXELS))); printf ("%d: N voxels MLP-touched inside phantom: "
"%ld (%4.1f %% of inside voxels - want 100%% )\n",
Myid, nvoxels__inside_phantom__mlp_touched,
(100.0*( (float) nvoxels_inside_phantom_jrilp_touched)
/ ( ( float) nvoxels_inside_phantom) ) ) ;
printf ("%d: N voxels MLP-touched outside phantom: "
"%ld (should be zero)\n",
Myid, nvoxels_outside_phantom_mlp_touched) ;
printf ( " %d : N voxels *not* MLP-touched inside phantom: "
"%ld (%4.1f %% of inside voxels - want zero)\n",
Myid, nvoxels_inside_phantom_not_touched,
(100.0*( (float) nvoxels_inside_phantom_not_touched)
/( (float) nvoxels_inside_phantom) ) ) ;
printf ("%d: N voxels *not* MLP-touched outside phantom: " "%ld (%4.1f %% of outside voxels - should 100%%)\n",
Myid, nvoxels_outside_phantom_not_touched,
(100.0*( (float) nvoxels_outside_phantom_not_touched)
/ ( ( float) nvoxels__outside_phantont) ) ) ;
} /* endif */
#endif
printf ("%d: — Block statsVn" , Myid);
printf("%d: N copies: %ld\n" , Myid, mlp->get_n_copies ( ) ) ;
printf ("%d: N bytes copied: %ld\n" , Myid, mlp->get_n_copied_bytes ( ) ) ; printf("%d: N allocd blocks: %d\n" , Myid, n_allocd_blocks ) ;
for (i = 0; i < n_allocd_blocks ; i ++)
{
n_bytes_block = mlp->get_block_nbytes ( i) ;
printf("%d: - blk %2d N bytes: %d (%7.1f MB %4.1f GB)\n", Myid, i, n_bytes__block,
({ float ) n_bytes_block ) /MB ,
( ( float) n_bytes_block) /GB) ;
} /* endif */
delete mlp;
printf ( "%d Basic memory stats\n", Myid);
printf ( "%d N calls to malloc(): %ld\n" , Myid, Njnallocs) ;
printf ( "%d N calls to realloc(): %ld\n" , Myid, N_reallocs);
#ifdef DEBUG
printf ("%d: N calls to free(): %ld\n" , Myid, N_frees);
#endif
printf ("%d: N allocated bytes: %ld (%7.1f MB 4.1f GB)\n
Myid, N_allocated_bytes ,
((float) N_allocated_bytes)/MB, ((float) N_allocated_bytes ) /GB) ;
printf("%d: Largest single mem req bytes: %ld (%7.1f MB %4.1f GB)\n
Myid, Largest_single_mem_request ,
( (float) Largest_single_mem_request) /MB,
( (float) Largest_single_mem_request) /GB) ;
printf ("%d: : file: %s line: %d\n",
Myid, Largest_single_mem_request_fname,
Largest_single_mem_request_line_no) ;
#ifdef DEBUG
printf("%d: N freed bytes: ¾ld (%7.1f MB %4.1f GB ) \n
Myid , N_freed_bytes ,
( ( float ) N_freed_bytes ) /MB , ( ( float ) N_freed_bytes ) /GB ) ; printf("%d: N not freed bytes: %ld (%7.1f MB %4.1f GB ) \n
Myid, N_allocated_bytes-N_freed_bytes ,
( ( floa ) ( N_allocated_bytes-N_freed bytes ) ) /MB ,
( ( floa ) ( N_allocated_bytes-N_freed_bytes ) ) /GB ) ;
printf("%d: High water mark bytes: %ld (%7.1f MB %4.1f GB ) \n
Myid, N_allocated_bytes_high_water_mark,
( (float) N_allocated_bytes_high_water_mark) /MB,
( ( float) N_allocated_bytes_high_water_mark) /GB) ;
#endif
printf ( " \n" ) ;
}
#ifdef DEBUG
#ifndef NO_MPI
/* MPI Profiles */
/* if (0) */
{
printf ("%d: — MPI FunctionsXn" , Myid);
for (i = 0; i < Nfuncs; i ++)
{
if ( func_profiles [ i ] . does_communicate )
{
printf ("%d: %15s: Ncalls %3d sees %f bytes %ld (%6.1f GB/sec)\n", Myid,
func_profiles [ i ] . func_name ,
func_profiles [i] . call_count,
func_profiles [ i ] . total_time_secs ,
func_profiles [ i ] . total_bytes ,
( func_profiles [ i ] . total_time_secs > 0.0
? ( func_profiles [ i] -total_bytes/GB)
/ func profiles [ i ] . total_tirne__secs
: 0.0)
) ;
}
else
{
printf("%d: %15s: Ncalls %3d sees %f\n" ,
Myid,
func_profiles [ i] . func_name,
func_profiles [ i ] . call_count ,
func_profiles [ i ] . total_time_secs ) ;
} /* endif */
} /* endfor */
printf ("\n");
}
#endif
#endif
/************ /
/* epilogue */
/************/
#ifndef NO_MPI
MPI_Finalize( ) ;
#endif
exit ( 0 ) ;
} /* end main( ) */
/*
"Note on a Method for Calculating Corrected Sums of Squares and Products," B.P. Welford, Technometrics , Vol 4, No. 3, Aug 1962, 419-420.
The Art of Computer Programming, D.E. Knuth, vol 2, 2nd Edition, p 216 The Art of Computer Programming, D.E. Knuth, vol 2, 3rd Edition, p 232
Reccurrence forumla that keeps a running mean (M_k) and running SSE (S_k) for some stream of values X i
For x_l
M_l = x_l
S_l = 0
For x_i i>l
M_i = M_(i-1) + [(x_i - M_(i-1) )/i]
S_i = S_(i-1 ) + [ (x_i - M (i-1) ) * (x i - M_i) ]
At any time k>l
running mean = M_k
running _sample_ (i.e., not _population_) variance = S_k/(k-l)
*/
static void accumulate_stats_val ( float *m,
float *s,
#ifdef DEBUG
float *min,
float *max,
#endif
unsigned int *n,
float val )
{
*n = *n + 1;
if (*n == 1)
{
*m = val;
*s = 0.0;
#ifdef DEBUG
*min = *max = val ;
#endif
}
else
{
float old_mean = *m;
float d = val - old_mean;
*m += (d/(*n));
*s += (d* (val - *m) ) ;
#ifdef DEBUG
if ( *min > val) *min = val;
if (*max < val) *max = val;
#endif
} /* endif */
/* printf ( "after pushing %f RunningMean %f RunningSSE %f\n" , val, *m, *s ) ; */ return;
} /* end accumulate_stats_val ( ) */
static void compute_and_bcast_vbm( int my_nhistories ,
unsigned char *history_bitmask,
float *input_floats ,
unsigned short *input_ushorts ,
int volume_bitmask_len,
char *volume_bitmask)
{
#ifndef N0_MPI
MPI_0p op;
char *dummy_recv_buff = (char *) NULL; // needed for bug in MVAPICH2 #endif
/ / makevbm2. cc
compute_initial__ bm(my_nhistories ,
his ory_bitmask,
input_floats ,
input_ushorts ,
volume_bitmask_len ,
volume_bitmask) ;
#ifndef N0_MPI ,
MPI_Op_create ( (MPI_User_function* ) andem, // my user-defined func
1 , // commute flag
Sop); // op handle
MPI_Reduce( (M id = ForemanRank
? MPI_IN_PLACE // result clobbers operand for root : volume_bitmask) , // operand for non-root
(Myid == ForemanRank
? volume_bitmask // result, relevant at root only : dummy__recv__buff ) ,
volume_bitmask_len , // count
MPI_CHAR, // data type
op, // operator
ForemanRank , // root
MPI_C0MM_W0RLD ) ;
MPI_Op_free ( &op ) ;
#endif
// formeman only morph_clean,
if (Myid == ForemanRank)
{
// morph_clean2.cc
morph_clean(volume_bitmask_len, volume_bitmask) ;
} /* endif */
#ifndef NO_MPI
// foreman beast morph_cleaned volume bitmask
MPI_Bcast ( volume_bitmask, // buff
volume_bitmask_len, // count
MPI_CHAR, // data type
ForemanRank, // root
MPI_COMM_WORLD) ;
#endif
#if 0
// intersect2.ee
// NICK: push all this into mlp code so that you do NOT
// have to malloc and store everything in entry_exit
compute entry_exit__from_vbm(my_nhistories ,
input_floats ,
input_ushorts ,
entry_exit ) ;
#endif
return;
} /* end compute_and_bcast_vbm( ) */
static void compute_nhistories ( int *my_nhistories ,
int *histories_per_worker , // relevant foreman only int n_input_histories ,
int max_histories_per_foreman_or_worker,
int *must_fill_foreman)
{
/* max _histories_per__foreman_or_worker is always defined */
/* ... in that context there are two cases : */
/* (1) we don't know how many input histories there are: */
/* read max_histories_per_foreman_or_worker into */
/* each process, must fill all workers, foreman is */
/* last to get his and so may be less than full */
/* (2) we know how many input histories there are: */
/* spread histories evenly across all Nprocs, make */
/* sure under max histories per foreman or worker, */
/* all workers get the same number of histories and */
/* foreman may get more due to rounding */
if (n_input_histories == U SPECIFIED_NINPUT_ HISTORIES )
{
*my_nhistories =
*histories_per_worker = max_histories_per_foreman_or_worker;
*must_fill_foreman = 0 ;
}
else
{
/* n_input_histories specified on command line */
*histories_per_worker = n_input_histories/Nprocs ;
if (Myid == ForemanRank)
{
/* Foreman must absorb rounding slack */
*my__nhistories = n_input_histories -
( (Nprocs-1 ) * ( *histories_per_worker ) ) ;
}
else
{
*my__nhistories = *histories_per_ orker ;
} /* endif */
*must_fill_foreman = 1;
}/* endif */
if ( *my_nhistories > max_histories_per_foreman_or_worker )
{
fprintf ( stderr ,
"ERROR: %S %d: proc %d: "
"my_nhistories %d > max_histories_per_foreman_or_ orker %d\n" , FILE , LINE , Myid,
*my_nhistories , max_histories_per_foreman_or_worker ) ;
PCTVABORT ( ABORT_TOO_MANY_HISTORIES )
} /* endif */
return;
} /* end compute_nhistories ( ) */
static float compute_stat_val ( float *input_floats ,
unsigned short *input_ushorts ,
int h,
int stat_displ) float val;
switch (stat_displ)
{
case STAT_DELTA_X :
{
int xl = get_xl ( input_ushorts , h);
int x2 = get_x2 ( input_ushorts , h ) ;
int x3 = get_x3 ( inputjushorts , h ) ;
int x4 = get_x4 ( input_ushorts , h ) ;
val = ((float) (x4 - x3 ) - (x2 - xl))
}
break;
case STAT_DELTA_Y :
{
int yl = get_y 1 ( input j shorts , h ) ;
int y2 = get_y2 ( input_ushorts , ) ;
int y3 = get y3 ( inputjushorts , h ) ;
int y4 = get_y4 ( input_ushorts , h);
val = ((float) (y4 - y3 ) - (y2 - yl ) )
}
break;
case STAT_DELTA_ENERGY :
{
float ein = get_ein(input_floats, h
float eout = get_eou ( input_floats , h
val = ein-eout;
break;
default:
val = 0.0; // supress annoying compiler warning
fprintf ( stderr,
"ERROR: %s %d: proc %d: "
"remove_based_on_binned_sta ( ) : "
"passed unrecognizable stat_displ %d\n",
FILE , LINE , Myid,
stat_displ ) ;
PCT_ABORT (ABORT_STATS_ERROR)
break;
} /* end switch() */
return val;
} /* end compute_stat_val ( ) */
static int count_histories_in_bitmask( int my_nhistories ,
unsigned char *history_bitmask)
{
int n_full_bytes = my_nhistories / 8;
int n_remaining_bits = my_nhistories % 8;
int i ;
int rc = 0 ;
/* first counting the bits in full bytes */
for (i = 0; i < n_full_bytes ; i ++)
{
switch (history_bitmask[i] )
{
/* nbits = 0 */
case 0x0: rc += 0; break;
/* nbits = 1 */
case 0x10: case 0x1: case 0x20: case 0x2: case 0x40: case 0x4: case 0x80: case 0x8:
rc += 1; break;
/* nbits = 2 */
case 0x11: case 0x12 : case 0x14: case 0x18: : case 0x21 : case 0x22 case 0x24: case 0x28: case 0x30: case 0x3 : case 0x41 : case 0x42 case 0x44: case 0x48: case 0x50: case 0x5 : case 0x60: case 0x6: case 0x81 : case 0x82: case 0x84: case 0x88: : case 0x90 : case 0x9: case OxaO : case Oxa: case OxcO : case Oxc :
rc += 2; break;
/* nbits = 3 */
case 0x13 : case 0x15: case 0x16: case 0x19: case Oxla: case Oxlc: case 0x23 : case 0x25: case 0x26: case 0x29: case 0x2a: case 0x2c : case 0x31 : case 0x32: case 0x34: case 0x38 : case 0x43 : case 0x45: case 0x46 : case 0x49: case 0x4a: case 0x4c: case 0x51 : case 0x52 : case 0x54 : case 0x58: case 0x61 : case 0x62 : case 0x64 : case 0x68 : case 0x70 : case 0x7: case 0x83: case 0x85 : case 0x86 : case 0x89 : case 0x8a: case 0x8c: case 0x91: case 0x92: case 0x94 : case 0x98: case Oxal : case 0xa2 : case 0xa4 : case 0xa8 : case OxbO : case Oxb: case Oxcl : case 0xc2 : case 0xc : case 0xc8 : case OxdO : case Oxd: case OxeO : case Oxe :
rc += 3; break;
/* nbits = 4 */
case 0x17: case Oxlb: case Oxld: case Oxle: case 0x27 : case 0x2b case 0x2d: case 0x2e: case 0x33: case 0x35: case 0x36 : case 0x39 case 0x3a: case 0x3c : case 0x47: case 0x4b: case 0x4d: case 0x4e case 0x53 : case 0x55: case 0x56: case 0x59: case 0x5a: case 0x5c case 0x63: case 0x65: case 0x66: case 0x69: case 0x6a: case 0x6c
case 0x71: case 0x72: case 0x74: case 0x78: case 0x87 : case 0x8b: case 0x8d: case 0x8e: case 0x93: case 0x95: case 0x96 : case 0x99: case 0x9a: case 0x9c: case 0xa3 : case 0xa5 : case 0xa6 : case 0xa9 : case Oxa : case Oxac : case Oxbl: case 0xb2: case 0xb4 : case 0xb8: case 0xc3 : case 0xc5 : case 0xc6 : case 0xc9 : case Oxc : case Oxcc : case Oxdl : case 0xd2: case 0xd4 : case 0xd8: case Oxel : case 0xe2 : case 0xe4 : case 0xe8 : case OxfO: case Oxf :
rc += 4; break;
/* nbits = 5 */
case Oxlf : case 0x2f : case 0x37: case 0x3b: case 0x3d: case 0x3e case 0x4f : case 0x57: case 0x5b: case 0x5d: case 0x5e: case 0x67 case 0x6b: case 0x6d: case 0x6e: case 0x73: case 0x75: case 0x76 case 0x79 : case 0x7a: case 0x7c: case 0x8f : case 0x97 : case 0x9b case 0x9d: case 0x9e: case 0xa7 : case Oxab: case Oxad: case Oxae case 0xb3 : case 0xb5: case 0xb6: case 0xb9 : case Oxba: case Oxbc case 0xc7 : case Oxcb : case Oxcd: case Oxce: case 0xd3: case 0xd5 case 0xd6: case 0xd9: case Oxda: case O dc : case 0xe3 : case 0xe5 case 0xe6 : case 0xe9 : case Oxea: case O ec : case Oxf1 : case 0xf2 case Oxf4 : case Oxf8:
rc += 5; break;
/* nbits = 6 */
case 0x3f : case 0x5f : case Ox6f : case 0x77: case 0x7b: case 0x7d: case 0x7e: case 0x9f : case Oxaf : case 0xb7: case Oxbb: case Oxbd: case Oxbe: case Oxcf : case 0xd7: case Oxdb: case Oxdd: case Oxde: case 0xe7 : case Oxeb: case Oxed: case Oxee: case Oxf3 : case Oxf5: case Oxf6: case Oxf9: case Oxfa: case Oxfc:
rc += 6; break;
/* nbits = 7 */
case 0x7f: case Oxbf: case Oxdf: case Oxef: case 0xf7: case Oxfb: case Oxfd: case Oxfe:
rc += 7; break;
/* nbits = 8 */
case Oxff: rc += 8; break;
default: break;
} /* end switch () */
} /* endfor */
/* now counting the bits in last non-full byte */
for ( i = 0 ; i < n_remaining_bits ; i ++ )
if (TEST(history_bitmask+n_full_bytes , i))
rc ++;
return rc;
} /* end count_histories_in_bitmask( ) */
static void fill_buffers_from_files ( float *input_floats ,
unsigned short *input__ushorts ,
int *deg,
FILE **fp,
int *histories_read_from_this_file , int max_histories_per_input_file ,
int nhistories_buffsize,
int *my_nhistories ,
int must_fill) // must_fill == 0 iff foreman
{
float *float_p;
unsigned short *ushort_p;
int done;
float_p = input_floats ;
ushort_p = input_ushorts ;
done = 0 ;
for ( *my_nhistories = 0;
! done && *my_nhistories < nhistories_buffsize ;
*my_nhistories = *my_nhistories + 1 )
{
if (max_hist.ories_per_input._file
! = UNSPECIFIED_ AX_NINPUT_HISTORIES_PER_FILE
&& *histories_read_from_this_file >= max_histories_per_input_file) {
/* reached limit of histories that can be read from this file */ /* ... time to start reading from the next input file */ fclose( *fp) ;
*deg = *deg + 2;
*fp = open_input_file ( *deg) ;
*histories_read_from_this_file = 0;
} /* endif */
if (fscanf(*fp, "%hu %hu %hu %hu %hu %hu hu %hu %f f %f\n",
ushort_p+Xl_DISPL , ushort_p+Yl_DISPL ,
ushort_p+X2_DISPL , ushort_p+Y2_DISPL ,
ushort_p+X3_DISPL, ushort_p+Y3_DISPL,
ushort_p+X4_DISPL, ushort_p+Y4_DISPL,
float_p+EIN_DISPL, float_p+EOUT_DISPL , float_p+THETA_DISPL ) == EOF)
{
/* ran out of data from open file, */
/* closing and moving to next file */
fclose(*fp) ;
*deg = *deg + 2;
*fp = open_input_file ( *deg) ;
*histories_read_from_this_file = 0;
if (fscanf (*fp, "%hu %hu %hu %hu %hu %hu %hu %hu %f %f %f\n", ushort_p+Xl_DISPL, ushort_p+Yl_DISPL ,
ushort_p+X2_DISPL , ushort_p+Y2_DISPL ,
ushort_p+X3_DISPL, ushort_p+Y3_DISPL,
ushort_p+X4_DISPL, ushort_p+Y4_DISPL,
float_p+EIN_DISPL, float_p+EOUT_DISPL, float_p+THETA_DISPL )
== EOF)
{
if (must_fill )
{
/* ran out of data for non-foreman workers */
fprintf ( stderr ,
"ERROR: %s %d: proc %d: "
"ran out of data data deg %d\n",
FILE , LINE , Myid,
*deg) ;
PCT_AB0RT (ABORT_TOO_MANY_HISTORIES )
}
else
{
done = 1;
} /* endif */
} /* endif */
} /* endif */
float_p += INPUT_FLOAT_STRIDE
ushort_p += INPUT_USHORT_STRIDE;
*histories_read_from_this_file = *histories_read_from_this_file + 1; } /* endfor */
if (done)
{
*my_nhistories = *my_nhistories - 1;
} /* endif */
return;
} /* end fill_buffers_from_files ( ) */
static void find_entry_exit_from_vbm_or_remove ( int nhistories,
unsigned char *history_bitmask, float *input_floats ,
unsigned short *input_ushorts , struct entry_exit_t *entry_exit, int volume_bitmask_len, char *volume__bitmask)
{
int h;
#ifdef DEBUG
int n_cleared_histories = 0;
#endif
for (h = 0; h < nhistories ; h++)
{
if (TEST(history_bitmask, h) )
{
// intersect2. cc
if (! compute_entry_exit_from_vbm( h,
input_floats ,
input_ushorts ,
#ifdef DEBUG
volume_bitmask_len,
#endif
volume_bitmask,
entry_exit ) )
{
/* could NOT find entry/exit on phantom ... */
/* declare a rogue history — need to toss it */ CLEAR ( history_bitmask, h ) ;
/* NICK: need to write code to re-claim space used */
/* NICK: in this block to record voxel ids thus */
/* NICK: far for this history. */
#ifdef DEBUG
n_cleared_histories ++;
#if 0
fprintf ( stderr,
"ERROR: %s %d: proc %d: "
"computed vbm_index %d (max %d) "
"from history %d "
"xl %d yl %d x2 %d y2 %d x3 %d y3 ¾d x4 %d y4 %d "
"entry_x %f entry__y %f entry_z %f "
"exit_x %f exit_y %f exit_z %f "
"\n",
FILE , LINE , Myid,
voxel_idx, volume_bitmask_len,
h,
get xl ( input ushorts , h), get_yl ( input_ushorts , h),
get_x2 ( input_ushorts , h ) , get_y2 ( input_ushorts , h ) ,
get_x3 ( input__ushorts , h), get_ 3 ( input_ushorts ,. h),
get_ 4 ( input_ushorts , h), get_y4 ( input_ushorts , h),
entry_exit.entry_x, entry_exit.entry_y, entry_exi . entry_z , entry_exit . exit_x , entry_exit . exit_y , entry_exi . exit_z ) ;
#endif
/ * MPI : : COMM_WORLD . Abort ( BORT_INVALID_ID ) ; * /
#endif
} /* endif */
} /* endif */
} /* endfor */
#ifdef DEBUG
printf("%d: could not connect to phantom n_cleared_histories %d\n", Myid, n_cleared_histories ) ;
#endif
return;
} /* end find_entry_exit_from_vbm_or_remove( ) */
static int get_stat_cell_idx( float *input_floats ,
unsigned short *input_ushorts ,
int h,
int stat_displ)
{
/* binning based on INTERNAL ENTRY plane */
int x = get_x2 ( input_ushorts , h) ;
int y = get_ /2 ( input_ushorts , h);
int converted_x = (x * REGI0N_C0LS ) /MAX_TRACK_X_IDX ;
int converted_y = (y * REGI0N_R0WS ) /MAX_TRACK_Y_ID ,- float theta = get_theta( input_floats , h);
int theta_idx = (int) (theta * THETA_IDX_PER_DEGREE + 0.5);
#ifdef DEBUG
{
int rc = ( (STAT_CELL_IDX_ANGLE_STRIDE * theta_idx)
+ ( STAT_CELL_IDX_ROW_STRIDE * converted_ )
+ ( STAT_CELL_IDX_COL_STRIDE * converted_ )
+ stat_displ ) ;
if (rc < 0 [ I rc >= N_STAT_CELLS )
{
f rintf ( stderr ,
"ERROR: %s %d: proc d: "
"computed invalid stats_idx %d "
"from history %d x %d y %d theta %f stat_displ %d "(must be 0 <= stats_idx < N_STAT_CELLS ¾d)\n",
FILE , LINE , Myid,
rc, h, x, y, theta, stat_displ, N_STAT_CELLS ) ;
PCT_ABORT ( ABORT_STATS_ERROR )
} /* endif */
return rc ;
}
#else
return ( ( STAT_CELL_IDX_ANGLE_STRIDE * theta_idx)
+ ( STAT_CELL_IDX_RO _STRIDE * converted_y )
+ ( STAT_CELL_IDX_COL_STRIDE * converted^)
+ stat_displ);
#endif
} /* end get_stat_cell_idx( ) */
static FILE *open_input_file ( int degree)
{
FILE *fp;
char inputfilename[256] ;
sprintf (inputfilename, "%s/%s%03d%s" ,
InputDir, InputFnamePrefix, degree, InputFnameSuffix) ;
if ( ! ( f = fopen( inputfilename, "r")))
{
fprintf ( stderr, "ERROR: could not open input file >%s<\n",
inputfilename ) ;
PCT_ABORT (ABORT_FAILED_FOPEN )
} /* endif */
return f ;
} /* end open_input_file ( ) */
/* extern "C" int getopt(int argc, char * const argv[], const char *optstring) throw( ) ; */
extern "C" int getopt ( int argc, char * const argv[], const char *optstring);
static void parse_command_line_args ( int argc ,
char **argv,
int *n_input_histories ,
int *max_histories_per_foreman_or_worker , int *max_histories_per_input_file ,
float *linear_solver_relaxation) int c ;
enum input_sets {threeM, thirtyeightM, twoB} input;
int error;
/* int getopt(int argc, char * const argv[], const char *optstring); */
extern char *optarg;
/* extern int optind, opterr, optopt; */
error = 0 ;
/* default values */
*n_input_histories = UNSPECIFIED_NINPUT_HISTORIES ;
*max_histories_per_input_file = UNSPECIFIED_MAX_NINPUT_HISTORIES_PER_FILE ;
*max_histories_per_foreman_or_worker = DEFAULT_MAXHISTORIES_PER_WORKER;
input = thirtyeightM;
*linear_solver__relaxation = DEFAULT_RELAXATION;
while ( (c = getopt (argc, argv, " sblh : f :rr. : r
{
switch (c
{
case input = threeM;
case input = thirtyeightM;
case input = twoB;
case *n_input_histories = atoi(optarg) ;
case *max_histories_per_input_file=atoi ( optarg ) ;
case *max_histories_per_foreman__or_worker=atoi ( optarg ) ;
case * linear_solver_relaxation=atof ( optarg) ;
default error = 1;
} /* end switch ( )
} / endwhile */
( error )
if (M id == ForemanRank)
{
/* only Master prints usage message */
fprintf ( stderr,
"\n\tusage: %s {-s | -b | -1} "
"{-h <N input histories>} "
"{-f <max N input histories per file>} "
"{-m <max N histories per worker or foreman>} "
"{-r <linear solver relaxation param>} "
"\n\n",
argv[0]);
fprintf (stderr, "where\n\n" ) ;
f rintf ( stderr, "\t-s - small input, 2D, ~3M histories\n" ) ;
f rintf ( stderr, "\t-b - big input, 3D, "38M histories ( defaul ) \n" ) ;
fprintf ( stderr, "\t-l - large input, 3D, "1.9B historiesXn" ) ; fprintf ( stderr, "\n");
#ifdef NO_MPI
exit ( 1 ) ;
#else
MPI_Abort ( MPI_COMM_WORLD , ABORT_INVALID_CMD_LINE_ARGS ) ;
#endif
} /* endif */
} /* endif */
switch (input)
{
case threeM:
sprintf ( InputDir , "¾s/Input-2D-3.2M/ " , InputDirBase ) ;
strcpy ( InputFnamePrefix, "projections. " ) ;
strcpy ( InputFnameSuffix, " -dat" ) ;
break;
case thirtyeightM:
sprintf (InputDir, " %s/Input-Full3D-38.6M/ " , InputDirBase);
strcpy( InputFnamePrefix, "angle_" ) ;
strcp ( InputFnameSuffix, " .txt" ) ;
break;
case twoB:
sprintf ( InputDir , "%s/Input-Full3D-GeneratedAtNIU-l .9B/ConvertedFromPen fold/" ,
InputDirBase ) ;
strcpy ( InputFnamePrefix, "angle_" ) ;
strcpy ( InputFnameSuffix, " . txt" ) ;
break;
defaul :
f rintf ( stderr ,
"ERROR: parse_command_line_args ( ) : "
"encountered invalid input = %d\n",
input ) ;
#ifdef N0__MPI
exit ( 1 ) ;
#else
MPI_Abort (MPI_COMM_WORLD , AB0RT_INVA1ID_CMD_LINE_ARGS ) ;
#endif
break;
} /* end switch */
return;
} /* end parse_command_line_args ( ) */
static void read_and_distribute_input ( int histories_per_worker,
int max_histories_per_input_file ,
int *my_nhistories ,
int must_fill_foreman,
float *input_floats ,
unsigned short *input_ushorts )
{
int deg;
int histories_read_from_this_file ;
#ifndef NO_MPI
int i;
MPI_Status status;
#endif
FILE *fp;
/* NICK: fix this - set to received for worker or last read for foreman*/
/* max_histories_per_foreman_or_worker is always defined */
/* in that context there are two cases: */
/* (1) we don't know how many input histories there are: */
/* read max_histories_per_foreman_or_worker into */
/* each process, must fill all workers, foreman is */
/* last to get his and so may be less than full */
/* (2) we know how many input histories there are: */
/* spread histories evenly across all Nprocs, make */
/* sure under max_histories_per_foreman_or_worker , */
/* all workers get the same number of histories and */
/* foreman may get more due to rounding */
/*****************************/
/* Read and distribute input */
/*****************************/
if (Myid == ForemanRank)
{
/* I am the foreman */
TotalHistories = 0;
/* opening first file */
deg = 0 ;
fp = open_input_file (deg) ;
histories_read_from_this_file = 0 ; ;
#ifndef NO_MPI
/* must read and distribute input to other workers */
for (i = 1; i < Nprocs; i ++)
{
fill_buffers_from_files ( input_floats ,
input_ushorts ,
°,
Sfp,
&histories_read_from_this_file, max_histories_per_input_file ,
histories per_worker ,
my_nhistories ,
1 ) ; // must_fill
INPUT_FLOAT_TAG, /* tag */ MPI_COMM_WORLD) ; /* comm */ MPI_Send( input_ushorts , /* buff */ histories_per_worker*INPUT_USHORT_STRIDE , /* count */ MPI_UNSIGNED_SHORT , /* type */
/* dest */
INPUT_USHORT_TAG , /* tag */ MPI_COMM_WORLD) ; /* coram */
TotalHistories += histories_per_worker;
} /* endfor */
#endif
/* now fill foreman's buffer, which may not be full */
fill_buffers_from_files ( input_floats ,
input_ushorts ,
°,
&fp,
&histories_read_from_this_file ,
max_histories_per_input_file ,
*my_nhistories ,
my_nhistories , // may reset foreman's val must_fill_foreman ) ; // must_fill
TotalHistories += ( *my_nhistories ) ;
}
#ifndef NO_MPI
else
{
/* I am NOT the foreman */
/* must receive my input from the foreman */
MPI_Recv( input_floats ,
histories_per_worker*INPUT_FLOAT_STRIDE, /* count */
MPI_FLOAT, /* type */
ForemanRank, /* source */
INPUT_FLOAT_TAG, /* tag */
MPI_COMM_WORLD , /* comm */
Sstatus ) ; /* status */ MPI_Recv ( input_ushorts ,
histories_per_worker*INPUT_USHORT_STRIDE, /* count */
MPI_UNSIGNED_SHORT , /* type */
ForemanRank, /* source */
INPUT_USHORT_TAG , /* tag */
MPI_COMM_WORLD , /* comm */
Sstatus ) ; /* status */
} /* endif */
#endif
return;
} /* end read_and_distribute_input ( ) */
static void remove__based__on_binned_stat(int nhistorxes,
unsigned char *history_bitmask,
float *input_floats ,
unsigned short *input_ushorts ,
struct stats s[],
float sigma_cutoff [ ] ,
int stat_displ)
{
int h;
int stat_idx;
float val;
#ifdef DEBUG
int n_cleared_histories = 0;
#endif
for (h = 0; h < nhistories ; h ++)
{
if ( TEST ( history_bitmask , h ) )
{
/* this history has not been previously thrown out */ stat_idx = get_stat_cell_idx( input_floats ,
input_ushorts ,
h,
stat_displ ) ;
#ifdef DEBUG
/ * *.* ************* /
/* sanity check */
/**************** /
/****************************************************/
/* confirming that history has not yet been cleared */
/* has been accumulated as part of the global stats */
/****************************************************/
if (s [ stat_idx] .n <= 0)
{
fprintf ( stderr,
"ERROR: %s %d: proc %d: "
"history %d: stat_idx %d from stat_displ %d "found n = %d (must be > 0)\n",
FILE , LINE . , Myid,
h, stat_idx, stat_displ, s[stat_idx] .n) ;
PCT_ABOR (ABORT_STATS_ERROR)
} /* endif */
#endif
val = compute_stat_val ( input_floats ,
input_ushorts ,
h,
stat_displ) ;
#ifdef DEBUG
/**************** /
/* sanity check */
**************** j
/*****************************************************************
/* confirming global_min_x <= x <= global_max_x */ /* where "global" means across all procs for stats cell stat_idx */ /*****************************************************************/ if (val < s [ stat_idx ] .min )
{
f rintf ( stderr ,
"ERROR: %s %d: proc %d: "
"history %d: stat_idx %d from stat_displ %d "
"computed val %f < global min %f\n" ,
FILE , LINE , Myid,
h, stat_idx, stat_displ, val, s [ stat__idx ] . min ) ;
PCT_ABOR (ABORT_STATS_ERROR)
} /* endif */
if (val > s[stat_idx] .max)
{
f rintf ( stder ,
"ERROR: %s %d: proc %d: "
"history %d: stat_idx %d from stat_displ %d "
"computed val %f > global max %f\n" ,
FILE , LINE , Myid,
h, stat_idx, stat_displ, val, s[stat_idx] .max) ;
PC _ABORT ( BORT_STATS_ERRO )
} /* endif */
#endif
if (fabs(val - s [ stat_idx] .mean) > sigma_cutoff [ stat_idx] )
{
CLEAR (history_bitmask, h);
#ifdef DEBUG
n_cleared_histories ++;
#endif
} /* endfor */
} /* endif */
} /* endfor */
#ifdef DEBUG
printf("%d: stat_displ %d n_cleared_histories %d\n",
Myid, stat_displ, n_cleared_histories ) ;
#endif
return ;
} /* end remove_based_on_binned_stat ( ) */
static void remove_histories_from_binned_stats ( int nhistories,
unsigned char *history_bitmask, float *input_floats ,
unsigned short *input_ushorts ) int h;
int stat_idx;
float val;
/* could make s[] and sigma_cutoff [ ] local vars instead */ /* of malloc/free them, but their combined size alone is "11MB */ /* which on some systems blows the default stack size (e.g., 8MB) */ /* ... easier to malloc/free them rather than wrestle with stack sizes */ /* struct stats s [N_STAT_CELLS ] ; */
/* float sigma_cutoff [N_STAT_CELLS ] ; */
struct stats *s;
float *sigma_cutoff ;
PCT_MALLOC ( s , N_STAT_CELLS*sizeof ( struct stats), struct stats);
PCT_MALLOC(sigma_cutoff , N_STAT_CELLS*sizeof ( float) , float) ;
/* printf ( "%d: ENTER remove histories_from_binned_stats ( ) \n" , Myid) ; */
/* initializing stats */
for (h = 0; h < N_STAT_CELLS ; h ++)
s [ h ] . n = 0 ;
/* PHASE 1 of 3 - compute local running mean and SSE */ for ( — 0; h < nhistories; h ++)
{
if (TEST(history_bitmask, h) )
{
/* accumulating stats for delta x */ stat_idx = get_stat_cell_idx ( input_floats ,
input_ushorts ,
h,
STAT_DELTA_X ) ;
val = compute_stat_val ( input_fLoats ,
input_ushorts , h,
STAT_DELTA_X ) ;
accumulate_stats val ( & ( s [ stat_idx ] . mean ) ,
& ( s [ stat_idx ] . sse ) ,
#ifdef DEBUG
& ( s [ stat_idx ] . min ) ,
S ( s [ stat_id ] . max ) ,
#endif
& ( s [ stat_idx ] . n ) , val) ;
/**********************************/
/* accumulating stats for delta y */ /**********************************/ stat_idx = get_stat_cell_idx ( input_floats , input_ushorts , h,
STAT_DELTA_Y ) ; val = compute_stat_val ( input_floats ,
input_ushorts , h,
STAT_DELTA_Y ) ;
accumulate_stats val ( & ( s [ stat_idx ] .mean) ,
s ( s [ stat_id ] . sse ) ,
#ifdef DEBUG
S ( s [ stat_id ] .min ) ,
S ( s [ stat_idx ] .max ) ,
#endif
S ( s [ stat_idx ] . n ) r val ) ;
/***************************************/ /* accumulating stats for delta energy */ stat_idx = get_stat_cell_idx ( input_floats , input_ushorts ,
STAT_DELTA__ENERGY ) ; val = compute_stat_val ( input_floats ,
input_us orts , h,
STAT_DELTA_ENERGY ) ; accuir.ulate_stats_val ( & ( s [ stat_idx ] .mean) ,
& ( s [ stat_id ] . sse ) ,
#ifdef DEBUG
& ( s [ stat_idx ] .min ) ,
& ( s [ stat^id ] . max ) ,
#endif
& ( s [ stat_idx ] . n ) , val )
} /* endif */
} /* endfor */
#if DEBUG
/* sanity check */
/****************/
{
int i,j;
(i = 0; i < N_STAT_CELLS ; i += STAT_ STATS )
for (j = 1; j < S AT_NSTATS ; j ++)
{
if (s [i] .n != s[i+j ] -n)
{
f rintf ( stderr ,
"ERROR: %s %d: proc %d: "
"LOCAL STAT: stats_idx %d: "
"found n_0 = %d != n_%d = %d "
"(must be equal) \n",
FILE , LINE , Myid,
h, s[i].n, j, s[i+j].n);
PCT_ABORT (ABORT_STATS_ERRO )
} /* endif */
} /* endfor */
endfor */
for (i = 0; i < N_STAT_CELLS ; i += STAT_NSTATS )
{
for (j = 0; j < S AT_NSTATS ; j ++)
{
if (s[i+j].n > 2 && s[i+j].sse <= 0.0)
{
f rintf ( stderr ,
"WARNING: %s %d: proc %d: "
"LOCAL STAT: stats_idx %d: found n_%d = %d sse_%d = %f "(sse probably should be > 0)\n",
FILE , LINE , Myid,
f j, s[i+j].n, j, s[i+j ] .sse) ;
} /* endif */
} /* endfor */
} /* endfor */
}
#endif
/* PHASE 2 of 3 - compute and distribute global mean and SSE */
#ifndef NO_MPI
{
MPI_Datatype Stats_mpi;
#ifdef DEBUG
int block_lengths [ 5 ] ; /* number of "things" in each element of */
/ * our new struct, for us, 1 each */
MPI_Aint displs[5] ; /* displacement of each element */
M I_Da atype types [ 5 ] ; /* type of each element */
#else
int block_lengths [ 3 ] ; /* number of "things" in each element of */
/* our new struct, for us, 1 each */
MPI_Aint displs[3]; /* displacement of each element */
MPI_Datatype types [ 3 ] ; /* type of each element */
#endif
MPI_Aint start_addr;
MPI_Aint addr;
MPI_Op op;
block_lengths [ 0 ] = block_lengths [ 1 ] = block_lengths [ 2 ]
#ifdef DEBUG
= block_lengths [ 3 ] = block_lengths [ ]
#endif
= 1;
types [ 0 ] MPI_FLOAT;
types [ 1 ] MPI_FLOAT;
types [ 2 ] MPI_UNSIGNED;
#ifdef DEBUG
types [ 3 ] MPI_FLOAT;
types [ 4 ] MPI_FLOAT ;
#endif
MPI_Address ( & ( s [ 0 ] .mean) , &start_addr ) ;
displs[0] = 0;
MPI_Address ( ( s [0 ] .sse) , Saddr) ;
displs[l] = addr - start_addr;
MPI_Address(&(s[0] .n) , Saddr);
displs[2] = addr - start_addr;
#ifdef DEBUG
MPI_Address(&(s[0] .min) , &addr);
displs[3] = addr - start_addr;
MPI_Address ( & ( s [ 0 ] .max) , saddr);
displs[4] = addr - start_addr;
#endif
/* build new derived datatype */
MPI_Type_struet (
#ifdef DEBUG
5, /* nelements */
#else
3, /* nelements */
#endif
block_lengths ,
displs ,
types ,
&Stats_mpi) ;
MPI_Type_commit(&Stats_mpi) ;
MPI_Op_create ( (MPI_User_function* ) combine_stats , // my user-def func
1, // commute flag
Sop); // op handle
MPi_Allreduce(MPI_IN_PLACE, // clobber operand with result
s, // operand that gets clobbered by result
N_STAT_CELLS , // count
Stats_mpi, // data type
opr // operator
MPI_COMM_WORLD ) ;
M I_Op_free ( Sop ) ;
}
#endif
#if DEBUG
/
/* sanity check */
^***·*************J
{
int i,j
for (i = 0; i < N_STAT_CELLS ; i += STAT_NSTATS )
{
for (j = 1; j < S AT_ STATS ; j ++)
{
if (s[i].n != s[i+j].n)
{
f rintf ( stderr ,
"ERROR: %s %d: proc %d: "
"GLOBAL STAT: stats_idx %d: "
"found n_0 = %d != n_%d = %d "
"(must be equal) \n",
FILE , LINE , Myid,
h, s[i].n, j, s[i+j].n);
PCT_ABORT ( ABORT_S ATS_ERROR)
} /* endif */
} /* endfor */
} /* endfor */
for (i = 0; i < N_STAT_CELLS i += STAT_ STATS )
{
for (j = 0; j < S AT_ STA S ; j ++)
{
if (s[i+j].n > 2 && s[i+j].sse <= 0.0)
{
fprintf ( stderr ,
"WARNING: %s %d: proc %d: "
"GLOBAL STAT: "
"stats_idx %d: found n_%d = %d sse_%d = %f "
"(sse probably should be > 0)\n",
FILE , LINE , Myid,
i, j, s[i+j].n, j, s[i+j] .sse) ;
} /* endif */
} /* endfor */
} /* endfor */
}
#endif
/* computing sigma cuttoff for each stat cell */
for (h = 0; h < N_STAT_CELLS ; h ++)
{
if (s [h] .n == 0)
sigma_cutoff [h] = 0; // arbitrary value, should never be used else
//KIRK:sigma_cutoff [h]=s[h] . meant (MAX_SIGMA*sqrt(s [h] .sse/(s[h] .n) ) ) sigma_cutoff [ h] = MAX_SIGMA*sqrt(s[h] .sse/s [h] .n) ;
} /* endfor */
/* dumping stats */
{
FILE *f _x, *fp_y , *fp_e;
FILE *fp = NULL;
char fname [ 300 ] ;
if (Myid == ForemanRank)
{
sprintf (fname, "%s /stats . . %s . %s" ,
OutputDirBase , Nowstring, Relaxparamstring) ;
if ((fp_x = fopen(fname, "w" ) ) == NULL)
{
f rintf (stderr, "ERROR: could not open output stats file >%s<\n", fname ) ;
PCT_ABORT (ABORT_FAILED_FOPE )
} /* endif */
sprintf ( fname, " %s/stats .y . %s . %s " ,
OutputDirBase, Nowstring, Relaxparamstring);
if ((fp_y = fopen(fname, "w" ) ) == NULL)
{
fprintf (stderr, "ERROR: could not open output stats file >%s<\n" , fname ) ;
PCT_ABORT ( ABORT_FAILED_FOPEN )
} /* endif */
sprintf ( fname , " ¾s/stats . e . %s . %s " ,
OutputDirBase , Nowstring, Relaxparamstring) ;
if ( ( fp_e = fopen( fname, "w" ) ) == NULL)
{
fprintf ( stderr , "ERROR: could not open output stats file >%s<\n", fname ) ;
PCT_ABORT ( ABORT_FAILED_FOPE )
} /* endif */
for (h = 0; h < N_STAT_CELLS ; h ++)
{
if ( (h % 3) == 0) fp = fp_x;
if ((h % 3) == 1) fp = fp_jy
if ( (h % 3) == 2) fp = fp_e;
#ifdef DEBUG
fprintf (fp, "%d %f %f %f %f %f\n",
s[h].n, s[h].mean, s[h].sse, sigma_cutoff [h] , s[h].min, s[h].max);
#else
fprintf (fp, "%d %f %f %f\n",
s[h].n, s[h].mean, s[h] .sse, sigma_cutoff [h] ) ;
#endif
} /* endfor */
fclose ( fp_x) ;
fclose ( fp_jy ) ;
fclose ( fp_e ) ;
} /* endif */
/* PHASE 3 of 3 - throwing out rogue histories (outside MAX_SIGMA ) */ /*******************************************************************/ remove_based_on_binned_sta ( nhistories ,
history_bitmask,
input_floats ,
input__ushorts ,
s,
sigma_cutoff ,
STAT_DELTA_X )
remove_based_on_binned_sta ( nhistories ,
history_bitmask ,
input_floats ,
input_ushorts ,
s,
sigma_cutoff ,
STAT_DELTA_Y ) ;
remove_based_on_binned_stat ( nhistories ,
history_bitmask,
input_floats ,
input_ushorts ,
s,
sigma_cutoff ,
STAT_DELTA_ENERGY ) ;
PCT_FREE ( s ) ;
PCT_FREE ( sigma_cutoff )
return;
} /* end remove_histories_from_binned_stats ( ) */
// NICK: need to write test_solution_for_done ( )
static int test_solution_for_done ( float solution[])
{
int rc ;
Iterations ++;
rc = (Iterations >= Niters);
return rc ;
} /* end test_solution( ) */
#include "pct.h"
#ifndef NO_MPI
int MPI_Init(int *argc, char ***argv)
{
int idx = MPI_INIT_IDX;
int rc ;
int i ;
double start_timef end_time;
start_time = MPI_Wtime ( ) ;
strcpy ( func_profiles [MPI_INIT_IDX ] . func_name , "M I_Init ) ;
strcpy( func_profiles [MPI_COMM_SIZE_IDX] .func_name, "MPI_Comm_size" ) ; strcpy ( func_profiles [MPI_COMM_RANK_IDX] . func_name, "MPI_Comm_rank" ) ; strcpy ( func_profiles [MPI_ADDRESS_IDX ] . func_name , "MPI_Address " ) ; strcpy ( func__profiles [MPI_TYPE_STRUCT__IDX] . func_name "MPI_Type_struct " strcpy ( func_profiles [MPI_TYPE_COMMIT_IDX ] . func_name "MPI_Type_commit " strcpy ( func_profiles [MPI_OP_CREATE_IDX] . func_name, "MPI_Op_create" ) ; strcpy ( func_profiles [MPI_OP_FREE_IDX ] . func_name , "MPI_Op_free" ) ; strcp ( func_profiles [ MPI_ALLREDUCE_IDX ] . func_name , "MPI_Allreduce" ) ; strcp ( func_profiles [MPI_BCAST_IDX] . func_name, "MPI_Bcast" ) ; strcpy ( func_profiles [MPI__SEND_IDX] . func_name, "M I_Send" ) ; strcpy ( func_profiles [MPI_RECV_IDX ] . func_name , "MPI Recv" ) ; for (i = 0; i < Nfuncs; i ++)
{
func_profiles [ i ] . call_count = 0;
func_profiles [ i ] - otal_time_secs = 0.0;
func_profiles [ i] . does_communicate = 0;
func_profiles [ i ] . total__bytes = 0;
} /* endfor */
/* fixing those that DO communicate */
funcjorofiles [MPI_ALLREDUCE_IDX] .does_coinmunicate
= func_profiles [MPI_BCAST_ID ] . does_communicate
= func profiles [MPI SEND IPX] . does_communicate
= func_profiles [MPI_RECV_IDX] . does_communicate rc = PMPI_Init ( argc , argv) ;
end_time = MPI_Wtime ( ) ;
( func_profiles [ id ] . call_count ) ++;
( func_profiles [ idx] . total_time_secs ) += (end_time return rc ;
} /* end MPI_Ii.i () */
int MPI_Comm_size(MPI_Comm comm, int *size)
{
int idx = MPI_COMM_SIZE_IDX;
int rc;
double start_time, end_time;
start time = MPI_Wtime();
rc = PMPI_Coinm_size ( comm, size);
end_time = MPI_Wt±me( ) ;
(func_profiles [idx] . call_count) ++;
( func_profiles [idx] . total_time_secs ) += (end_time return rc;
} /* end PI_Comm_size ( ) */
int MPI_Conim_rank( PI_Comm contra, int *rank)
{
int idx = MPI_COMM_RANK_IDX;
int rc ;
double start_time, end_time;
start_time = MPI_Wtime();
rc = PMPI_Coirim_rank( comm, rank);
end_time = MPI_Wtime();
( func_profiles [ id ] .call_count) ++;
( func_profiles [ id ] . total_time_secs ) += ( end_time return rc;
} /* end MPI_Comm_rank( ) */
int MPl_Address ( oid * location, MPI_Aint *address) {
int idx = MPI_ADDRESS_IDX;
int redouble start_time, end_time;
start_time = MPI_Wtime ( ) ;
rc = PMPl_Address ( location, address);
end_time = MPI_Wtime ( } ;
( func_prcfiles [ idx ] . call__count ) ++;
( func_profiles [ id ] .total_time_secs) += (end_time return rc ;
} /* end MPI_Address ( ) */
int MPI_Type_struct(int count,
int *array_of_blocklengths , MPI_Aint *array_of_displacements , MPI_Datatype *array_of_types , MPI_Datatype *newtype)
{
int idx = MPI_TYPE_STRUCT_IDX ;
int rc ;
double start_time, end_time;
start_time = MPI_Wtime ( ) ;
rc = PMPI_Type_struct ( count,
array_of_blocklengt s ,
array_of_displacements ,
array_of_types ,
newtype ) ;
end_time = MPI_Wtime ( ) ;
( func_profiles [ idx] .call_count) ++;
( func_profiles [ idx] . total_time_secs ) += (end_time - start_time); return rc ;
} /* end MPI_Type_struct ( ) */
int MPI_Type_commit(MPI_Datatype *datatype)
{
int idx = MPI_TYPE_COMMIT_IDX;
int rc ;
double start_time, end_time;
start_time = MPI_Wtime();
rc = PMPI_Type_commit ( datatype ) ;
end_time = MPI_Wtime();
( func_profiles [ idx] . call_count) ++;
( func_profiles [ idx ] .total_time_secs ) += (end_time - start_time); return rc ;
} /* end MPI_Type_commit( ) */
int MPI_Op_create (MPI_User_function *function, int commute, MPI_Op * {
int idx = MPI_OP_CREATE_IDX;
int rc ;
double start_time, end_time;
start_time = MPI_Wtime ( ) ;
rc = PMPI_Op_create ( function, commute, op);
end_time = MPI_ time();
( func_profiles [ idx] .call_count) ++;
( func_profiles [ id ] . total_time_secs ) += (end_time - start_time ) ; return rc;
} /* end MPI_Op_create{ ) */
int MPI_Op_free(MPI_Op *op)
{
int idx = MPI__OP_FREE_IDX;
int rc ;
double start_time, end_time;
start_time = MPI_Wtime();
rc = P PI_Op_free(op) ;
end_time = MPI_Wtime( ) ;
( func_profiles [ id ] . call_count ) ++ ;
( func_profiles [ idx ]. total_time_secs ) += (end_time - start_time); return rc;
} /* end MPI_Op_free ( ) */
/*****************************/
/* Message Passing functions */
/*****************************/
int MPI_Allreduce ( oid *sendbuf,
void *recvbuf,
int count,
MPIJDatatype datatype,
MPI_Op op,
MPI_Comra comm)
{
int idx = MPI_ALLREDUCE_IDX;
int rc ;
double start_time, end_time;
int extent;
start_time = MPI_Wtime();
rc = PMPI_Allreduce(sendbuf , recvbuf, count, datatype, op, comm); end_time = MPI_Wtime( ) ;
( func_profiles [ idx ] . call_count ) ++ ;
( func_profiles [ idx ]. total_time_secs ) += (end_time - start_time); MPI_Type_size ( datatype, fcextent) ;
( func_profiles [ idx ]. total_bytes ) += (count * extent);
return rc ;
} /* end MPI_Allreduce( ) */
int M I_Bcast(void *buffer,
int count,
MPI_Datatype datatype,
int root,
MPI_Comin comm)
{
int idx = MPI_BCAST_IDX;
int rc;
double start_time, end_time;
int extent;
start_time = MPI_Wtime();
rc = PMPI_Bcast( buffer, count, datatype, root, comm);
end_time = MPI_Wtime ( ) ;
( func_profiles [ idx ] . call_count ) ++ ,*
( func_profiles [ id ]. total_time_secs ) += (end_time - start_time); MPI_Type_size ( datatype , Sextent) ;
( func_profiles [ idx] . total_bytes ) += (count * extent);
return rc;
} /* end MPI_Bcast() */
int MPI_Send(void *buf,
int count,
MPl_Datatype datatype,
int dest,
int tag,
MPI_Comm comm)
int idx = MPI_SEND_IDX;
int rc;
double start_time, end_time;
int extent;
start_time = MPI_Wtime ( ) ;
rc = PMPI_Send(buf , count, datatype, dest, tag, com) ;
end_time = MPI_Wtime();
{ func_profiles [ idx] -call__count) ++;
( func_profiles [ idx ] .total_time_secs) += (end_time - start_time); MPI_Type_size ( datatype, Sextent) ;
( func_profiles [ idx] -total_bytes ) += (count * extent);
return rc ;
} /* end MPI_Send() */
int MPI_Recv( void *buf,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm comm.,
MPI_Status *status)
{
int idx = MPI_RECV_IDX;
int rc ;
double start_time, end_time;
int extent;
start_time = MPI_ time ( ) ;
rc = PMPI_Recv( buf , count, datatype, source, tag, comm, status); end_time = MPI_Wtime();
( func_profiles [ idx] .call_count) ++;
( func_profiles [ idx] . total_time_secs ) += (end_time - start_time); MPI_Type_size( datatype, Sextent) ;
( func_profiles [ idx]„total_bytes ) += (count * extent);
return rc ;
} /* end MPI_Recv( ) */
#endif
[00042] Throughout this application, various publications, including United States patents, are referenced by author and year and patents by number. Full citations for the publications are listed below. The disclosures of these publications and patents in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this invention pertains.
[00043] The invention has been described in an illustrative manner, and it is to be understood that the terminology, which has been used is intended to be in the nature of words of description rather than of limitation.
[00044] Obviously, many modifications and variations of the present invention are possible in light of the above teachings. It is, therefore, to be understood that within the scope of the appended claims, the invention can be practiced otherwise than as specifically described.