WO2017122590A1

WO2017122590A1 - Arithmetic logic unit, x-ray ct device, and data processing method

Info

Publication number: WO2017122590A1
Application number: PCT/JP2017/000263
Authority: WO
Inventors: 佑太小倉; 亮太小原
Original assignee: 株式会社日立製作所
Priority date: 2016-01-15
Filing date: 2017-01-06
Publication date: 2017-07-20
Also published as: JP2017124081A

Abstract

When an arithmetic logic unit according to the present invention executes a load distribution process in a back projection process, for example, the arithmetic logic unit partitions an image that is a processing region into a plurality of blocks during the load distribution processing and calculates the amount of data to be used for processing each of the partitioned blocks. Furthermore, the arithmetic logic unit calculates the load for processing each block based on the amount of data used, and assigns arithmetic units to processing regions on the basis of the load. Thus, the load applied to an arithmetic unit can be equalized and data processing can be performed efficiently. Moreover, an X-ray CT device 1 according to the present invention comprises the arithmetic logic unit.

Description

Arithmetic apparatus, X-ray CT apparatus, and data processing method

The present invention relates to an arithmetic device, an X-ray CT device, and a data processing method, and more particularly, to an increase in calculation speed in data processing using an arithmetic device having a plurality of arithmetic units.

In recent years, the processing load on the arithmetic device of a CT (Computed Tomography) apparatus has increased as data volumes have grown and calculations have become more complicated. At the same time, faster processing is required so as not to delay the workflow in hospital facilities. Against this background, parallel processing is commonly performed using a multi-core processor (arithmetic device) equipped with a plurality of arithmetic units (cores). Proposed approaches to parallel processing include, for example, assigning an arithmetic unit to each module, and dividing the region to be processed (image, projection data, etc.) and assigning an arithmetic unit to each of the divided regions.

Thus, when performing parallel processing using a plurality of arithmetic units, it is desirable that the processing amount of each arithmetic unit is as uniform as possible. This is because an arithmetic unit with a small amount of processing ends processing quickly and waits for processing of an arithmetic unit with a large amount of processing. In order to deal with such problems, a method has been proposed in which the processing time of each computing unit is balanced by dividing an image into equal parts to perform efficient parallel processing.

In the back projection process for generating an image from projection data obtained by an X-ray CT apparatus, the amount of projection data used for generating one pixel differs depending on the position of the pixel. Therefore, the calculation load varies depending on the pixel position to be processed. Also in the forward projection process for generating projection data from an image, the number of transmitted pixels (number of images) varies depending on the position of the detection element, and therefore the calculation load varies depending on the element position to be processed.

The calculation load increases with the amount of data used because the more data there is, the higher the cost of transferring it to the arithmetic unit.

Patent Document 1 describes an image processing method in which a slice plane to be reconstructed is divided into a plurality of blocks, the region of the projection image corresponding to each block is cut out as local data of that block, a processing unit is set for each block, and the processing units are distributed among calculation units for parallel processing.

JP 2006-55336 A

However, the method of Patent Document 1 equalizes the load balance of the arithmetic units by dividing the processing area equally, which is suited to processing in which the data amount does not change depending on the processing area (convolution of the projection image). For processing in which the amount of data changes depending on the processing area (back projection), equalization of the load balance of the arithmetic units is not considered.

The present invention has been made in view of the above problems. Its object is to provide an arithmetic device, an X-ray CT apparatus, and a data processing method capable of speeding up image processing by performing parallel processing so that the load balance of each arithmetic unit is equal in consideration of the amount of data used for processing.

To achieve the above object, the present invention is an arithmetic device having a plurality of arithmetic units, comprising: a data acquisition unit that acquires data used for processing; a processing region dividing unit that divides a processing region into a plurality of blocks; a use data amount calculation unit that obtains a use data amount, which is the amount of data used for processing each block; a load amount calculation unit that calculates a load amount of processing in each block based on the use data amount; an arithmetic unit assigning unit that assigns an arithmetic unit to each block based on the load amount; and an arithmetic processing unit that performs the processing by the assigned arithmetic unit.

Alternatively, the present invention is an X-ray CT apparatus provided with the arithmetic device of the present invention.

Alternatively, the present invention is a data processing method for an arithmetic device having a plurality of arithmetic units, comprising: a step of acquiring data used for processing; a step of dividing a processing region into a plurality of blocks; a step of obtaining a use data amount, which is the amount of data used for processing each block; a step of calculating a processing load amount in each block based on the use data amount; a step of assigning an arithmetic unit to each block based on the load amount; and a step of performing the processing by the assigned arithmetic unit.

According to the present invention, it is possible to provide an arithmetic device, an X-ray CT apparatus, and a data processing method capable of speeding up image processing by performing parallel processing so that the load balance of each arithmetic unit is equal, even when the amount of data used for processing is not constant.

External view showing the entire configuration of the X-ray CT apparatus 1
Block diagram showing the internal configuration of the X-ray CT apparatus 1
Functional block diagram of the arithmetic unit 202
Diagram showing the positional relationship of the X-ray detector 103 (use data), the X-ray beam, and the images i1, i2, i3, ... (processing area) in back projection processing
Graph showing the relationship between the image reconstruction position and the amount of data used
Diagram explaining the assignment of arithmetic units
Diagram showing the flow of back projection processing
Flowchart showing the flow of the load distribution process of step S102 of FIG. 7
Flowchart showing the flow of the use data amount calculation process of step S202 of FIG. 8
Example of divided processing areas (blocks B1, B2, B3, ...)
Flowchart showing the flow of the use data range calculation process of step S302 of FIG. 9
Diagram explaining the method of calculating the used ch coordinate (step S401)
Diagram explaining the method of calculating the used slice coordinates (step S402)
Diagram explaining the positional relationship between the end points of a block and the corresponding coordinates on the projection data
Flowchart showing the flow of the arithmetic unit assignment process of step S204 of FIG. 8
Diagram showing the positional relationship of the X-ray detector 103 (processing area), the X-ray beam, and the images i1, i2, i3, ... (use data) in forward projection processing
Diagram showing the flow of forward projection processing
Flowchart showing the flow of the load distribution process of step S602
Flowchart showing the flow of the use data amount calculation process of step S702 of FIG. 18
Example of divided processing areas (blocks Rd1, Rd2, ..., RdN)
Diagram explaining the positions on the image corresponding to the representative end points Rd1 [0, sl1], Rd2 [0, sl2], ..., RdN [0, slN] of each block
Flowchart explaining the arithmetic unit assignment process (step S204 in FIG. 8) when the number of arithmetic units is larger than the number of processing blocks

An arithmetic device of the present invention is an arithmetic device having a plurality of arithmetic units, and includes: a data acquisition unit that acquires data used for processing; a processing region dividing unit that divides a processing region into a plurality of blocks; a use data amount calculation unit that obtains a use data amount, which is the amount of data used for processing each block; a load amount calculation unit that calculates a load amount of processing in each block based on the use data amount; an arithmetic unit assigning unit that assigns an arithmetic unit to each block based on the load amount; and an arithmetic processing unit that performs the processing by the assigned arithmetic unit.

Further, the arithmetic unit assigning unit assigns the arithmetic unit to each block so that a difference in load amount between the blocks is not more than a predetermined threshold value.

Further, the data used for the processing is projection data, and the processing is back projection processing for generating an image that is the processing region using the projection data.

The use data amount calculation unit calculates the use data amount by calculating the range of projection data that corresponds to the pixel positions of each block through which X-rays irradiated from the X-ray focal point pass.

Further, the data used for the processing is image data, and the processing is forward projection processing for forwardly projecting the image data onto a detector which is the processing region.

The use data amount calculation unit calculates the use data amount of the block by obtaining the number of images through which the X-rays emitted from the X-ray focal point toward each detection element position of the detector pass.

Further, the load amount calculation unit further calculates the load amount based on a data transfer rate of a computing unit.

Further, the arithmetic unit assigning unit calculates an average load amount per arithmetic unit, integrates the load amounts of a plurality of blocks, and determines the processing range to be allocated to each arithmetic unit so that the integrated value is within the average load amount per arithmetic unit.

Further, the arithmetic unit assigning unit calculates a load amount ratio of each block, determines the number of arithmetic units to be assigned to each block based on the ratio, and assigns the determined number of arithmetic units to each block.

Further, the arithmetic unit assigning unit assigns the arithmetic unit using a function representing a relationship between the position of the processing area and the amount of used data.

Also, the X-ray CT apparatus of the present invention is characterized by including these arithmetic units.

In addition, the data processing method of the present invention is performed in an arithmetic device having a plurality of arithmetic units, and includes: a step of acquiring data used for processing; a step of dividing a processing area into a plurality of blocks; a step of obtaining a use data amount, which is the amount of data used for processing each block; a step of calculating a processing load amount in each block based on the use data amount; a step of assigning an arithmetic unit to each block based on the load amount; and a step of performing the processing by the assigned arithmetic unit.

[First Embodiment]
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings. First, the hardware configuration of the X-ray CT apparatus 1 will be described with reference to FIGS. 1 and 2.

As shown in FIGS. 1 and 2, the X-ray CT apparatus 1 is roughly composed of a scanner 10 and an operation unit 20. The scanner 10 includes a gantry 100 and a bed 101. As shown in FIG. 2, the gantry 100 includes an X-ray tube 102, an X-ray detector 103, a collimator 104, a high voltage generation device 105, a data collection device 106, a gantry driving device 107, and the like. The operation unit 20 includes a central control device 200, an input / output device 201, an arithmetic device 202, and the like.

The operator uses the input / output device 201 of the operation unit 20 to input imaging conditions, reconstruction conditions, and the like. The imaging conditions include, for example, the X-ray beam width, the bed feeding speed, the tube current, the tube voltage, the imaging range (body axis direction range), and the number of imaging views per rotation. The reconstruction conditions include, for example, a region of interest, the FOV (Field Of View), and a reconstruction filter function. The input / output device 201 includes a display device 211 that displays CT images and the like, an input device 212 such as a mouse, trackball, keyboard, or touch panel, a storage device 213 that stores data, and the like.

The central control device 200 is a computer including a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), and the like, and controls the operation of the entire X-ray CT apparatus 1. The central control device 200 transmits a control signal necessary for photographing to each device of the scanner 10 based on photographing conditions and reconstruction conditions.

When imaging is started in response to an imaging start signal from the central controller 200, the high voltage generator 105 applies a tube voltage and a tube current of predetermined magnitudes to the X-ray tube 102 based on the control signal from the central controller 200. The X-ray tube 102 emits electrons with energy corresponding to the applied tube voltage from the cathode; the emitted electrons collide with the target (anode), and X-rays with energy corresponding to the electron energy are irradiated onto the subject 3.

The irradiation area of the X-rays irradiated from the X-ray tube 102 is limited by the collimator 104. The opening width of the collimator 104 is controlled based on a control signal from the central controller 200.

In multi-slice CT using an X-ray detector 103 in which detection elements are arranged two-dimensionally, an X-ray beam that spreads in a cone or pyramid shape is irradiated from the X-ray tube 102, which is the X-ray source, in accordance with the X-ray detector 103.

X-rays irradiated from the X-ray tube 102, with their irradiation area limited by the collimator 104, are absorbed (attenuated) according to the X-ray attenuation coefficient of each tissue in the subject 3, pass through the subject 3, and are detected by the X-ray detector 103 arranged at a position opposite the X-ray tube 102.

The bed 101 includes a top plate on which the subject 3 is placed, a vertical movement device, and a top plate driving device. Based on a control signal from the central control device 200, it raises and lowers the top plate in the vertical direction, moves the top plate back and forth in the body axis direction, or moves it left and right in a direction perpendicular to the body axis and parallel to the floor. During imaging, the bed 101 moves the top plate at the moving speed and in the moving direction determined by the central controller 200.

The gantry driving device 107 rotates the rotating disk of the gantry 100 based on the control signal from the central control device 200.

The X-ray detector 103 is obtained by two-dimensionally arranging, for example, an X-ray detection element group constituted by a combination of a scintillator and a photodiode in a channel direction (circumferential direction) and a column direction (body axis direction). The X-ray detector 103 is disposed so as to face the X-ray tube 102 with the subject 3 interposed therebetween. The X-ray detector 103 detects the X-ray dose irradiated from the X-ray tube 102 and transmitted through the subject 3 and outputs it to the data collection device 106.

The data collection device 106 collects the X-ray dose information detected by each X-ray detection element of the X-ray detector 103, converts it into a digital signal, and sequentially outputs it to the arithmetic unit 202 of the operation unit 20 as transmitted X-ray information.

The computing device 202 includes a reconstruction processing device 221 and an image processing device 222.

The reconstruction processing device 221 acquires transmission X-ray information collected by the data collection device 106 and creates projection data necessary for reconstructing an image. Further, the reconstruction processing device 221 reconstructs a tomographic image (CT image) of the subject 3 using the projection data.

In the present invention, the arithmetic unit 202 includes a plurality of arithmetic units (cores). The arithmetic unit 202 performs the load distribution processing described later, assigns processing to the plurality of arithmetic units so that their load balance is equal, and generates an image. The processing here is processing related to image generation, such as back projection, forward projection, or successive approximation reconstruction processing that repeatedly performs back projection and forward projection. The reconstruction processing device 221 stores the generated CT image in the storage device 213 and displays it on the display device 211.

The image processing device 222 performs image processing on the CT image created by the reconstruction processing device 221 and stored in the storage device 213. In addition, the image after image processing is displayed on the display device 211 and stored in the storage device 213.

Next, the functional configuration of the arithmetic unit 202 will be described with reference to FIG.

The computing device 202 includes a data acquisition unit 41, a processing area dividing unit 42, a used data amount calculating unit 43, a load amount calculating unit 44, a computing unit allocating unit 45, an arithmetic processing unit 46, and the like as main functional configurations.

The data acquisition unit 41 acquires data used for processing. The data used for processing is projection data in back projection processing and image data in forward projection processing. Projection data is generated based on X-ray dose information (transmitted X-ray information) irradiated from the X-ray tube 102, transmitted through the subject 3 and the bed 101, and detected by the X-ray detector 103.

Here, with reference to FIGS. 4 to 6, the relationship between the image position and the use data in the process of generating an image from the projection data (back projection process) will be described. In FIG. 4, i1, i2, i3, ... are tomographic images at each slice position (image reconstruction position).

As shown in FIG. 4, each line connecting the X-ray tube (X-ray focal point) 102 and a detection element of the X-ray detector 103 is an X-ray beam. The number of X-ray beams that intersect the images i1, i2, i3, ... varies depending on the slice position of the image.

For example, let s1, s2, and s3 be the numbers of slices of projection data used to generate the images i1, i2, and i3, respectively. The closer an image is to the center as seen from the X-ray tube 102, the smaller the number of used slices, and the farther the image, the larger the number of used slices (s2 > s1). When the image position is more than a certain distance from the X-ray tube 102, pixels that exceed the slice range of the X-ray detector 103 appear. If, for example, the end slice is used for positions beyond the slice range, the number of used slices is reduced (s2 > s3).

That is, in the back projection process for generating an image from projection data, the amount of data used to generate one pixel is not uniform and varies depending on the image reconstruction position as shown in FIG.

As shown in FIG. 6, the projection data used to create the images i1, i2, i3, i4 are in the range of D1, D2, D3, D4, respectively. That is, in the back projection process, the size (number of pixels) of the area (image) to be processed is the same for each image i1 to i4, but the data amount D1 to D4 used for the process (image generation) is different for each image.

By the way, the load amount of the arithmetic unit of the arithmetic unit 202 is determined by the calculation amount and the data transfer amount.

The calculation amount is determined by the number of pixels (image size), but the data transfer amount depends on the data amounts D1 to D4 used for processing. Therefore, as shown in the upper part of FIG. 6, the load amounts for generating the images i1, i2, i3, i4 are different.

If the arithmetic units are distributed evenly in consideration of only the size of the processing area (image size), as shown in the lower part of FIG. 6, the load amount varies from unit to unit and the processing as a whole is inefficient. In the present invention, as shown in the middle part of FIG. 6, the load amount is calculated using the calculation amount (number of pixels) and the amount of data used for the calculation (use data amount), and arithmetic units are assigned based on that load, so that processing can be performed efficiently and at high speed.
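The imbalance described above can be illustrated with a minimal Python sketch. It assumes the additive load model that this description states later (calculation amount plus a coefficient times the use data amount). The function name `even_split_loads`, the data amounts, and the coefficient value are hypothetical and chosen only for illustration.

```python
def even_split_loads(loads, num_units):
    """Sum the load each unit receives when regions are split into equal-sized groups."""
    per_unit = len(loads) // num_units  # same number of images per arithmetic unit
    return [sum(loads[u * per_unit:(u + 1) * per_unit]) for u in range(num_units)]


# Four images of identical size (same calculation amount) ...
pixels_per_image = 512 * 512
# ... but with different amounts of projection data D1..D4 (hypothetical values).
data_amounts = [100_000, 300_000, 200_000, 50_000]
alpha = 1.0  # hypothetical transfer-cost coefficient

image_loads = [pixels_per_image + alpha * d for d in data_amounts]
unit_loads = even_split_loads(image_loads, num_units=2)

print(unit_loads)                         # load carried by each arithmetic unit
print(max(unit_loads) - min(unit_loads))  # imbalance caused by ignoring the data amount
```

Although both units receive the same number of pixels, their loads differ, and the more lightly loaded unit would sit idle until the other finishes.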

Returning to the explanation of FIG.

The processing area dividing unit 42 divides the processing area into a plurality of blocks. The processing area is an image in back projection processing.

The used data amount calculation unit 43 obtains the data amount used for processing of each block divided by the processing area dividing unit 42. How to determine the amount of data used will be described later.

The load amount calculation unit 44 calculates the load amount of each block described above. At this time, the load amount calculation unit 44 calculates the load amount of each block in consideration of not only the image size (calculation amount) but also the use data amount. The calculation of the load amount will be described later.

The arithmetic unit allocating unit 45 allocates the arithmetic units that process the processing areas so that the load is even, based on the load amount calculated by the load amount calculating unit 44. It is desirable that the load amounts be equal, but where they cannot be made equal, the arithmetic units may be assigned so that the difference in load amount between the processing regions is equal to or less than a predetermined threshold. The assignment of the arithmetic units will be described later.

The arithmetic processing unit 46 performs processing of the processing area by the arithmetic unit assigned by the arithmetic unit assigning unit 45. The process is a back projection process for generating image data from projection data in the first embodiment.

Next, the flow of the entire process of the X-ray CT apparatus 1 of the present invention will be described with reference to FIG.

First, the X-ray CT apparatus 1 performs positioning imaging on the subject 3. In the positioning imaging, the X-ray irradiation direction is fixed without rotating the gantry 100, and the X-ray dose transmitted through the subject 3 and the bed 101 is measured while moving the bed 101 at a predetermined speed. The X-ray CT apparatus 1 creates a positioning image based on transmitted X-ray data obtained by positioning imaging.

The central control device 200 accepts various condition settings such as shooting conditions and reconstruction conditions using the positioning image. Then, the central control device 200 executes the actual photographing based on the set various conditions. In the actual photographing, the gantry 100 is rotated to emit X-rays from each direction around the subject 3, and X-ray information transmitted through the subject 3 and the bed 101 is measured. Projection data is acquired by this actual photographing (step S101).

Next, the arithmetic unit 202 executes load distribution processing. In the load distribution process, the computing device 202 assigns each computing unit to the processing area so that the load balance of the plurality of computing units is equal (step S102).

In accordance with the assignment obtained in step S102, the arithmetic unit 202 performs divided image processing (step S103). In the divided image processing, parallel processing is performed using all the arithmetic units to generate an image. The computing device 202 displays the image generated in step S103 on the display device 211 (step S104).

Next, the load distribution process in step S102 will be described with reference to FIG.

The computing device 202 first acquires information on the number of created images and the processing block size (vertical and horizontal size) as parameters from the storage device 213. The processing block size is usually set in advance to an appropriate size according to the processor architecture or a size that facilitates processing. The arithmetic unit 202 calculates the total number of processed pixels from the acquired number of created images, and calculates the division number N by dividing by the processing block size (step S201).

The calculation of the division number N will be specifically described. The computing device 202 acquires the image size X × Y, the number of created images i, the image reconstruction position z, the number of computing units t, and the block size X × yb [pixel] from the storage device 213. The division number N is obtained by the following equation (1). The block size is variable, and may be set to 1 × 1, for example.
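Equation (1) itself is not reproduced in this text. As a hedged sketch of the calculation described in step S201 (total number of processed pixels divided by the processing block size), the following Python function may help. Rounding up when the division is not exact is an assumption, and the function name `division_number` is hypothetical.

```python
def division_number(X, Y, num_images, yb):
    """Number of processing blocks N when each block is X x yb pixels.

    Follows the description of step S201: total processed pixels divided by
    the block size. Rounding up for a non-exact division is an assumption.
    """
    total_pixels = X * Y * num_images  # total number of processed pixels
    block_pixels = X * yb              # pixels in one processing block
    return (total_pixels + block_pixels - 1) // block_pixels  # integer ceiling


print(division_number(512, 512, 4, 64))
```

With a block size of 1 × 1, which the text mentions as a possible setting, the same ratio reduces to the total pixel count.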

The computing device 202 calculates the usage data amount for each of the N processing blocks calculated in step S201 (step S202), and calculates the load amount of each processing block based on the usage data amount (step S203). In the case of back projection processing, for example, the usage data amount is calculated according to the procedure shown in FIG.

As shown in FIG. 9, the arithmetic unit 202 first obtains the coordinates of the end points of each divided area (block) (step S301). For example, as shown in FIG. 10, suppose that the processing area (images i1, i2, ...) is divided into a plurality of blocks B1, B2, B3, ..., B9, B10, B11, .... The coordinates of the pixels B11, B12, B13, B14 at the end points of the block B1 are as follows.

B11 [0, y, z]
B12 [X, y, z]
B13 [0, y + yb, z]
B14 [X, y + yb, z]

Similarly, the coordinates of the end points are calculated for all N blocks B2, B3, ....
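The end point calculation of step S301 can be sketched in Python as follows. The sketch assumes that each image is cut into horizontal stripes of height yb starting at y = 0, that Y is a multiple of yb, and that blocks are numbered image by image. The function names are hypothetical.

```python
def block_end_points(X, y, yb, z):
    """End points of one block, in the order B11, B12, B13, B14 listed above."""
    return [(0, y, z), (X, y, z), (0, y + yb, z), (X, y + yb, z)]


def all_block_end_points(X, Y, yb, z_positions):
    """End points of every block, assuming stripes of height yb in each image."""
    blocks = []
    for z in z_positions:          # one image per reconstruction position z
        for y in range(0, Y, yb):  # assumes Y is a multiple of yb
            blocks.append(block_end_points(X, y, yb, z))
    return blocks


blocks = all_block_end_points(X=512, Y=128, yb=64, z_positions=[0.0, 1.0])
print(len(blocks))
print(blocks[0])
```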

Next, the arithmetic unit 202 calculates a use data range (step S302). In the case of back projection, the use data range is a range on the projection data corresponding to the block range. In other words, the use data range on the projection data is obtained by calculating the coordinates on the projection data corresponding to the end point coordinates obtained in step S301.

In step S302, as shown in the flowchart of FIG. 11, the arithmetic unit 202 calculates the used ch coordinates (step S401) and calculates the used slice coordinates (step S402).

The used ch coordinate is the channel position (ch coordinate) of the coordinates on the projection data (used data) corresponding to the end point coordinates of the block. The used slice coordinates are slice positions (slice coordinates) of coordinates on the projection data (used data) corresponding to the end point coordinates of the block.

The arithmetic unit 202 calculates the used ch coordinate corresponding to the end points of each block. The calculation depends on the geometric system of the X-ray CT apparatus 1. In the case of a parallel beam, for example, as shown in FIG. 12, the used ch coordinate is calculated from the X coordinate of the pixel i on the image, using a scaling factor k1 from the pixel size to the element size (ch), as in the following equation (2) (step S401).

Next, the arithmetic unit 202 calculates the used slice coordinates (step S402). In step S402, the arithmetic unit 202 calculates the used slice coordinates as in the following equation (3), using the Y coordinate and Z coordinate of the pixel, the distance d from the X-ray tube 102 (X-ray focal position) to the center of the image, and a scaling coefficient k2 from the position of the pixel i on the image to the element size (slice).

However, if the calculated coordinates do not exist, rounding may be performed as in the following equation (4).

The ch coordinate and slice coordinate on the projection data corresponding to each block end point can be calculated by the processing of steps S401 and S402. As shown in FIG. 14, the coordinates of the use data corresponding to the end points of each block are denoted Bd11 [ch1, slice1], Bd12 [ch2, slice2], Bd13 [ch3, slice3], and Bd14 [ch4, slice4].

Returning to the explanation of FIG. As described above, when the use data range corresponding to the block end point coordinates has been calculated in step S302, the arithmetic unit 202 calculates the use data amount (step S303). In step S303, the arithmetic unit 202 extracts the maximum and minimum values in the channel direction and the slice direction from the use data range corresponding to the end points of each block, and calculates the use data amount of the block as shown in equation (5).

Use data amount = (ch maximum value − ch minimum value) × (slice maximum value − slice minimum value)   (5)

Thereafter, the arithmetic unit 202 calculates the load amount (step S203 in FIG. 8). In step S203, the arithmetic unit 202 calculates the load amount of each block as follows. The load amount can be expressed by the following equation (6), using a speed ratio coefficient α that relates the speed at which a unit of use data is transferred to the speed at which a unit of processing area is processed.

Load amount = processing area + α × use data amount   (6)

Here, α is a coefficient that depends on the data transfer performance and architecture of the processor, for example on the bandwidth and the cache.
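Equations (5) and (6) can be written directly as code. The following is a minimal Python sketch. The function names, the sample end point coordinates, and the value of α are hypothetical, and the projection-data coordinates are assumed to have been computed already in steps S401 and S402.

```python
def use_data_amount(end_points_on_projection):
    """Equation (5): extent in the ch direction times extent in the slice direction.

    end_points_on_projection is a list of (ch, slice) pairs, e.g. Bd11..Bd14.
    """
    chs = [ch for ch, _ in end_points_on_projection]
    slices = [sl for _, sl in end_points_on_projection]
    return (max(chs) - min(chs)) * (max(slices) - min(slices))


def load_amount(processing_area, data_amount, alpha):
    """Equation (6): processing area plus alpha times the use data amount."""
    return processing_area + alpha * data_amount


# Hypothetical projection-data coordinates for the four end points of one block.
bd = [(10, 3), (110, 3), (10, 11), (110, 12)]
d = use_data_amount(bd)
print(d)
print(load_amount(processing_area=512 * 64, data_amount=d, alpha=2.0))
```

Because α multiplies only the data term, two blocks with the same number of pixels can still receive different load amounts, which is the point of the load model.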

When the load amount has been calculated as described above, the arithmetic device 202 assigns an arithmetic unit to each block, based on the calculated load amount of each block, so that the load amount of each arithmetic unit is equal (step S204).

The assignment of computing units in step S204 will be described with reference to the flowchart of FIG.

The computing device 202 first calculates an average load amount per computing unit (step S501). In step S501, the arithmetic unit 202 obtains the total load amount of all blocks from the following equation (7), and calculates the average load amount per arithmetic unit using the equation (8).

Total load amount = load amount of block B1 + load amount of block B2 + ... + load amount of block BN   (7)

Average load amount = total load amount / number of arithmetic units t   (8)

Next, the arithmetic unit 202 adds up the load amounts of the blocks in order from the block B1 according to equation (9) (step S502).

Integrated value += load amount [n]   (9)

When the integrated value obtained in step S502 reaches the average load amount obtained in step S501, that block is defined as the process end block end_b, and the range from the process start block st_b to the process end block end_b is set as the range processed by one arithmetic unit (processing blocks) (step S503). This process is repeated to obtain a process start block and a process end block for each arithmetic unit.

However, the integrated value of the load amount described above does not always reach the average load amount. In that case, for example, the condition shown in the following equation (10) may be used instead:

Integrated value > average load amount − average load amount per block   (10)

Here, average load amount per block = total load amount / number of blocks.

When processing blocks are allocated to the last computing unit, if processing blocks still remain, they are allocated to each computing unit one block at a time. When the number of remaining blocks is larger than the number of arithmetic units, the above load distribution processing may be repeated.
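Steps S501 to S503 can be sketched as follows. This is only an illustrative reading of the procedure, not a definitive implementation: the function name `assign_blocks`, the use of the relaxed condition of equation (10) for every arithmetic unit, and the round-robin handling of leftover blocks are assumptions, and the load values in the usage example are hypothetical.

```python
def assign_blocks(loads, num_units):
    """Return, for each arithmetic unit, the list of block indices it processes."""
    total = sum(loads)                          # equation (7)
    avg_per_unit = total / num_units            # equation (8)
    avg_per_block = total / len(loads)
    threshold = avg_per_unit - avg_per_block    # right-hand side of equation (10)

    assignment = [[] for _ in range(num_units)]
    leftover = []
    unit = 0
    integrated = 0.0
    for n, load in enumerate(loads):
        if unit >= num_units:
            # Every unit already has its range; keep the block for later.
            leftover.append(n)
            continue
        assignment[unit].append(n)
        integrated += load                      # equation (9)
        if integrated > threshold:              # block n becomes end_b
            unit += 1
            integrated = 0.0

    # Remaining blocks are handed out one block at a time.
    for i, n in enumerate(leftover):
        assignment[i % num_units].append(n)
    return assignment


if __name__ == "__main__":
    hypothetical_loads = [1, 1, 1, 1, 4, 4, 1, 1, 1, 1]
    for unit, blocks in enumerate(assign_blocks(hypothetical_loads, 4)):
        print(unit, blocks, sum(hypothetical_loads[b] for b in blocks))
```

In this sketch, heavily loaded blocks end up alone or in short ranges, while lightly loaded blocks are grouped into longer ranges, so the per-unit totals stay close to the average load amount.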

The computing device 202 obtains the coordinates of the processing block assigned in step S503 (step S504).

In step S504, specifically, the end points [st_x, st_y] and [end_x, end_y] of the processing block are calculated. For example, the processing start coordinates and end coordinates are obtained as follows:
[st_x, st_y] = [0, st_b × yb]
[end_x, end_y] = [0, end_b × yb]
The process start coordinates and the process end coordinates are determined in this way.
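The coordinate calculation of step S504 can be sketched as below. The function name `block_range_to_coords` is an illustrative assumption, and yb is assumed here to denote the block size in the y direction.

```python
def block_range_to_coords(st_b: int, end_b: int, yb: int):
    """Convert a block range (st_b .. end_b) into start and end coordinates.

    Follows [st_x, st_y] = [0, st_b * yb] and [end_x, end_y] = [0, end_b * yb].
    yb is assumed to be the block size in the y direction.
    """
    start = (0, st_b * yb)
    end = (0, end_b * yb)
    return start, end


if __name__ == "__main__":
    # Hypothetical values: blocks 3 to 7, each 16 pixels high
    print(block_range_to_coords(3, 7, 16))
```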

In the load distribution process of FIG. 8, when the processing up to the assignment of the arithmetic units (step S204; steps S501 to S504) is completed, the process proceeds to the divided image processing of FIG. 7 (step S103).

In the divided image processing in step S103, each computing unit performs back projection processing on the pixels from the processing start coordinates determined in step S504 to the processing end coordinates. The value of each pixel is obtained by back projection processing, and an image is generated.

The computing device 202 displays an image on the display device 211 and ends a series of backprojection processing (step S104).

As described above, in the back projection process in which projection data is projected onto the image area that is the processing area, the load amount applied to each arithmetic unit used in the parallel processing is equally allocated. As a result, the time required for processing in each arithmetic unit is equalized, and high-speed image processing can be performed.

Note that the processing of the first embodiment is effective when the number of divisions (number of blocks) of the processing area is set larger than the number of computing units. In the back projection process, the processing area is an image, but the arithmetic units are not assigned to fixed, large unit processing areas. Instead, in the present invention, one image is divided into a plurality of small blocks and the arithmetic units are assigned flexibly. That is, the size of the processing area can be changed for each arithmetic unit, instead of being the same for all the arithmetic units. As a result, the arithmetic units can be assigned in a balanced manner so that the load amounts are equal. Further, the method of dividing the processing area is arbitrary, and is not limited to the example of FIG.

[Second Embodiment]
The present invention can also be applied to forward projection processing. The forward projection process generates projection data for each detection element from image data, using the detector area (projection data area) as the processing area. When a plurality of images (tomographic images) i1 to i7 are created as shown in FIG. 16, the number of images i1, i2, i3, ... crossed by the line connecting the X-ray tube 102 (X-ray focal point) and each of the detection elements s1, s2, s3, ... of the X-ray detector 103 differs depending on the slice position. For example, the line for the detection element s1 intersects the images i3, i4, and i5. The line for the detection element s2 intersects the images i1, i2, i3, and i4. The line for the detection element s3 intersects the images i1, i2, and i3. That is, the closer an element is to the center position as seen from the X-ray tube 102, the smaller the number of used pixels, and the farther away the element, the larger the number of used pixels.

However, when the detection element position is a predetermined distance or more away from the X-ray tube 102 (X-ray focal point), no image exists there, so the number of used pixels decreases. That is, even in the forward projection process for generating projection data from image data, the amount of data used to generate projection data for one detection element is not uniform but varies depending on the element position.

For this reason, in the forward projection processing as well, as in the first embodiment (back projection processing), processing can be performed efficiently and at high speed if the computing units are allocated so that their load balance is even, based on the calculation amount (number of elements) of each computing unit and the amount of data (number of pixels) to be used. Hereinafter, the load distribution process in the forward projection process will be described.

First, the overall flow of the forward projection process will be described with reference to FIG. 17.

In the forward projection processing, the arithmetic unit 202 acquires image data that is data used to generate projection data (step S601), and executes load distribution processing (step S602). In the load distribution process, the processing area (detection element) of each computing unit is allocated so that the load balance of the plurality of computing units is equal.

In accordance with the assignment obtained in step S602, the arithmetic unit 202 performs divided data processing (step S603). In the divided data processing, the processing area is forward projected using all the arithmetic units. That is, forward projection processing is performed using image data to calculate a projection value at each element to generate projection data. The arithmetic device 202 outputs (saves) the projection data generated in step S603 to the storage device 213 (step S604).

The load distribution process in step S602 is performed in the same procedure as in the first embodiment. That is, as shown in FIG. 18, in the load distribution process, the arithmetic unit 202 first divides the detector area, which is the processing area, into a plurality of blocks (step S701), and calculates the amount of data (number of transmissive pixels) used for the processing of each divided block (step S702). Then, the load amount of each block is calculated based on the used data amount calculated in step S702 (step S703), the number of arithmetic units to be assigned to each block is obtained so that the load amounts are equal, and that number of arithmetic units is assigned to each block (step S704).

In the processing region dividing step of step S701, the arithmetic unit 202 acquires, for example, the detector element size Ch × Sl, the number of arithmetic units t, and the block size Ch × 1 pixel from the storage device 213. Then, the division number N of the processing area is calculated from the following equation (11).

Next, in the step of calculating the used data amount of each block in step S702, the arithmetic unit 202 calculates the used data amount according to the procedure shown in FIG.

First, the arithmetic unit 202 calculates the coordinates of the end points of the block (step S801). In the process of step S801, the arithmetic unit 202 divides the processing area into blocks Rd1, Rd2,... RdN as shown in FIG. 20, and calculates end point coordinates (Rd11, Rd12,...) Of each block. In the case of forward projection, as shown in FIG. 21, it is assumed that the change in the amount of data used (transmission image) due to the ch position is small, and the end point coordinates of each block are set as Rd1 [0, sl1], Rd2 [0, sl2]... RdN [0, slN].

Next, the computing device 202 calculates the position on the image (transmission coordinates) corresponding to the end point of each block (step S802), and determines whether or not the calculated transmission coordinates lie on the image (step S803). If the calculated transmission coordinates are on the image (step S803; Yes), the number of transmissive pixels is incremented (step S804). If the transmission coordinates calculated in step S802 are not on the image (step S803; No), the number of transmissive pixels is not incremented.

In the transmission coordinate calculation step of step S802, the arithmetic unit 202 first calculates the first transmission coordinates I1 [y, z] of the corresponding element (the coordinates on the first image that is transmitted). This calculation uses the length L from the X-ray tube 102 to the corresponding element in the z-axis direction, the image interval p, the distance l from the corresponding element to the position of the first image that is transmitted on the way toward the X-ray tube 102, and the length D from the X-ray tube 102 to the element immediately below the X-ray tube 102.

Specifically, it can be calculated as shown in Equation (12).

The transmission coordinates In [y, z] after that (transmission coordinates of the nth image) can be calculated as in Expression (13).

When the transmission coordinates have been calculated in step S802, the arithmetic unit 202 determines whether or not the calculated transmission coordinates are on the image (step S803). In the determination of step S803, the arithmetic unit 202 uses the z coordinate z_min of the image closest to the X-ray tube position (X-ray focal position), the lower end y_min of the image, and the upper end y_max of the image. If the conditions shown in the following equation (14) are satisfied, the number of transmissive pixels is incremented (number of transmissive pixels = number of transmissive pixels + 1).

As described above, the number of transmissive pixels at the end points of each block can be calculated.

The number of transmissive pixels at each end point can be obtained as Rd1 [number of transmissive pixels 1], Rd2 [number of transmissive pixels 2],... RdN [number of transmissive pixels N].

When the number of transmissive pixels at the end points of each block is calculated, the arithmetic unit 202 next calculates the amount of data used in the block (step S805).

The amount of data used in the nth block is obtained from the following equation (15), using the number of transmissive pixels at the end point of the nth block.

Use data amount n = Rdn [number of transmission pixels] × Ch (15)
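Equation (15) can be sketched as follows. The function name `used_data_amounts` is an illustrative assumption, and the channel count in the usage example is a hypothetical value; the pixel counts 3, 4, and 3 correspond to the numbers of images crossed for the detection elements s1, s2, and s3 described for FIG. 16.

```python
def used_data_amounts(transmitted_pixels, ch):
    """Equation (15): used data amount n = Rdn[number of transmissive pixels] * Ch.

    transmitted_pixels: number of transmissive pixels at the end point of each block
    ch: number of detector channels in one block (the block size is Ch x 1)
    """
    return [count * ch for count in transmitted_pixels]


if __name__ == "__main__":
    # Hypothetical channel count of 1000
    print(used_data_amounts([3, 4, 3], 1000))
```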
When the used data amount of each block has been obtained, the arithmetic unit 202 calculates the load amount (step S703 in FIG. 18).

In the load amount calculation process in step S703, the arithmetic unit 202 calculates the load amount of each block, as in the first embodiment. Using the speed ratio coefficient α with the speed at which the unit usage data is transferred with respect to the processing speed of the unit processing area (block), it can be expressed as the above equation (6).

When the load amount is calculated as described above, the arithmetic unit 202 assigns a processing block to the arithmetic unit based on the calculated load amount (step S704 in FIG. 18).

The operation unit assignment step is the same as that in the first embodiment. Specifically, as shown in the flowchart of FIG. 15, the arithmetic unit 202 first calculates an average load amount (step S501). Next, the arithmetic unit 202 adds the load amount of each block from the first processing block B1 (step S502).

When the load amount obtained in step S502 (the integrated value of equation (9)) reaches the average load amount, the processing area from the processing start block st_b to the block end_b is set as a processing area of one computing unit (step S503). This is repeated to obtain the start block and end block of each computing unit.

When the processing up to the allocation of computing units (step S704) is completed, the load distribution processing is terminated, and the computing device 202 then shifts to the divided data processing of FIG. 17 (step S603).

In the divided data processing in step S603, each arithmetic unit performs forward projection processing from the determined processing start element to the processing end element. When the projection data is generated by the divided data processing, the arithmetic device 202 stores the generated projection data in the storage device 213 or the like, and ends a series of data processing (step S604).

As described above, even in a process in which the amount of data used differs depending on the element position, such as a forward projection process in which the value of each detection element is obtained by accumulating the values of the transmissive pixels for the detector area that is the processing area, data processing can be performed efficiently by assigning processing blocks so that the load amount of each arithmetic unit is equal. Thereby, high-speed image processing can be provided.

[Third Embodiment]
In the first and second embodiments, allocation of computing units has been described for the case where the number of processing blocks (the number of divisions of the processing area) is greater than the number of computing units. In the third embodiment, assignment of arithmetic units when the number of arithmetic units is larger than the number of processing blocks will be described. The number of arithmetic units becomes larger than the number of processing blocks when, for example, the device is built so that the processing block size is one image, and a single arithmetic unit cannot process across a plurality of images.

Hereinafter, with reference to FIG. 22, processing for assigning arithmetic units in the third embodiment will be described. Each process other than the arithmetic unit assignment process (data acquisition, processing area division, used data amount calculation, load amount calculation, arithmetic processing, etc.) is assumed to be performed in the same manner as in the first or second embodiment.

In the third embodiment, as shown for example in the flowchart of FIG. 22, the computing device 202 (arithmetic unit assignment unit 45) calculates the load amount ratio of each block (step S901) and, based on the calculated ratio, determines the number of arithmetic units to be assigned to each processing area (step S902).

In step S901, the arithmetic unit 202 calculates the total load amount of all blocks. The total load amount is the sum of the load amounts of the respective blocks, and can be calculated from the following equation (16).

Total load amount = load amount of block B1 + load amount of block B2 + ... + load amount of block BN   (16)

Next, the arithmetic unit 202 obtains the load ratio, which is the ratio of the load amount of each block to the total load amount, as shown in equation (17).

Load ratio = [load amount of block B1 / total load amount, load amount of block B2 / total load amount, ..., load amount of block BN / total load amount]   (17)

Then, the arithmetic unit 202 calculates the number of arithmetic units assigned to each block from the following equation (18).

Number of arithmetic units in block Bn = total number of arithmetic units × load amount of block Bn / total load amount   (18)

The computing device 202 determines in this way the number of computing units to be assigned to every block. When the calculated number of arithmetic units is not an integer, the fractional part may be rounded down, and the arithmetic units left over may then be redistributed to the processing areas according to the size of the discarded fractional part. However, when the calculated number of computing units is less than 1, the number of computing units is set to 1.
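Steps S901 and S902 can be sketched as follows. This is only a sketch under assumptions: the function name `units_per_block` is illustrative, and handing the leftover arithmetic units to the blocks in descending order of the discarded fractional part is one possible reading of the redistribution described above.

```python
import math


def units_per_block(loads, total_units):
    """Distribute total_units arithmetic units over blocks in proportion to load."""
    total = sum(loads)                                        # equation (16)
    exact = [total_units * load / total for load in loads]    # equations (17), (18)

    # Round down, but never assign fewer than one arithmetic unit to a block.
    counts = [max(1, math.floor(value)) for value in exact]

    # Redistribute the units left over after rounding down, largest
    # discarded fractional part first (assumed tie-breaking: list order).
    remaining = total_units - sum(counts)
    by_fraction = sorted(
        range(len(loads)),
        key=lambda i: exact[i] - math.floor(exact[i]),
        reverse=True,
    )
    for i in by_fraction:
        if remaining <= 0:
            break
        counts[i] += 1
        remaining -= 1
    return counts


if __name__ == "__main__":
    print(units_per_block([568744, 1130844], 20))
```

For load amounts of 568744 and 1130844 and 20 arithmetic units, the rounded-down counts are 6 and 13, and this sketch gives the one remaining unit to the first block because its discarded fractional part is larger. The sketch does not handle the case where the minimum of one unit per block pushes the total above the number of available arithmetic units.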

As described above, in the third embodiment, the number of arithmetic units assigned to each block is calculated according to the load ratio of each block. Therefore, even when the processing block size is as large as an image unit and the number of arithmetic units is larger than the number of processing blocks, it is possible to assign the arithmetic unit loads as evenly as possible. Thereby, high-speed data processing can be performed.

[Fourth Embodiment]
In the first or second embodiment, the load amount calculation unit 44 (step S203 in FIG. 8 and step S703 in FIG. 18) may use a function representing the used data amount (use data amount characteristic f(z)) and calculate the load amount based on the use data amount characteristic f(z).

For example, in a conventional scan (also referred to as a normal scan or a step-and-shoot scan), the amount of data used varies depending on the image position (z position) as shown in FIG. An approximate curve indicating the relationship between the amount of used data and the z position is prepared in advance as, for example, a used data amount characteristic f (z), and is used for calculation of the calculator load.

In the load amount calculating step of the first embodiment (step S203 in FIG. 8) or of the second embodiment (step S703 in FIG. 18), the arithmetic unit 202 can calculate the load amount using the following equations (19) and (20).

Used data amount = f(z)   (19)
Load amount = processing area + α × f(z)   (20)

As described above, in the fourth embodiment, the load amount of each block can be easily calculated using a function. Therefore, the calculation time of the load distribution process can be shortened, and more efficient data processing can be performed.
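Equations (19) and (20) can be sketched as follows. The form of the approximate curve is not specified in the text, so this sketch assumes a piecewise linear curve through sample points prepared in advance; the sample values, the names `Z_SAMPLES` and `DATA_SAMPLES`, and the function names are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical samples of the used data amount characteristic, prepared in advance.
Z_SAMPLES = [0.0, 1.0, 2.0, 3.0]
DATA_SAMPLES = [2000.0, 6000.0, 12000.0, 18000.0]


def f(z: float) -> float:
    """Equation (19): used data amount = f(z), here by linear interpolation."""
    if z <= Z_SAMPLES[0]:
        return DATA_SAMPLES[0]
    if z >= Z_SAMPLES[-1]:
        return DATA_SAMPLES[-1]
    i = bisect_right(Z_SAMPLES, z) - 1
    z0, z1 = Z_SAMPLES[i], Z_SAMPLES[i + 1]
    d0, d1 = DATA_SAMPLES[i], DATA_SAMPLES[i + 1]
    return d0 + (d1 - d0) * (z - z0) / (z1 - z0)


def load_amount_from_z(processing_area: float, z: float, alpha: float) -> float:
    """Equation (20): load amount = processing area + alpha * f(z)."""
    return processing_area + alpha * f(z)


if __name__ == "__main__":
    print(load_amount_from_z(512 * 512, 1.5, 50))
```

Because f(z) is evaluated directly from the z position, the end point and range calculations for each block are not needed when the load amount is computed this way.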

[Fifth Embodiment]
The size at which high-speed data reading is possible is determined according to the hardware. For example, if the data size to be read is large, it may be necessary to perform DMA transfer a plurality of times or to refer to a cache hierarchy with a low transfer rate. In the fifth embodiment, application to the case of using hardware whose read speed varies depending on the data size will be described.

Let Th1 and Th2 be threshold values for the data size (data_size) that can be transferred at high speed, and let c0 and c1 be the speed ratio coefficients corresponding to the data size. These values are stored in the storage device 213 in advance.

And, for example, the speed ratio coefficient α included in the above formula (6) and formula (20) is changed according to the following condition (formula (21)).

Thus, by calculating the speed ratio coefficient α according to the data size (data_size), the calculation accuracy of the load amount of the computing unit can be improved. Thereby, it is possible to improve the accuracy of equalizing the load on the arithmetic unit, and to speed up the image processing.

As described above, in the X-ray CT apparatus 1 (arithmetic apparatus 202) including a plurality of arithmetic units, the arithmetic apparatus 202 executes load distribution processing. In the load distribution process, the arithmetic unit 202 divides the processing area into a plurality of blocks, and calculates the amount of data used for the processing of each divided block.

Further, the arithmetic unit 202 calculates the load amount of each block based on the amount of data used, and assigns a processing area (block to be processed) to each arithmetic unit so that the load amount is equalized in each arithmetic unit. As a result, the load applied to each arithmetic unit is equalized, and data processing can be performed efficiently.

In the above-described embodiment, the back projection process and the forward projection process have been described. However, the present invention can also be applied to a successive approximation reconstruction process in which an image is generated by repeating back projection and forward projection.

[Example]
Taking back projection processing as an example, the result of load distribution processing is shown by applying specific numerical values. In the following description, the division of the processing area (step S201), the calculation of the used data amount (step S202), and the calculation of the load amount (step S203) will be described according to the procedure of the flowchart of FIG. 8. For the assignment of arithmetic units (step S204), the process of the third embodiment (FIG. 22) is applied.

<Step S201 (Division of Processing Area (Image))>
The arithmetic unit 202 acquires from the storage device 213 the image size “512 × 512”, the image reconstruction positions “1, 3”, the number of created images “2”, the block size “512 × 512 [pixel]”, and the number of arithmetic units “20”.

The number of divisions (number of blocks) N is obtained from the above equation (1).

N = 512 × 512 × 2 / (512 × 512) = 2

<Step S202; Calculation of Usage Data Amount of Each Block>
The arithmetic unit 202 calculates the coordinates of the end points of the divided areas (step S301 in FIG. 9). First, the arithmetic unit 202 obtains the coordinates of each divided area.

The coordinates of the end points of the first block B1 are as follows, for example.

B11 [1,256,1]
B12 [512, 256, 1]
B13 [1, -256, 1]
B14 [512, -256, 1].

Also, the coordinates of the end points of the second block B2 are as follows, for example.

B21 [1,256,3]
B22 [512, 256, 3]
B23 [1, -256, 3]
B24 [512, -256, 3].

Next, the arithmetic unit 202 calculates the range of use data (step S302 in FIG. 9).

The computing device 202 calculates the coordinates of the projection data at the end points of each block.

Depending on the geometric system, for example, in the case of a parallel beam, usage data is calculated as shown in FIGS.

First, the arithmetic unit 202 calculates the used ch coordinate (step S401 in FIG. 11).

Using the x coordinate of the pixel and the scaling factor k1 (= 2.0) from the pixel size to the element size (ch),
Used ch of block B1 = maximum ch − minimum ch
= 512 × 2 − 1 × 2
= 1022
Further, the arithmetic unit 202 calculates the used slice coordinates (step S402 in FIG. 11).

With y = 256 and z = 1 at the upper end point of the image, y = −256 and z = 1 at the lower end point of the image, the distance d = 500 from the X-ray tube 102 to the image center, and the scaling coefficient k2 = 4.0 to the element size (slice), the above equation (3) (slice = z × d / (d − y) × k2) gives
Used slice of block B1
= maximum slice − minimum slice
= {1 × 500 / (500 − 256) − 1 × 500 / (500 + 256)} × 4.0
= 6 (rounded up)
As described above, the number of ch and slice used in the projection data can be calculated.
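The ch and slice calculations for block B1 can be checked with the short sketch below. The function names are illustrative assumptions; the formulas follow the calculation shown above, with equation (3) taken in the form quoted in the text (slice = z × d / (d − y) × k2).

```python
import math


def used_ch(x_min: float, x_max: float, k1: float) -> float:
    """Used ch = maximum ch - minimum ch, with ch = x * k1."""
    return x_max * k1 - x_min * k1


def slice_coord(z: float, y: float, d: float, k2: float) -> float:
    """Equation (3) as quoted in the text: slice = z * d / (d - y) * k2."""
    return z * d / (d - y) * k2


def used_slice(z: float, y_top: float, y_bottom: float, d: float, k2: float) -> int:
    """Used slice = maximum slice - minimum slice, rounded up."""
    return math.ceil(slice_coord(z, y_top, d, k2) - slice_coord(z, y_bottom, d, k2))


if __name__ == "__main__":
    print(used_ch(1, 512, 2.0))                    # block B1: used ch
    print(used_slice(1, 256, -256, 500.0, 4.0))    # block B1: used slice
```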

Next, the arithmetic unit 202 calculates the amount of data used (step S303 in FIG. 9).

The amount of data used in the block is expressed by equation (5) as described above.

From this,
Used data amount of B1 = 1022 × 6 = 6132
is obtained.

<Step S203; Calculation of Load Amount of Each Block>
Next, the arithmetic unit 202 calculates the load amount of each block.

Assuming a speed ratio coefficient α = 50 between the speed of transferring a unit of used data and the speed of processing a unit of processing area, the above equation (6) gives
Load amount of block B1 = 512 × 512 + 50 × 6132
= 568744

Performing the same calculation for block B2 gives
Load amount of B2 = 1130844
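The two load amounts can be reproduced with the sketch below, which chains the used ch, used slice, used data amount, and load amount calculations. The function name `block_load` and the constant names are illustrative assumptions. For block B2 (z = 3), the same formulas give a used slice count of 17, which is an intermediate value derived here rather than stated in the text; it is consistent with the stated load amount of 1130844.

```python
import math

K1 = 2.0       # scaling factor from pixel size to element size (ch)
K2 = 4.0       # scaling coefficient to element size (slice)
D = 500.0      # distance from the X-ray tube to the image center
ALPHA = 50     # speed ratio coefficient
IMAGE_SIZE = 512
Y_TOP, Y_BOTTOM = 256, -256


def block_load(z: float) -> float:
    """Load amount of one 512 x 512 block at reconstruction position z."""
    used_ch = IMAGE_SIZE * K1 - 1 * K1
    max_slice = z * D / (D - Y_TOP) * K2
    min_slice = z * D / (D - Y_BOTTOM) * K2
    used_slice = math.ceil(max_slice - min_slice)
    used_data = used_ch * used_slice                        # used ch x used slice
    return IMAGE_SIZE * IMAGE_SIZE + ALPHA * used_data      # equation (6)


if __name__ == "__main__":
    print(block_load(1))   # block B1
    print(block_load(3))   # block B2
```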

<Step S204: Assignment of computing units>
The computing unit assignment process is the same as that of the third embodiment (steps S901 and S902 in FIG. 22).

First, the arithmetic unit 202 calculates the load ratio of each block (step S901).

The total load amount of all blocks is
Total load amount = 568744 + 1130844 = 1699588

Load ratio = [(Block B1: 568744/1699588 =) 0.335, (Block B2: 1130844/1699588 =) 0.665]
The computing device 202 determines the number of computing units to be assigned to each block according to the load ratio (step S902).

Since the number of computing units in block Bn = total number of computing units × load amount n / total load amount,
Number of arithmetic units in block B1 = 20 × 0.335 = 6.7, giving 6 (rounded down)
Number of arithmetic units in block B2 = 20 × 0.665 = 13.3, giving 13 (rounded down)
In this case, the load amount per computing unit ranges from 86988 (1130844/13) to 94790.67 (568744/6).

On the other hand, as a comparative example, when the load amount is calculated without using the amount of data used, as in the conventional method, and the computing units are distributed evenly to each image,
Number of arithmetic units in block B1 = 20 × 0.5 = 10
Number of arithmetic units in block B2 = 20 × 0.5 = 10
In this case, the load amount per computing unit ranges from 56874.4 (568744/10) to 113084.4 (1130844/10).

The total processing time of image processing is the processing time of the computing unit with the greatest load.

Comparing the processing time of the conventional method with that of the method of the present invention gives
113084.4 (maximum load amount in the conventional method) / 94790.67 (maximum load amount in the present invention) = 1.19

By using the method of the present invention, the processing speed can be improved by about 19% compared to the conventional method.
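The comparison can be reproduced with the following sketch, which takes the maximum load amount per arithmetic unit as the measure of total processing time, as stated above. The variable and function names are illustrative assumptions.

```python
LOADS = {"B1": 568744, "B2": 1130844}

# Arithmetic units per block: load-based assignment versus even distribution
PROPOSED = {"B1": 6, "B2": 13}
CONVENTIONAL = {"B1": 10, "B2": 10}


def max_load_per_unit(loads, units):
    """Load amount of the most heavily loaded arithmetic unit."""
    return max(loads[block] / units[block] for block in loads)


def speed_ratio(loads, conventional, proposed):
    """Ratio of the conventional maximum load to the proposed maximum load."""
    return max_load_per_unit(loads, conventional) / max_load_per_unit(loads, proposed)


if __name__ == "__main__":
    print(round(max_load_per_unit(LOADS, CONVENTIONAL), 2))
    print(round(max_load_per_unit(LOADS, PROPOSED), 2))
    print(round(speed_ratio(LOADS, CONVENTIONAL, PROPOSED), 2))
```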

The preferred embodiment of the present invention has been described above, but the present invention is not limited to the above-described embodiment. It will be apparent to those skilled in the art that various changes or modifications can be conceived within the scope of the technical idea disclosed in the present application, and these are naturally understood to fall within the technical scope of the present invention.

1 X-ray CT device, 3 subject, 10 scanner, 20 operation unit, 100 gantry, 101 bed device, 102 X-ray tube, 103 X-ray detector, 104 collimator, 105 high voltage generator, 106 data collection device, 107 gantry drive device, 200 central control device, 201 input/output device, 202 arithmetic device, 211 display device, 212 input device, 213 storage device, 221 reconstruction processing device, 222 image processing device, 41 data acquisition unit, 42 processing area division unit, 43 used data amount calculation unit, 44 load amount calculation unit, 45 arithmetic unit assignment unit, 46 arithmetic processing unit, i1, i2, ... images, s1, s2, ... detection elements, D1, D2, ... ranges on projection data, B1, B2, ... blocks

Claims

An arithmetic device having a plurality of arithmetic units,
A data acquisition unit for acquiring data used for processing;
A processing area dividing unit for dividing the processing area into a plurality of blocks;
A use data amount calculation unit for obtaining a use data amount which is a data amount used for processing the block;
A load amount calculation unit that calculates a load amount of processing in each block based on the use data amount;
An arithmetic unit assignment unit that allocates an arithmetic unit based on the load amount;
An arithmetic processing unit for performing the processing by the assigned arithmetic unit;
An arithmetic device comprising:
The arithmetic device according to claim 1, wherein the arithmetic unit assignment unit assigns an arithmetic unit to each block so that a difference in load amount between the blocks is equal to or less than a predetermined threshold value.
The arithmetic device according to claim 2, wherein the data used for the processing is projection data, and the processing is back projection processing for generating an image that is the processing region using the projection data.
The arithmetic device according to claim 3, wherein the use data amount calculation unit calculates the use data amount by calculating a range of projection data corresponding to a pixel position of each block through which X-rays irradiated from an X-ray focal point pass.
The arithmetic device according to claim 2, wherein the data used for the processing is image data, and the processing is forward projection processing for forwardly projecting the image data onto a detector which is the processing region.
The arithmetic device according to claim 5, wherein the use data amount calculation unit calculates the use data amount of the block by obtaining the number of images transmitted by X-rays that are emitted from an X-ray focal point and pass through a detection element position of the detector.
The arithmetic device according to claim 2, wherein the load amount calculation unit further calculates the load amount based on a data transfer rate of an arithmetic unit.
The arithmetic device according to claim 2, wherein the arithmetic unit assignment unit calculates an average load amount per arithmetic unit, integrates the load amounts of a plurality of blocks, and determines the processing range to be assigned to each arithmetic unit so that the integrated value is within the average load amount per arithmetic unit.
The arithmetic device according to claim 2, wherein the arithmetic unit assignment unit calculates a load ratio of each block, determines the number of arithmetic units to be assigned to the block based on the ratio, and assigns the determined number of arithmetic units to each block.
The arithmetic device according to claim 2, wherein the arithmetic unit assignment unit assigns the arithmetic unit using a function representing a relationship between a position of the processing area and a used data amount.
An X-ray CT apparatus comprising the arithmetic unit according to claim 1.
In an arithmetic unit having a plurality of arithmetic units,
Obtaining data for processing; and
Dividing the processing region into a plurality of blocks;
Obtaining a used data amount that is a data amount used for processing the block;
Calculating a processing load amount in each block based on the use data amount;
Assigning an arithmetic unit to each block based on the load amount;
Performing the process by an assigned computing unit;
A data processing method comprising: