CN112991482B

CN112991482B - GPU-based rapid reconstruction imaging method and device and readable storage medium

Info

Publication number: CN112991482B
Application number: CN202110389801.1A
Authority: CN
Inventors: 褚政; 叶宏伟
Original assignee: Minfound Medical Systems Co Ltd
Current assignee: Minfound Medical Systems Co Ltd
Priority date: 2021-04-12
Filing date: 2021-04-12
Publication date: 2023-03-24
Anticipated expiration: 2041-04-12
Also published as: CN112991482A

Abstract

The invention provides a GPU (graphics processing unit)-based rapid reconstruction imaging method, a device and a readable storage medium, relating to the field of medical image processing. The method comprises: obtaining data to be reconstructed and performing dimension conversion to obtain first processing data; performing forward projection on the first processing data with a first task set to obtain a first processing result, where each task performs multi-thread synchronous processing and the first processing result is stored in a first shared memory address; performing reduction processing on the first processing result to obtain a forward projection result; acquiring a projection ratio; performing back projection on the first processing data with a second task set to obtain a second processing result, where each task performs multi-thread synchronous processing and the result is stored in a second shared memory address; performing reduction processing on the second processing result to obtain a back projection result; and generating an image based on the forward projection result, the projection ratio and the back projection result and performing iterative reconstruction to obtain a target image. This addresses the problem that conventional image reconstruction is time-consuming.

Description

GPU-based rapid reconstruction imaging method and device and readable storage medium

Technical Field

The invention relates to the field of medical image processing, in particular to a rapid reconstruction imaging method and device based on a GPU (graphics processing unit) and a readable storage medium.

Background

PET, in full positron emission tomography (also called positron emission computed tomography), is an important clinical imaging technology in nuclear medicine. It is a functional imaging technology that exploits metabolism and receptors. Its detection principle is as follows: a nuclide that emits positrons is injected into the patient as a specific tracer; the positrons emitted by the tracer annihilate with electrons in the body to produce pairs of gamma photons; each pair is recorded and stored by a detector; and image reconstruction then yields the distribution of the tracer in the patient's body, which enables early detection and localization of tumors.

PET reconstruction converts the acquired data into diagnostic images that humans can interpret. Commonly used reconstruction methods include the Radon transform, filtered back projection and iterative reconstruction. Iterative reconstruction is the focus of PET equipment development because it can provide images with high resolution and contrast and low noise. In the existing reconstruction process, however, tasks are processed one at a time, the amount of computation is large and reads and writes are slow, so image reconstruction takes a long time.

Disclosure of Invention

In order to overcome the technical defects, the present invention provides a GPU-based fast reconstruction imaging method, a GPU-based fast reconstruction imaging device, and a readable storage medium, which are used to solve the problem of the prior art that image reconstruction consumes a lot of time.

The invention discloses a rapid reconstruction imaging method based on a GPU, which comprises the following steps:

acquiring data to be reconstructed, and performing dimension conversion on the data to be reconstructed to acquire first processing data;

performing forward projection calculation on the first processing data by adopting a first task set to obtain a first processing result, and performing multi-thread synchronous processing by adopting a first thread set under each task in the first task set;

storing the first processing result in a first shared memory address; wherein each task of the first task set and each thread of the first thread set uniquely corresponds to a memory address of the first shared memory address;

performing reduction processing based on the first processing result under the first shared memory address to obtain a forward projection result;

acquiring a projection ratio according to the forward projection result and the data to be reconstructed;

performing back projection calculation on the first processing data by adopting a second task set to obtain a second processing result, and performing multi-thread synchronous processing on each task in the second task set by adopting a second thread set;

storing the second processing result in a second shared memory address; wherein each task of the second task set and each thread of the second thread set uniquely corresponds to a memory address of the second shared memory address;

carrying out reduction processing based on the second processing result under the second shared memory address to obtain a back projection result;

and generating an image based on the forward projection result, the projection ratio and the backward projection result, and performing iterative reconstruction to obtain a target image.
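The sequence of steps above follows the usual structure of an iterative expectation-maximization update. A minimal single-threaded sketch is given below for orientation only. It assumes a small dense system matrix `H` (rows are LORs, columns are pixels) and uses the hypothetical names `forward_project`, `back_project` and `em_update`, none of which come from the source; the task sets, thread sets and shared memory addresses of the method are not modeled here.

```python
def forward_project(H, f):
    """Forward projection: expected value on each LOR j, sum_n H[j][n] * f[n]."""
    return [sum(h * x for h, x in zip(row, f)) for row in H]


def back_project(H, values):
    """Back projection: for each pixel i, sum_j H[j][i] * values[j]."""
    n_pixels = len(H[0])
    return [sum(H[j][i] * values[j] for j in range(len(H))) for i in range(n_pixels)]


def em_update(H, measured, f):
    """One iteration: forward project, form the ratio, back project, rescale."""
    expected = forward_project(H, f)
    ratio = [m / e if e > 0 else 0.0 for m, e in zip(measured, expected)]
    sensitivity = back_project(H, [1.0] * len(H))
    correction = back_project(H, ratio)
    return [x * c / s if s > 0 else 0.0 for x, c, s in zip(f, correction, sensitivity)]


# Toy problem with 3 LORs and 2 pixels (illustrative numbers only).
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
truth = [2.0, 4.0]
measured = forward_project(H, truth)
f = [1.0, 1.0]
for _ in range(50):
    f = em_update(H, measured, f)
```

In this noise-free toy case the estimate `f` approaches `truth`; with real data the number of iterations and subsets would be chosen by the system designer.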

Preferably, the performing forward projection calculation on the first processing data by using a first task set to obtain a first processing result, and performing multi-thread synchronous processing by using a first thread set for each task in the first task set includes the following steps:

obtaining a set of pixel layers based on the first processed data;

performing two-dimensional projection calculation on the first processing data to obtain a geometric relation between a specific direction and each pixel layer in the pixel layer set as first basic data;

dividing the pixel layer set into a plurality of pixel sub-sets according to the number of tasks in the first task set;

under any task in a first task set, performing multi-thread synchronous calculation on a pixel subset by adopting a first thread set based on the first basic data and a preset first function, and accumulating calculation results corresponding to all threads to obtain a first intermediate result;

and sequentially executing each task in the first task set, and combining the calculation results corresponding to each task to obtain a first processing result.

Preferably, the performing multi-thread synchronous computation on a pixel subset by using a first thread set based on the first basic data and a preset first function, and accumulating computation results corresponding to each thread to obtain a first intermediate result includes the following steps:

acquiring dimension data with symmetry axis attributes according to the first processing data;

acquiring the weight of the first processing data under the dimension corresponding to the dimension data with the symmetry axis attribute, and acquiring a calculation result corresponding to each thread under each thread according to the product of the weight and the first basic data;

and accumulating the corresponding calculation results of all the threads to obtain a first intermediate result.

Preferably, the reducing the first processing result under the first shared memory address to obtain the forward projection result includes the following steps:

and performing dimensionality summation based on the first processing data by adopting a preset reduction algorithm under the first shared memory address to obtain a forward projection result.
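The source does not say which reduction algorithm is preset. One common choice on GPUs is a pairwise (tree) reduction, sketched here on the CPU as an assumption; the function name `tree_reduce` is hypothetical.

```python
def tree_reduce(values):
    """Sum a list by repeatedly folding the upper half onto the lower half."""
    buf = list(values)
    n = len(buf)
    while n > 1:
        half = (n + 1) // 2
        # On a GPU these additions would run in parallel, one per thread.
        for i in range(n - half):
            buf[i] += buf[i + half]
        n = half
    return buf[0] if buf else 0.0


partial_results = [1.0, 2.0, 3.0, 4.0, 5.0]
total = tree_reduce(partial_results)
```

Each pass halves the number of active elements, so the number of passes grows with the logarithm of the input length rather than linearly.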

Preferably, the performing a back projection calculation on the first processing data by using a second task set to obtain a second processing result, and performing a multi-thread synchronous processing by using a second thread set for each task in the second task set, includes the following steps:

obtaining a set of directional features based on the first processed data;

performing two-dimensional projection calculation on the first processing data to obtain a geometric relation between a specific pixel layer and each direction feature in the direction feature set as second basic data;

dividing the direction characteristic set into a plurality of direction sub-sets according to the number of tasks in a second task set;

under any task in a second task set, based on the second basic data and a preset second function, a second thread set is adopted to perform multi-thread synchronous calculation on a direction subset, and calculation results corresponding to all threads are accumulated to obtain a second intermediate result;

and sequentially executing each task in the second task set, and combining the calculation results corresponding to each task to obtain a second processing result.

Preferably, the processing the second processing result under the second shared memory address to obtain a back projection result includes the following steps:

and performing dimension addition based on the second processing result by adopting a preset reduction algorithm under the second shared memory address to obtain a back projection result.

Preferably, the performing the dimension conversion on the data to be reconstructed to obtain the first processed data includes the following steps:

acquiring a data set under multiple dimensions according to the data to be reconstructed;

carrying out data symmetry analysis on the data to be reconstructed to obtain a dimension with a symmetry axis attribute;

and storing and preposing the data set under the dimension with the symmetry axis attribute to obtain first processing data.

Preferably, before the forward projecting the first processed data synchronously by using the first task set and the second task set, the method further includes the following steps:

acquiring a preset memory address;

generating memory addresses associated with each task in the first task set and the second task set based on the preset memory address mapping, and gathering the memory addresses into a first shared memory address;

and generating memory addresses associated with each task in the third task set and the fourth task set based on the preset memory address mapping, and grouping each memory address into a second shared memory address.

The invention also discloses a computer device, which is characterized by comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the reconstruction imaging method when executing the computer program.

The invention also discloses a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned reconstruction imaging method.

After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:

1. In this scheme, dimension conversion is performed on the data to be reconstructed, followed by forward projection and back projection calculation, both of which use multi-thread parallel processing under multiple tasks, and the target image is obtained by iterative reconstruction. The symmetry of the data to be reconstructed and of the calculation process is exploited, which effectively improves the processing speed of forward and back projection during image reconstruction and addresses the problem that image reconstruction in the prior art is time-consuming;

2. In this scheme, during both forward projection and back projection, multiple tasks divide the first processing data evenly, and under each task multiple threads process one data segment in parallel, with each thread handling the same position of every piece of data in the segment. The results of the threads are summed to give the result for the data segment, the segment results are combined to give the result for each piece of data in the first processing data, and reduction processing then yields the forward projection result and the back projection result. This addresses the slow calculation caused in the prior art by the mismatch between LOR access and task assignment. It also differs from the existing multi-thread processing mode, in which each thread processes one piece of data and the threads run in parallel, and makes use of both the machine and the symmetry of the data;

3. Moving part of the dimension data to the front speeds up sequential access to the data in the symmetric dimension and therefore speeds up data processing. In addition, one memory address is mapped to multiple memory addresses under multiple tasks, and the result computed by each task is written to its own unique memory address, which reduces the slowdown caused by concurrent accumulation by multiple tasks into the same memory location.

Drawings

FIG. 1 is a flowchart of a first embodiment of a GPU-based fast reconstruction imaging method according to the present invention;

FIG. 2 is a flowchart of performing dimension conversion on the data to be reconstructed to obtain first processed data in the first embodiment of the GPU-based fast reconstruction imaging method according to the present invention;

FIG. 3 is a flowchart of the steps performed before the first processed data is forward projected synchronously by using the first task set and the second task set in the first embodiment of the GPU-based fast reconstruction imaging method according to the present invention;

FIG. 4 is a flowchart of synchronously forward projecting the first processing data by using the first task set and the second task set to obtain a forward projection result in the first embodiment of the GPU-based fast reconstruction imaging method according to the present invention;

FIG. 5 is a flowchart of obtaining a calculation result by performing a three-dimensional projection calculation on the pixel layer set based on the first basic data in the first embodiment of the GPU-based fast reconstruction imaging method according to the present invention;

FIG. 6 is a flowchart of synchronously back-projecting the first processing data by using a third task set and a fourth task set to obtain a back projection result in the first embodiment of the GPU-based fast reconstruction imaging method according to the present invention;

FIG. 7 is a block diagram of a second embodiment, the GPU-based fast reconstruction imaging apparatus of the present invention;

FIG. 8 is a schematic diagram of a hardware structure of a computer device according to a third embodiment of the present invention.

Reference numerals are as follows:

8: reconstruction imaging apparatus; 81: pre-processing module; 82: first forward projection processing module; 83: second forward projection processing module; 84: ratio calculation module; 85: first back projection processing module; 86: second back projection processing module; 87: reconstruction module; 9: computer device; 91: memory; 92: processor.

Detailed Description

The advantages of the invention are further illustrated by the following detailed description of the preferred embodiments in conjunction with the drawings.

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The terminology used in the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if," as used herein, may be interpreted as "when" or "upon" or "in response to a determination," depending on the context.

In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are merely for convenience of description and simplicity of description, but do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.

In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "coupled," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for facilitating the explanation of the present invention, and have no specific meaning in themselves. Thus, "module" and "component" may be used in a mixture.

The first embodiment is as follows: this embodiment provides a GPU-based fast reconstruction imaging method, used on the server side of a PET system to convert data acquired by a PET device into a diagnostic image that humans can interpret. To obtain the target image, the method is built on the following reconstruction formula (1), which takes the form of a maximum likelihood estimate:
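Formula (1) is an image in the original publication and does not survive in this text. A standard ordered-subset expectation-maximization update that matches the symbol definitions given in the next paragraph is shown here as an assumption, not as the patent's exact formula. The symbol $y_j$ for the measured sinogram value on LOR $j$ is hypothetical and does not appear in the source, and the summation range used for $s_i$ is likewise assumed.

```latex
f_i^{k,m+1} \;=\; \frac{f_i^{k,m}}{s_i}
  \sum_{j \in l_m} H_{j,i}\,
  \frac{y_j}{\sum_{n=1}^{N} H_{j,n}\, f_n^{k,m}},
\qquad
s_i \;=\; \sum_{j \in l_m} H_{j,i}
```

Read this way, the inner sum over $n$ is a forward projection, the fraction is a projection ratio, and the outer sum over $j$ is a back projection.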

where $f$ is the image, $i$ is the pixel index of the image, $k$ is the iteration number, $m$ is the index of the subset, and $N$ is the total number of pixels of the image; $j$ is the $j$-th element of the sinogram and represents a spatial projection direction, i.e. a line of response (LOR); $l_m$ denotes the set of all $j$ in subset $m$; $H_{j,i}$ is the system matrix, which reflects the geometric correspondence between a specific direction $j$ and a pixel $i$; and $s$ is the sensitivity matrix. Within formula (1), the forward projection term is calculated by the forward projection process in the following steps, the back projection term is calculated by the back projection process, and the remaining term is the projection ratio. (The expressions for these terms and for $s$ are formula images in the original publication and are not reproduced here.) Specifically, referring to FIG. 1, the present embodiment includes the following steps:

S100: acquiring data to be reconstructed, and performing dimension conversion on the data to be reconstructed to acquire first processing data;

In this embodiment, the data to be reconstructed is data acquired by PET and is given in the form of a sinogram. The sinogram has 4 dimensions, indexed by R, V, Zi and Zd, where R is the set of radial coordinates, V is the set of rotation-direction coordinates, Zi is the set of axial center coordinates, and Zd is the set of differences between the axial start point and end point. Dimension conversion is applied to the data to be reconstructed mainly to enable fast data access. Specifically, performing dimension conversion on the data to be reconstructed to obtain the first processing data includes the following steps:

S110: acquiring a data set under multiple dimensions according to the data to be reconstructed;

As mentioned above, the data to be reconstructed includes at least four dimensions, and each dimension corresponds to a data set;

S120: carrying out data symmetry analysis on the data to be reconstructed to obtain a dimension with a symmetry axis attribute;

In the above step, the main purpose of the data symmetry analysis is to identify the symmetric data sets, so that the symmetry of the data to be reconstructed and of the calculation process can be used to speed up sequential data access and the subsequent GPU computation can complete quickly.

S130: and storing and preposing the data set under the dimension with the symmetry axis attribute to obtain first processing data.

By way of example and not limitation, let one datum be denoted S0(R, V, Zi, Zd). After dimension conversion it is stored as (Zi, Zd, R, V), i.e. with the two axial dimensions Zi and Zd moved to the front. Likewise, for the image f and the sensitivity matrix, the dimensions (X, Y, Z) are converted to (Z, X, Y), where X, Y and Z are the three-dimensional spatial coordinate sets.

In the forward and back projection processes most calculations are symmetric along the Zi direction, so Zi and the related dimension Zd are moved to the front. When a symmetric variable is accessed in the inner loop of the first task set processing, the addressing range is shortened, which improves the data processing speed.
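The effect of moving the axial dimensions to the front can be sketched with flat-buffer index arithmetic. This sketch assumes that the first listed dimension varies fastest in memory, which the source does not state explicitly, and the helper names `strides_first_fastest`, `flat_index` and `permute_to_axial_first` are hypothetical.

```python
def strides_first_fastest(shape):
    """Element strides for a layout in which the first listed dimension varies fastest."""
    strides, step = [], 1
    for extent in shape:
        strides.append(step)
        step *= extent
    return strides


def flat_index(index, shape):
    """Position of a multi-dimensional index in the flat buffer."""
    return sum(i * s for i, s in zip(index, strides_first_fastest(shape)))


def permute_to_axial_first(data, shape):
    """Reorder a flat (R, V, Zi, Zd) buffer into a flat (Zi, Zd, R, V) buffer."""
    n_r, n_v, n_zi, n_zd = shape
    new_shape = (n_zi, n_zd, n_r, n_v)
    out = [0] * len(data)
    for r in range(n_r):
        for v in range(n_v):
            for zi in range(n_zi):
                for zd in range(n_zd):
                    src = flat_index((r, v, zi, zd), shape)
                    dst = flat_index((zi, zd, r, v), new_shape)
                    out[dst] = data[src]
    return out, new_shape


shape = (2, 3, 4, 2)                      # (R, V, Zi, Zd), toy sizes
data = list(range(2 * 3 * 4 * 2))
converted, new_shape = permute_to_axial_first(data, shape)
zi_stride_before = strides_first_fastest(shape)[2]
zi_stride_after = strides_first_fastest(new_shape)[0]
```

Under the stated layout assumption, consecutive Zi values are `zi_stride_before` elements apart before conversion and adjacent afterwards, which is one way to read the shortened addressing range described above.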

S200: performing forward projection calculation on the first processing data by adopting a first task set to obtain a first processing result, and performing multi-thread synchronous processing by adopting a first thread set under each task in the first task set; storing the first processing result in a first shared memory address; wherein each task in the first task set and the first thread set uniquely corresponds to a memory address in the first shared memory address;

In this step, the first task set distributes and processes a plurality of tasks and forms the outer loop. The first thread set contains multiple threads for synchronous processing and forms the inner loop, implemented by GPU multithreading (each task has K₂ threads). Each thread computes the data at the same position of m LORs and stores its result at its own unique address, so the first intermediate result for each LOR is held in K₂ memory addresses; summing the results obtained by all threads gives the first intermediate result for each LOR. A shared memory of size m × K₂ serves as the memory for each task. The inner and outer loops process the first processing data in sequence, the result of each task is stored at its own unique memory address, and the set of memory addresses for all tasks is the first shared memory address (denoted ShareM), from which the final result is later obtained with a reduction algorithm. Processing a plurality of LORs with multiple tasks, and with multiple synchronized threads under each task, addresses the insufficient parallelism of the prior art, in which a single thread processes one LOR or pixel, and improves the data processing speed. Because each task also has its own unique memory address, contention from multiple threads or tasks accumulating into the same memory address is reduced, which further speeds up data processing.
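The storage scheme described above can be illustrated with a small CPU-side simulation. This is a sketch only: the GPU threads are simulated by an ordinary loop, and the names `accumulate_unique_slots` and `reduce_over_threads`, as well as the striding rule that assigns positions to threads, are assumptions rather than details from the source.

```python
def accumulate_unique_slots(lor_data, k2):
    """Fill a K2 x m buffer; simulated thread t writes only to row t."""
    m = len(lor_data)
    share_m = [[0.0] * m for _ in range(k2)]
    for t in range(k2):                       # simulated threads of one task
        for lor in range(m):                  # every thread visits all m LORs
            # Thread t handles positions t, t + k2, t + 2 * k2, ... of each LOR.
            for pos in range(t, len(lor_data[lor]), k2):
                share_m[t][lor] += lor_data[lor][pos]
    return share_m


def reduce_over_threads(share_m):
    """Reduction step: add up the K2 partial results of each LOR."""
    return [sum(row[lor] for row in share_m) for lor in range(len(share_m[0]))]


lor_data = [[1.0, 2.0, 3.0, 4.0, 5.0],        # m = 2 LORs, 5 positions each
            [10.0, 20.0, 30.0, 40.0, 50.0]]
share_m = accumulate_unique_slots(lor_data, k2=2)
per_lor = reduce_over_threads(share_m)
```

Because no two simulated threads ever write to the same slot, no atomic additions are needed during accumulation; the only combining step is the final reduction.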

As described above, steps S200-S300 are used to obtain the forward projection term of formula (1). In step S200, performing forward projection calculation on the first processing data with the first task set to obtain the first processing result, with each task in the first task set using the first thread set for multi-thread synchronous processing, includes the following steps (see FIG. 4):

S210: obtaining a set of pixel layers based on the first processed data;

Given $j \in (R, V, Zi, Zd)$ and $n \in (Z, X, Y)$, $H_{j,n}$ is obtained according to the following formula:

$$H_{j,n} = H0(R, V, X, Y) \cdot H1(R, V, Zi, Zd, Z) \tag{2}$$

where $H0(R, V, X, Y)$ is the system matrix of $H_{j,n}$ in the two-dimensional plane, and $H1(R, V, Zi, Zd, Z)$ is the weight, in three-dimensional space, of the Z-th pixel layer for the LOR whose four-dimensional vector is (Zi, Zd, R, V).

For each pair [R, V], the index pairs (X, Y) at which H0 is non-zero are collected and denoted (X′, Y′); this is the pixel layer set of this step.
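Collecting the non-zero positions can be sketched as follows, assuming the two-dimensional system matrix for one pair [R, V] is available as a small dense grid `h0_rv[x][y]`. The function name `nonzero_pixels` and the sample values are hypothetical.

```python
def nonzero_pixels(h0_rv):
    """Return the (X', Y') index pairs at which H0 for one (R, V) pair is non-zero."""
    return [(x, y)
            for x, row in enumerate(h0_rv)
            for y, value in enumerate(row)
            if value != 0]


# Toy 3 x 3 grid for a single (R, V) direction (illustrative values only).
h0_rv = [[0.0, 0.2, 0.0],
         [0.0, 0.7, 0.1],
         [0.0, 0.0, 0.9]]
pixels = nonzero_pixels(h0_rv)
```

Only the listed positions need to be visited later, so the zero entries of the grid cost nothing in the projection loops.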

S220: performing two-dimensional projection calculation on the first processing data to obtain a geometric relation between a specific direction and each pixel layer in the pixel layer set as first basic data;

Specifically, based on the (X′, Y′) obtained in the above step, the two-dimensional value $H0_{R,V}(X', Y')$ at each index position is obtained as the first basic data and stored in GPU video memory. $H0_{R,V}(X', Y')$ is the geometric correspondence between a specific direction (R, V) and a pixel (X′, Y′). The correspondence can be computed by methods including, but not limited to, intercept and geometric projection, each obtained by its respective algorithm (available in the art).

S230: dividing the pixel layer set into a plurality of pixel sub-sets according to the number of tasks in the first task set;

In the above step, the first task set executes the outer loop. As described above, (X′, Y′) contains L pairs of coordinates; the outer loop divides them into K₁ parts and distributes them over K₁ tasks (i.e. each pixel subset is assigned to one task). The results in ShareM are updated one coordinate pair at a time until all L pairs of (X′, Y′) have been computed.
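The division into K₁ parts can be sketched as follows. The source only says that the L coordinate pairs are divided into K₁ parts; the contiguous, near-equal split used here and the name `split_into_tasks` are assumptions.

```python
def split_into_tasks(coords, k1):
    """Split the L coordinate pairs into k1 contiguous, near-equal parts, one per task."""
    base, extra = divmod(len(coords), k1)
    parts, start = [], 0
    for task in range(k1):
        size = base + (1 if task < extra else 0)   # the first `extra` tasks get one more
        parts.append(coords[start:start + size])
        start += size
    return parts


coords = [(i, i + 1) for i in range(7)]            # L = 7 toy (X', Y') pairs
parts = split_into_tasks(coords, k1=3)
sizes = [len(p) for p in parts]
```

Every coordinate pair lands in exactly one part, so processing the parts one task at a time covers the whole set without overlap.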

S240: under any task in a first task set, performing multi-thread synchronous calculation on a pixel subset by adopting a first thread set based on the first basic data and a preset first function, and accumulating calculation results corresponding to all threads to obtain a first intermediate result;

As described above, let a pixel subset contain m LORs. The first thread set contains multiple threads, namely the K₂ warp threads of the GPU kernel, which compute the m LORs synchronously, each thread processing the same position of the m LORs. After launch, the outer loop is scheduled by the GPU kernel function and the inner loop (GPU multithreading) executes synchronously inside it. Specifically, once the inner loop has finished computing the m LORs in each thread (itself a loop), the first intermediate result is obtained; once the outer loop has finished, the first processing result for the pixel subset is obtained. The total amount of computation is the number of outer loop iterations times the number of inner loop iterations (K₁ × K₂).

As a further illustration, suppose the first processing data contains P pixel layers, the first task set contains K₁ tasks, and each task has K₂ threads. Each task then processes P/K₁ pixel-layer segments, each of which contains m LORs. The K₂ threads process the m LORs synchronously, each thread stores its result at its own memory address, and the results are read from the K₂ memory addresses of the threads and added to give the first intermediate result for the m LORs. Owing to the symmetry of the data, each thread processes the data at one unique position in each pixel layer.

Specifically, the performing multi-thread synchronous computation on a pixel subset by using a first thread set based on the first basic data and a preset first function, and accumulating computation results corresponding to each thread to obtain a first intermediate result includes the following steps, referring to fig. 5:

S241: acquiring dimension data with symmetry axis attributes according to the first processing data;

In the above step, the three-dimensional projection calculation follows formula (2). As described above, most calculations are symmetric about Zi, so $H1(R, V, Zi, Zd, Z)$ in formula (2) is converted into $H0_{R,V}(Z'(Zi, Zd), X', Y')$, where $Z'(Zi, Zd)$ is the set of non-zero Z selected by Zi and Zd.

S242: and acquiring the weight of the first processing data under the dimension corresponding to the dimension data with the symmetry axis attribute, and acquiring a calculation result corresponding to each thread under each thread according to the product of the weight and the first basic data.

Specifically, with $H1(R, V, Zi, Zd, Z)$ in formula (2) converted to $H0_{R,V}(Z'(Zi, Zd), X', Y')$, the calculation result is obtained according to the following formula (3):

$$H_{j,n} = H0_{R,V}(Z'(Zi, Zd), X', Y') \cdot H0_{R,V}(X', Y') \tag{3}$$

Because of the symmetry about Zi, all values of $H0_{R,V}(Z'(Zi, Zd), X', Y')$ are the same, that is:

$$H0_{R,V}(Z'(1, Zd), X', Y') = H0_{R,V}(Z'(2, Zd), X', Y') = \dots = H0_{R,V}(Z'(Zi, Zd), X', Y')$$

There is therefore no need to compute each value of $H0_{R,V}(Z'(Zi, Zd), X', Y')$ separately, which effectively increases the computation speed of each thread. Suppose the (X′, Y′) set contains L pairs of coordinates. The inner loop uses K₂ threads, each of which computes the intermediate result $H0_{R,V}(Z'(Zd), X', Y') \cdot f_n(Z'(Zi, Zd), X', Y')$, where $f_n$ is the known input image. Each thread Ki computes the intermediate results of the m LORs and accumulates them into ShareM[Ki, m] (the first shared memory).
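The saving from the symmetry about Zi can be sketched as follows: the axial weight is evaluated once per Zd and reused for every Zi. In this sketch the weight function `axial_weight`, the rule `z = zi + offset` that maps Zi to the contributing layers, the function name `forward_project_axial` and all data are hypothetical placeholders, not the source's actual geometry.

```python
def forward_project_axial(f, h0_value, x, y, n_zi, zd, axial_weight):
    """Partial forward projection of one (X', Y') pixel for every Zi at a fixed Zd."""
    weights = axial_weight(zd)               # evaluated once, reused for all Zi
    n_z = len(f)
    result = [0.0] * n_zi
    for zi in range(n_zi):
        for offset, w in weights.items():
            z = zi + offset                  # layer touched by this LOR (assumed rule)
            if 0 <= z < n_z:
                result[zi] += w * h0_value * f[z][x][y]
    return result


calls = []

def axial_weight(zd):
    """Hypothetical weight table: layer offset -> weight, independent of Zi."""
    calls.append(zd)
    return {0: 0.5, zd: 0.5}


f = [[[1.0]], [[2.0]], [[4.0]]]              # image stored as f[z][x][y], 3 layers of 1 x 1
column = forward_project_axial(f, 2.0, 0, 0, n_zi=3, zd=1, axial_weight=axial_weight)
```

The list `calls` records how often the weight table was built; in this sketch it is built once even though three Zi values are processed.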

S243: accumulating the calculation results corresponding to all threads to obtain a first intermediate result.

In the above step, since each thread computes part of the data of each pixel layer, accumulating the results of all threads gives the result for each pixel layer under each task, i.e. the first intermediate result.

S250: and sequentially executing each task in the first task set, and combining the calculation results corresponding to each task to obtain a first processing result.

In the above step, that is, as described above, each task processes the P/K1 segment pixel layers, the P/K1 segment pixel layers are calculated in sequence, the first intermediate result is stored in ShareM (i.e., in the first shared memory), and then updated in ShareM after each task of the first task set is completed, that is, the first processing result is obtained, which is equivalent to the intermediate calculation result in the forward projection process.

In this process, forward projection calculation is performed on each pixel layer by multi-task step-by-step calculation over the first task set and the first thread set, with multi-thread synchronous calculation under each task. Each task processes part of the data (sub-pixel layers) of each pixel layer step by step, and under each task the sub-pixel layers are processed synchronously by multiple threads (each thread synchronously processes the same position of a plurality of sub-pixel layers). The calculation results of the threads are then added, the tasks are executed in sequence, and finally the results are added to obtain the complete processing result of each pixel layer as the first processing result. This solves the problem in the prior art that processing one LOR or pixel with a single thread in the forward projection process gives insufficient parallelism. The first task set and the second thread set adopted in this embodiment differ from the existing approach of processing each pixel layer with multiple threads in parallel: the accuracy of the calculation result is ensured while the calculation speed of the forward projection process is improved. In addition, the calculation result generated by each task is updated in ShareM (i.e. the first shared memory) by multi-thread parallel processing and/or multi-thread synchronous calculation, which further reduces conflicts.
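The task and thread structure described above can be sketched on the CPU in Python as follows. This is a simplified illustration, not the patented kernel: the function name `forward_partial_sums`, the input table `contrib` and the strided assignment of work to threads are all assumptions, and the threads are simulated one after another although on a GPU they would run in lockstep.

```python
# Sketch: outer loop over tasks (executed in sequence), inner loop over K threads.
# Each simulated thread owns one row of ShareM, so no two threads write the same cell.
def forward_partial_sums(contrib, num_tasks, K):
    """contrib[l][lor] is the contribution of coordinate group l to one LOR.

    Returns ShareM as a K x m list: ShareM[k][lor] is the partial sum
    accumulated by thread k over all tasks.
    """
    L = len(contrib)
    m = len(contrib[0])
    share_m = [[0.0] * m for _ in range(K)]
    per_task = -(-L // num_tasks)  # ceiling division
    for task in range(num_tasks):  # tasks execute sequentially
        start = task * per_task
        stop = min(start + per_task, L)
        for k in range(K):  # threads of this task
            for l in range(start + k, stop, K):  # thread k's strided share
                for lor in range(m):
                    share_m[k][lor] += contrib[l][lor]
    return share_m


contrib = [[float(l + lor) for lor in range(3)] for l in range(10)]
share_m = forward_partial_sums(contrib, num_tasks=2, K=4)
totals = [sum(row[lor] for row in share_m) for lor in range(3)]
print(totals)
```

Summing the rows of `share_m` recovers the full sum over all coordinate groups for each LOR, which corresponds to the later reduction step.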

In order to ensure the consistency of the memory addresses, before the forward projection calculation is performed on the first processing data by using the first task set, referring to fig. 3, the method includes the following steps:

S200-1: acquiring a preset memory address;

Specifically, the preset memory address is used to store the various types of data generated during the calculation.

S200-2: generating memory addresses associated with the tasks in the first task set and the first thread set based on the preset memory address mapping, and gathering the memory addresses into a first shared memory address;

The first shared memory address may be implemented by a register. Memory addresses associated with each task in the first task set and each thread in the first thread set are generated, i.e. under each task each thread corresponds to a unique memory address. That is, when K₂ threads are used to process m LORs synchronously, each thread processes the same position of the m LORs synchronously and stores the result in its own memory address; the sum of the calculation results under the K₂ memory addresses is the first processing result of the corresponding LOR, and the GPU allocates a shared memory of size m × K₂. In the above steps, the preset memory address is mapped one-to-many to the memory addresses corresponding to the tasks. This reduces concurrent accumulation by multiple tasks at the same memory location, which would otherwise cause part of the accumulation to fail, avoids write conflicts, and increases the calculation speed.
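The one-to-many mapping from a preset address to per-task, per-thread addresses can be sketched as simple offset arithmetic. The following Python sketch is illustrative only: the function name `map_addresses`, the base address `0x1000`, the 4-byte item size and the row-major ordering are assumptions that do not come from the description.

```python
# Sketch: derive a unique address for every (task, thread, LOR) triple
# from one preset base address. Layout and item size are assumed.
def map_addresses(base, num_tasks, num_threads, m, item_size=4):
    """Return a dict mapping (task, thread, lor) to a distinct byte address."""
    table = {}
    for t in range(num_tasks):
        for k in range(num_threads):
            for lor in range(m):
                offset = ((t * num_threads + k) * m + lor) * item_size
                table[(t, k, lor)] = base + offset
    return table


addresses = map_addresses(base=0x1000, num_tasks=2, num_threads=4, m=3)
print(len(addresses), len(set(addresses.values())))
```

Because every triple receives a distinct offset, two threads never accumulate into the same location, which is the property the mapping step relies on.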

S200-3: generating memory addresses associated with each task in the second task set and the second thread set based on the preset memory address mapping, and gathering the memory addresses into a second shared memory address.

In the above steps, similar to the first task set and the first thread set, the second task set is used for executing the outer loop and the second thread set is used for executing the inner loop. The difference is that the second task set and the second thread set are used for the back projection process, whereas the first task set and the first thread set are used for the forward projection process. Specifically, under each task of the second task set, Kp threads are used to process mp LORs synchronously, and the GPU allocates a shared memory of size Kp × mp as the memory corresponding to each task. The preset memory address is thus mapped to a plurality of memory addresses, which reduces memory write conflicts in the forward and back projection processes; meanwhile, since the unique memory location of each task is obtained by mapping the preset memory location, consistency of data storage is ensured while conflicts are reduced.

S300: carrying out reduction processing on the first processing result under the first shared memory address to obtain a forward projection result; specifically, S260: under the first shared memory address, summing the first processing result along a dimension by using a preset reduction algorithm to obtain the forward projection result.

In the above step, the preset reduction algorithm is an existing GPU reduction (reduce) algorithm. It quickly computes the sum over the first dimension of the register ShareM[K, m] in the forward projection process, so that m projection values are obtained each time the first loop finishes. All forward projection results are obtained after the first loop has been called multiple times.
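A pairwise (tree) reduction over the first dimension of ShareM[K, m] can be sketched in Python as shown below. This is a generic reduction pattern given for illustration; the description only states that an existing GPU reduce algorithm is used, so the specific halving scheme and the name `reduce_first_dim` are assumptions. The sketch assumes K is at least 1.

```python
# Sketch: tree reduction along the first dimension of a K x m array.
# On a GPU each halving step would be performed by many threads at once.
def reduce_first_dim(share_m):
    """Return the column sums of share_m using pairwise halving."""
    rows = [list(r) for r in share_m]  # copy so the input is not modified
    n = len(rows)
    while n > 1:
        half = (n + 1) // 2
        for k in range(n - half):  # row k absorbs row k + half
            rows[k] = [a + b for a, b in zip(rows[k], rows[k + half])]
        n = half
    return rows[0]


projection = reduce_first_dim([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(projection)
```

With K rows the loop performs about log2(K) halving steps, which is why a reduction is preferred over a serial sum when the rows are held by parallel threads.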

S400: acquiring a projection ratio according to the forward projection result and the data to be reconstructed;

The projection ratio in the above step is the ratio of the raw sinogram data, which is recorded as the data to be reconstructed, to the forward projection result.
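The ratio step can be sketched as an element-wise division. In the Python sketch below, the function name `projection_ratio` and the guard that returns 0 for bins whose forward projection is not positive are assumptions added to keep the example well defined; the description does not specify how such bins are handled.

```python
# Sketch: element-wise ratio of measured sinogram to forward projection.
# The eps guard for empty bins is an assumption, not part of the description.
def projection_ratio(sinogram, forward, eps=1e-12):
    """Return sinogram[j] / forward[j], or 0.0 where forward[j] <= eps."""
    if len(sinogram) != len(forward):
        raise ValueError("sinogram and forward projection must have equal length")
    return [s / f if f > eps else 0.0 for s, f in zip(sinogram, forward)]


ratio = projection_ratio([4.0, 0.0, 3.0], [2.0, 5.0, 0.0])
print(ratio)
```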

S500: performing back projection calculation on the first processing data by adopting a second task set to obtain a second processing result, and performing multi-thread synchronous processing on each task in the second task set by adopting a second thread set; storing the second processing result in a second shared memory address; wherein each task of the second task set and each thread of the second thread set uniquely corresponds to a memory address of the second shared memory addresses;

It should be noted that, similar to the forward projection process, the back projection process uses an outer loop (the second task set) and an inner loop (the second thread set) for parallel processing, and the calculation results are then added or updated into ShareM (the second shared memory).

[Formulas not reproduced: the two terms of formula (1) that constitute the back projection process, one of which is expressed on 4-dimensional data.]

Specifically, in step S500, the back projection calculation is performed on the first processing data by using a second task set to obtain a second processing result, and for each task in the second task set, a second thread set is used to perform the multi-thread synchronous processing, which is shown in fig. 6 and includes the following steps:

S510: obtaining a set of directional features based on the first processed data;

In the above step, specifically, for each group (X, Y), the set (R', V') of all (R, V) sequence numbers that are not 0 is counted; this set is the direction feature set.

S520: performing two-dimensional projection calculation on the first processing data to obtain a geometric relation between a specific pixel layer and each direction feature in the direction feature set as second basic data;

Specifically, from the (X', Y') obtained in the above step, the two-dimensional value $H0_{X,Y}(R', V')$ corresponding to each serial number position of (R', V') is obtained as the second basic data and saved in the video memory of the GPU. $H0_{X,Y}(R', V')$ is in fact $H0_{R,V}(X', Y')$ of step S220 above: for a given XY, it represents the geometric relationship between all non-zero (R', V') and XY and, consistent with step S220, includes but is not limited to the intercept, the geometric projection, etc.

S530: dividing the direction feature set into a plurality of direction sub-sets according to the number of tasks in the second task set;

in the above step, similar to step S230, the Lp set of coordinates of the outer loop is divided into Kp shares, distributed over Kp tasks, and the result in ShareM is updated every time one (R ', V') coordinate is calculated, until all the Lp set (X ', Y') coordinates are calculated.
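Dividing the direction feature set into Kp shares can be sketched as a contiguous split whose parts differ in size by at most one. The Python sketch below is an assumption about how the shares might be formed; the description does not state whether the split is contiguous or strided, and `split_into_subsets` and the sample (R', V') pairs are hypothetical.

```python
# Sketch: split a list of (R', V') pairs into `parts` near-equal contiguous subsets.
def split_into_subsets(items, parts):
    """Return `parts` lists covering items in order; sizes differ by at most one."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(items), parts)
    subsets, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more
        subsets.append(items[start:start + size])
        start += size
    return subsets


directions = [(r, v) for r in range(2) for v in range(5)]  # 10 hypothetical pairs
subsets = split_into_subsets(directions, 4)
print([len(s) for s in subsets])
```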

S540: under any task in the second task set, performing multi-thread synchronous calculation on a direction subset by using the second thread set based on the second basic data and a preset second function, and accumulating the corresponding calculation results of all threads to obtain a second intermediate result;

In the above step, the second task set is the outer loop and comprises a plurality of tasks, and multi-thread synchronous processing is adopted under each task. This is similar to the forward projection process of step S230 (see steps S231-S232 above); the difference is that H1(R, V, Zi, Zd, Z) in formula (2) above is converted into $H0_{X',Y'}(R, V, Z_i(Z'), Z_d(Z'))$, where Zi(Z') and Zd(Z') are the sets of non-zero Zi and Zd selected by Z'. The calculation result can be obtained according to the following formula (4):

$$H_{j,n} = H0_{X',Y'}(R, V, Z_i(Z'), Z_d(Z')) \cdot H0_{X,Y}(R', V') \tag{4}$$

The GPU kernel opens up Kp warp threads to perform back projection calculation on mp pixel points whose coordinates are (X, Y, Z). As described above, these points are symmetric in the Z direction; that is, when X, Y, R and V are the same:

$$H0_{X',Y'}(R, V, Z_i(Z'_0), Z_d(Z'_0)) = H0_{X',Y'}(R, V, Z_i(Z'_1), Z_d(Z'_1)) = \dots = H0_{X',Y'}(R, V, Z_i(Z), Z_d(Z))$$

Assume that the associated coordinate data set of all (R', V') contains Lp sets of coordinates. The inner loop uses Kp threads to calculate the intermediate result $H0_{X',Y'}(R, V, Z_i(Z), Z_d(Z)) \cdot P(Z_i(Z_{ki}), Z_d(Z_{ki}), R, V)$ (note that P = 1 when calculating [formula], and P = [formula] when calculating [formula]), where each thread Ki computes mp pixels and accumulates the results into ShareM[Ki, mp] (the second shared memory). Similar to step S240 above, the second task set and the second thread set are also in a nested parallel relationship.
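The role of the factor P in the back projection accumulation can be sketched in Python as follows. This is a CPU-side illustration under assumptions: `weights` is a hypothetical dense table standing in for the H0 weights, the strided assignment of directions to threads is assumed, and using the projection ratio as the second choice of P is an assumption based on the usual form of iterative reconstruction, since the corresponding formula is not legible in the text.

```python
# Sketch: Kp simulated threads accumulate weight * P into private rows of ShareM.
# P = 1 gives a plain sum of weights; P = ratio (assumed) gives the weighted term.
def back_project(weights, p_values, Kp):
    """weights[j][n] links direction j to pixel n; p_values[j] is the factor P.

    Returns (share_m, result): per-thread partial sums and their sum over threads.
    """
    J = len(weights)
    mp = len(weights[0])
    share_m = [[0.0] * mp for _ in range(Kp)]
    for k in range(Kp):  # simulated threads
        for j in range(k, J, Kp):  # thread k's share of the directions
            for n in range(mp):
                share_m[k][n] += weights[j][n] * p_values[j]
    result = [sum(share_m[k][n] for k in range(Kp)) for n in range(mp)]
    return share_m, result


weights = [[1.0, 0.5], [0.0, 2.0], [3.0, 1.0]]
_, sensitivity = back_project(weights, [1.0, 1.0, 1.0], Kp=2)  # P = 1
_, weighted = back_project(weights, [2.0, 0.5, 1.0], Kp=2)     # P = assumed ratio
print(sensitivity, weighted)
```

The same routine serves both back projection terms, which is consistent with the description's use of a single intermediate expression with a switchable factor P.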

S550: sequentially executing each task in the second task set, and combining the calculation results corresponding to each task to obtain a second processing result.

In the foregoing step, similar to step S250, each task in the second task set processes part of the data of each direction feature set, each task is processed by multiple threads synchronously, and each thread processes the data at a unique position in a data segment of each direction feature set. The data processed by each thread and each task are therefore summed to obtain the calculation result corresponding to each direction feature set. The second intermediate result is stored in ShareM (i.e. the second shared memory), and ShareM is updated after each task in the second task set is completed, which yields the second processing result, used as intermediate data of the back projection process. The GPU shared memory of size Kp × mp, denoted ShareM[Kp, mp], serves as the second shared memory. In this process, multitasking with the second task set, with multi-thread parallel data processing under each task, is used to perform back projection calculation on each pixel layer, which improves the calculation speed of the back projection process; meanwhile, the calculation results generated by each task can be added or updated in ShareM (i.e. the second shared memory), which further reduces the impact of conflicts on the calculation speed.

S600: carrying out reduction processing on the second processing result under the second shared memory address to obtain a back projection result; specifically, step S560 is executed: under the second shared memory address, summing the second processing result along a dimension by using a preset reduction algorithm to obtain the back projection result.

In the above step, the preset reduction algorithm is the existing GPU reduction (reduce) algorithm. It quickly computes the sum over the first dimension of the register ShareM[Kp, mp] in the back projection process to obtain mp values, which are the back projection results, in the same way as the reduction of step S300.

S700: generating an image based on the forward projection result, the projection ratio and the back projection result, and performing iterative reconstruction to obtain a target image.

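The forward projection result is obtained according to steps S200-S300 above, Sj is obtainable from the data to be reconstructed, and the two back projection terms are obtained from the back projection steps above; formula (1) can therefore be evaluated to obtain the updated image, namely the target image.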
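How the forward projection, projection ratio and back projection combine in one iteration can be sketched with a tiny dense example in Python. This sketch assumes that formula (1) has the standard MLEM (maximum likelihood expectation maximization) update form; formula (1) itself is not reproduced here, and the matrix `H`, the data `S` and the function name `mlem` are hypothetical.

```python
# Sketch: standard MLEM update on a tiny dense system (assumed form of formula (1)).
def mlem(H, S, iterations=50):
    """H[j][n]: system matrix; S[j]: measured sinogram. Returns the image estimate."""
    J, N = len(H), len(H[0])
    f = [1.0] * N  # uniform initial image
    sens = [sum(H[j][n] for j in range(J)) for n in range(N)]  # back projection, P = 1
    for _ in range(iterations):
        fwd = [sum(H[j][n] * f[n] for n in range(N)) for j in range(J)]   # forward projection
        ratio = [S[j] / fwd[j] if fwd[j] > 0 else 0.0 for j in range(J)]  # projection ratio
        back = [sum(H[j][n] * ratio[j] for j in range(J)) for n in range(N)]  # back projection
        f = [f[n] * back[n] / sens[n] if sens[n] > 0 else 0.0 for n in range(N)]
    return f


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
truth = [2.0, 3.0]
S = [sum(H[j][n] * truth[n] for n in range(2)) for j in range(3)]  # noise-free data
estimate = mlem(H, S, iterations=200)
print(estimate)
```

For this noise-free, full-rank toy system the estimate approaches the image used to generate `S`; real PET data are noisy and far larger, which is what motivates the GPU parallelization.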

In this embodiment, dimension conversion is performed on the data to be reconstructed (the data under the dimension with the symmetry axis attribute is stored in a front position), which increases the sequential access speed of the data under that dimension. Forward projection calculation and back projection calculation are then performed on the dimension-converted data, and both are processed in a multi-task manner (the outer loop distributes the calculation over multiple tasks, and multiple threads compute in parallel under each task). This solves the prior-art problem of slow calculation caused by inconsistent calling of LORs and tasks and, unlike conventional multi-thread processing (each thread processes one piece of data and the threads run in parallel), further increases the calculation speed. Meanwhile, the calculation result of each task is written to its own unique memory address, which reduces concurrency problems when multiple tasks accumulate at the same memory location. Finally, the target image is obtained by iterative reconstruction according to formula (1). By making reasonable use of the symmetry of the data to be reconstructed in the calculation process, the processing speed of the forward and back projection calculations in image reconstruction is effectively increased, and the reconstruction of the target image is thereby accelerated.

Example two: referring to fig. 7, the present embodiment provides a GPU-based fast reconstruction imaging device 8, which includes a preprocessing module 81, a first forward projection processing module 82, a second forward projection processing module 83, a ratio calculation module 84, a first back projection processing module 85, a second back projection processing module 86, and a reconstruction module 87. The concrete steps are as follows:

the preprocessing module 81 is configured to acquire data to be reconstructed, perform dimension conversion on the data to be reconstructed, and acquire first processed data;

a first forward projection processing module 82, configured to perform forward projection calculation on the first processing data by using a first task set to obtain a first processing result, and perform multi-thread synchronous processing on each task in the first task set by using a first thread set; storing the first processing result in a first shared memory address; wherein each task of the first task set and each thread of the first thread set uniquely corresponds to a memory address of the first shared memory addresses;

a second forward projection processing module 83, configured to perform reduction processing on the first processing result under the first shared memory address to obtain a forward projection result;

a ratio calculation module 84, configured to obtain a projection ratio according to the forward projection result and the data to be reconstructed;

the first back projection processing module 85 is configured to perform back projection calculation on the first processing data by using a second task set to obtain a second processing result, and perform multi-thread synchronous processing on each task in the second task set by using a second thread set; storing the second processing result in a second shared memory address; wherein each task of the second task set and each thread of the second thread set uniquely corresponds to a memory address of the second shared memory addresses;

a second back projection processing module 86, configured to perform reduction processing on the second processing result under the second shared memory address to obtain a back projection result;

and the reconstruction module 87 is configured to generate an image based on the forward projection result, the projection ratio and the backward projection result and perform iterative reconstruction to obtain a target image.

In this embodiment, the preprocessing module 81 performs dimension conversion on the data to be reconstructed so that part of the dimension data is stored in a front position. The first forward projection processing module 82 and the second forward projection processing module 83 then perform forward projection calculation on the dimension-converted data, with the first thread set running in parallel in a multi-thread manner under each task of the first task set. The ratio calculation module 84 obtains the projection ratio from the forward projection result and the data to be reconstructed. The first back projection processing module 85 and the second back projection processing module 86 then use the second task set and the second thread set to perform back projection calculation on the dimension-converted data in a multi-thread manner. Meanwhile, the processing result of each task is stored in its own unique memory address, and these addresses are gathered into the first shared memory and the second shared memory. Finally, the reconstruction module 87 performs iterative reconstruction, based in one embodiment on formula (1), which has the form of a maximum likelihood estimate, to obtain the target image.

During execution, dimension conversion shortens the addressing range when the symmetric variables are accessed during task execution, which increases the data reading speed, so that the symmetry of the data to be reconstructed and of the calculation process is effectively used to increase the calculation speed. Forward projection and back projection are executed by multi-thread parallel processing under multiple tasks. Compared with the single-thread processing of the prior art, and unlike the conventional multi-thread parallel mode (one thread processes one piece of data, and multiple threads process multiple pieces of data in parallel), this greatly increases the calculation speed of the forward and back projection processes: using the symmetry in the calculation process, inner-loop multi-thread parallel processing is arranged under each outer loop, and each thread processes the same position of multiple pieces of data. In addition, during multi-task processing, the result of each task and each thread is stored in its own unique memory address within the first or second shared memory, which effectively avoids concurrent accumulation by multiple tasks at the same memory location and the partial accumulation failures it would cause.
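The effect of dimension conversion on the addressing range can be sketched with a flat-array layout change. The Python sketch below assumes that storing the symmetric dimension "in a front position" means making Z the fastest-varying index of the linear layout; this interpretation, the source layout with X fastest, and the names `linear_index` and `convert_layout` are assumptions made for illustration.

```python
# Sketch: reorder a flat volume so the symmetric axis Z becomes the fastest index.
def linear_index(z, x, y, dims):
    """Offset of (z, x, y) in the converted layout, Z varying fastest."""
    nz, nx, ny = dims
    return z + nz * (x + nx * y)


def convert_layout(volume, dims):
    """volume is stored with X fastest: index = x + nx * (y + ny * z)."""
    nz, nx, ny = dims
    out = [0.0] * (nz * nx * ny)
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                out[linear_index(z, x, y, dims)] = volume[x + nx * (y + ny * z)]
    return out


dims = (3, 2, 2)
volume = [float(i) for i in range(12)]
converted = convert_layout(volume, dims)
offsets = [linear_index(z, 1, 0, dims) for z in range(3)]
column = [converted[o] for o in offsets]
print(offsets, column)
```

After conversion, the values along Z for a fixed (X, Y) sit at consecutive offsets, whereas in the source layout they are `nx * ny` elements apart, which illustrates the shortened addressing range described above.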

Example three: in order to achieve the above object, the present invention further provides a computer device 9. Referring to fig. 8, there may be a plurality of computer devices 9, and the components of the GPU-based fast reconstruction imaging device 8 of the second embodiment may be distributed among different computer devices 9. The computer device 9 may be a smartphone, a tablet computer, a laptop computer, a desktop computer, a rack server, a blade server, or a tower server (including an independent server or a server cluster formed by a plurality of servers) capable of executing programs, and the like. The computer device of this embodiment includes at least, but is not limited to, a memory 91, a processor 92, and the GPU-based fast reconstruction imaging device 8, which may be communicatively coupled to each other via a system bus, as shown in fig. 8. It should be noted that fig. 8 only shows a computer device with some components; it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead.

In this embodiment, the memory 91 may include a program storage area and a data storage area, wherein the program storage area may store an application program required for at least one function of the system; the storage data area can store skin data information of a user on the computer device. Further, the memory 91 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 91 optionally includes memory 91 located remotely from the processor, and these remote memories may be connected to the PET system via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Processor 92 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 92 is typically used to control the overall operation of the computer device. In this embodiment, the processor 92 is configured to execute the program code stored in the memory 91 or process data, for example, execute the reconstruction imaging apparatus 8, so as to implement the reconstruction imaging method according to the first embodiment.

It is noted that fig. 8 only shows the computer device 9 with components 91-92, but it is to be understood that not all shown components are required to be implemented, and that more or less components may be implemented instead.

In this embodiment, the reconstruction imaging apparatus 8 stored in the memory 91 may be further divided into one or more program modules, and the one or more program modules are stored in the memory 91 and executed by one or more processors (in this embodiment, the processor 92) to complete the present invention.

Example four:

to achieve the above objects, the present invention also provides a computer-readable storage medium including a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by a processor 92, implements corresponding functions. The computer readable storage medium of this embodiment is used for storing the reconstruction imaging device 8, and when being executed by the processor 92, the computer readable storage medium implements the GPU-based fast reconstruction imaging method of the first embodiment.

It should be noted that the embodiments of the present invention have been described in terms of preferred embodiments, and not by way of limitation, and that those skilled in the art can make modifications and variations of the embodiments described above without departing from the spirit of the invention.

Claims

1. A rapid reconstruction imaging method based on a GPU is characterized by comprising the following steps:

acquiring data to be reconstructed, performing dimension conversion on the data to be reconstructed, and storing and prepositioning a data set under a dimension with a symmetry axis attribute to acquire first processing data;

calculating the data at the same position of a plurality of LORs under each first thread, summing the calculation results obtained under all the threads to obtain a first intermediate result corresponding to each LOR, sequentially executing each task and combining to obtain a first processing result;

carrying out reduction processing based on the first processing result under a first shared memory address to obtain a forward projection result;

acquiring direction feature sets based on the first processing data, processing data at unique positions in data segments of the direction feature sets by each second thread, adding the processing data of each thread and each task to acquire a calculation result corresponding to each direction feature set, and updating the second task set under a second shared memory address after each task of the second task set is completed to acquire a second processing result;

storing the second processing result in a second shared memory address; wherein each task of the second set of tasks and each thread of the second thread set uniquely corresponds to a memory address of the second shared memory addresses;

2. The reconstruction imaging method according to claim 1, wherein the forward projection calculation is performed on the first processed data by using a first task set to obtain a first processing result, and for each task in the first task set, multi-thread synchronous processing is performed by using a first thread set, which includes the following steps:

obtaining a set of pixel layers based on the first processed data;

3. The reconstruction imaging method according to claim 2, wherein the performing multi-thread synchronous computation on a pixel subset by using a first thread set based on the first basic data and a preset first function, and accumulating computation results corresponding to respective threads to obtain a first intermediate result comprises:

acquiring the weight of the first processing data under the dimensionality corresponding to the dimensionality data with the symmetry axis attribute, and acquiring a calculation result corresponding to each thread under each thread according to the product of the weight and the first basic data; and accumulating the corresponding calculation results of all the threads to obtain a first intermediate result.

4. The reconstruction imaging method according to claim 2, wherein the reducing the first processing result under the first shared memory address to obtain a forward projection result includes:

5. The reconstruction imaging method as claimed in claim 1, wherein the first processed data is back-projected with a second set of tasks to obtain a second processed result, and for each task in the second set of tasks, a second thread set is used for multi-thread synchronous processing, comprising the following:

obtaining a set of directional features based on the first processed data;

dividing the direction feature set into a plurality of direction sub-sets according to the number of tasks in the second task set;

under any task in a second task set, performing multi-thread synchronous calculation on a direction subset by using a second thread set based on the second basic data and a preset second function, and accumulating corresponding calculation results of all threads to obtain a second intermediate result;

6. The reconstruction imaging method according to claim 5, wherein the processing the second processing result under the second shared memory address to obtain a back projection result comprises the following steps:

and under the second shared memory address, performing dimensionality addition based on the second processing result by adopting a preset reduction algorithm to obtain a back projection result.

7. The reconstruction imaging method according to claim 1, comprising, before the dimension converting the data to be reconstructed, the following:

and carrying out data symmetry analysis on the data to be reconstructed to obtain the dimensionality with the symmetry axis attribute.

8. The reconstruction imaging method according to claim 1, further comprising, before said performing forward projection calculation on said first processed data using said first set of tasks:

acquiring a preset memory address;

generating memory addresses associated with the tasks in the first task set and the first thread set based on the preset memory address mapping, and gathering the memory addresses into a first shared memory address;

and generating memory addresses associated with the tasks in the second task set and the second thread set based on the preset memory address mapping, and grouping the memory addresses into a second shared memory address.

9. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the reconstruction imaging method according to any one of claims 1 to 8 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the reconstruction imaging method as set forth in any one of the preceding claims 1 to 8.