CN102831629A

CN102831629A - Graphic processor based mammary gland CT (Computerized Tomography) image reconstruction method

Info

Publication number: CN102831629A
Application number: CN2012103037600A
Authority: CN
Inventors: 郭境峰; 王海潮
Original assignee: SHANTOU DONGFANG ULTRASONIC TECHNOLOGY CO LTD
Current assignee: Shantou Ultrasonic Testing Technology Co., Ltd.
Priority date: 2012-08-23
Filing date: 2012-08-23
Publication date: 2012-12-19
Anticipated expiration: 2032-08-23
Also published as: CN102831629B

Abstract

The invention relates to a GPU-based breast CT (computed tomography) image reconstruction method, which comprises the following steps in sequence. (1) Data reception and transfer: the analog signal detected by a scanning detection device is converted into a digital signal by an analog-to-digital converter; the digital signal is transferred into system memory, and from system memory into the video memory of the graphics processor. (2) Data operations are carried out in parallel on the graphics processor according to a parallel limited-angle reconstruction imaging algorithm; the parallel data operation is an iterative computation. (3) The method judges whether the result of step (2) reaches the expected target; if so, step (4) is executed, otherwise the method returns to step (2). (4) The graphics processor transfers the result of the iterative computation into system memory for image post-processing. Because the reconstruction is performed on the graphics processor, the iterative computation is greatly accelerated, and limited-angle breast CT images can be reconstructed quickly and accurately.

Description

Breast CT image reconstruction method based on a graphics processing unit

Technical field

The present invention relates to an image processing method, and specifically to a breast CT image reconstruction method based on a graphics processing unit (GPU).

Background art

Breast diseases are common among women. Breast cancer in particular has the highest incidence of all malignant tumors in women and poses a serious threat to women's health.

When conventional CT equipment is used for breast examination, the patient usually lies on a scanning bed for a whole-body or local scan. The scanning angle is often greater than 180 degrees and may reach 360 degrees, and images are reconstructed from complete data. This approach not only takes more time for image reconstruction but also exposes the patient to a higher radiation dose, which is harmful to the patient's health.

When the scanning angle is less than 120 degrees, the imaging mode is limited-angle reconstruction. Because of the particular requirements on radiation dose and image contrast, breast imaging generally uses limited-angle reconstruction to reduce the dose received by the patient. However, limited-angle reconstruction algorithms (such as the well-known Gerchberg-Papoulis (GP) algorithm) iterate between Fourier space and image space and need many iterations to converge well, so the iterative computation takes considerable time. The resulting slow convergence and slow image reconstruction reduce examination efficiency.

Summary of the invention

The technical problem to be solved by the present invention is to provide a GPU-based breast CT image reconstruction method that uses the graphics processing unit conveniently and effectively to reconstruct breast CT images under limited-angle conditions quickly and accurately. The technical solution adopted is as follows:

A GPU-based breast CT image reconstruction method, characterized by comprising the following steps in sequence:

(1) Data reception and transfer: the analog signal detected by the scanning detection device of the breast CT equipment is converted into a digital signal by an analog-to-digital (A/D) converter and transferred to system memory over the system bus; video memory is then allocated according to the size of the digital data, and the digital data in system memory are transferred into the video memory of the graphics processing unit (GPU);

The above digital signal is acquired in the limited-angle reconstruction imaging mode.

Typically, the digital signal in system memory first undergoes processing such as data prediction (i.e., predicting the projection data of adjacent portions by a geometric-ratio generation method) and FIR low-pass filtering, in that order. Video memory is then allocated according to the size of the digital data, and the data are transferred into the video memory of the GPU through the PCIe x16 interface.
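A minimal CUDA C++ sketch of this stage, not the patent's code. It assumes a causal FIR low-pass filter with caller-supplied taps; the patent gives neither the filter design nor the details of the data-prediction step, which is omitted here. `fir_lowpass` and `upload_to_device` are hypothetical names. The device buffer is sized from the data volume, as the text describes, and filled with a synchronous `cudaMemcpy`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_), __FILE__, __LINE__); \
    std::exit(EXIT_FAILURE); } } while (0)

// Causal FIR low-pass filter on the host: y[n] = sum_k h[k] * x[n - k],
// with samples before the start of the signal treated as zero.
// The taps h are an assumption (e.g. a short moving average).
std::vector<float> fir_lowpass(const std::vector<float>& x, const std::vector<float>& h) {
  std::vector<float> y(x.size(), 0.0f);
  for (size_t n = 0; n < x.size(); ++n)
    for (size_t k = 0; k < h.size() && k <= n; ++k)
      y[n] += h[k] * x[n - k];
  return y;
}

// Allocates video memory sized to the data volume and copies the filtered
// projection data from system memory to the GPU (over PCIe on a discrete card).
float* upload_to_device(const std::vector<float>& host) {
  float* d_data = nullptr;
  const size_t bytes = host.size() * sizeof(float);
  CUDA_CHECK(cudaMalloc(&d_data, bytes));
  CUDA_CHECK(cudaMemcpy(d_data, host.data(), bytes, cudaMemcpyHostToDevice));
  return d_data;  // the caller releases it with cudaFree
}
```

For example, taps `{0.5f, 0.5f}` give a two-point moving average; a real system would design the taps for the detector's noise spectrum.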

The present invention preferably uses the CUDA (Compute Unified Device Architecture) framework released by NVIDIA for coding. The GPU is first initialized, and the method checks whether the current GPU and its driver meet the requirements for running CUDA. Before data are transferred between system memory and video memory, the CUDA environment is set up through the following steps: (a) download and install the CUDA Toolkit and the CUDA SDK; (b) include the necessary CUDA header files, library files and link libraries in a new project; (c) load the CUDA nvcc compiler, which converts the GPU portion of a CUDA program into PTX code and finally into a program that can be executed on the GPU; (d) create files with the .cu suffix, so that during compilation the .cu files are passed to the CUDA nvcc compiler, while all other files are still compiled by the Visual C++ (VC) compiler.
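The initialization check can be sketched with standard CUDA runtime calls. This is a minimal illustration, not the patent's code. The minimum compute capability passed in is an assumption, because the patent does not say which GPU features it requires, and `find_cuda_device` is a hypothetical name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the first CUDA device with compute capability >= minMajor.minMinor,
// or -1 if no CUDA driver or no suitable device is present.
int find_cuda_device(int minMajor, int minMinor) {
  int driverVersion = 0;
  if (cudaDriverGetVersion(&driverVersion) != cudaSuccess || driverVersion == 0)
    return -1;                                     // no CUDA driver installed
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess)   // e.g. driver too old for this runtime
    return -1;
  for (int dev = 0; dev < count; ++dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
    bool ok = prop.major > minMajor || (prop.major == minMajor && prop.minor >= minMinor);
    if (ok) {
      std::printf("using device %d: %s (compute %d.%d, driver %d)\n",
                  dev, prop.name, prop.major, prop.minor, driverVersion);
      return dev;
    }
  }
  return -1;
}
```

A caller would then select the device with `cudaSetDevice(dev)` before allocating any video memory.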

(2) Data operations are carried out in parallel on the GPU according to the parallel limited-angle reconstruction imaging algorithm;

The limited-angle reconstruction algorithm is fundamentally iterative. Limited-angle imaging data occupy a limited frequency band in Fourier space, so the missing data can be recovered with the GP algorithm.

With operators B and C defined as follows, the GP iteration in the limited-angle reconstruction algorithm is:

$$B = T_F, \qquad C = F\,T_I\,F^{-1} \qquad (1)$$

$$\hat{f}^{0} = \hat{f}_k \qquad (2)$$

$$\hat{f}^{i+1} = C\,\hat{f}_k + (I - CB)\,\hat{f}^{i} \qquad (3)$$

Here $\hat{f}_k$ denotes the known data in Fourier space and $\hat{f}$ the complete data. $F$ is the Fourier transform and $F^{-1}$ the inverse Fourier transform. $T_I$ and $T_F$ are binary (0/1) function matrices in image space and frequency space, respectively; in the GP algorithm, $T_F$ marks the measured frequency region and $T_I$ the object support. $I$ is the identity matrix.

The GP iteration finally converges to $\hat{f}$ at the rate $(1-\lambda_i)^n$, where $\{\lambda_i\}$ are the eigenvalues of $CB$ and $0<\lambda_i<1$.
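The stated rate follows from a short error analysis. The derivation assumes that the measured data are consistent, $\hat{f}_k = B\hat{f}$. It also assumes, although the source does not say so, that $CB$ has a complete set of eigenvectors $v_j$ with $CB\,v_j = \lambda_j v_j$. The eigenvalue index is written $j$ here to keep it apart from the iteration index $i$ used in (3).

```latex
\begin{aligned}
e^{i} &:= \hat{f}^{i} - \hat{f}, \qquad C\hat{f}_k = CB\hat{f},\\
e^{i+1} &= C\hat{f}_k + (I - CB)\hat{f}^{i} - \hat{f}
         = CB\hat{f} + \hat{f}^{i} - CB\hat{f}^{i} - \hat{f}
         = (I - CB)\,e^{i},\\
e^{n} &= (I - CB)^{n} e^{0}
       = \sum_j (1 - \lambda_j)^{n} a_j v_j
       \quad\text{for } e^{0} = \sum_j a_j v_j .
\end{aligned}
```

Each error component therefore shrinks by the factor $1-\lambda_j$ per iteration. The components with the smallest $\lambda_j$ determine how many iterations are needed, and a component with $\lambda_j = 0$ would not be reduced at all.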

The data operation carried out in parallel on the GPU is an iterative computation comprising the following steps: (2-1) Fourier transform and inverse transform; (2-2) computing the eigenvalues in the spatial domain and the frequency domain; (2-3) computing the eigenvalues of operators B and C, wherein:
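A minimal, self-contained CUDA sketch of the GP iteration on a 1-D toy signal. It is not the patent's implementation, and it rests on several assumptions:

- A naive O(n²) DFT kernel stands in for cuFFT.
- The masks $T_F$ and $T_I$ are stored as 0/1 float arrays.
- Every name and size is hypothetical.

Each step is written as $\hat f^{i+1} = C[\hat f_k + (I-B)\hat f^{i}]$. This equals equation (3) whenever $C\hat f^{i} = \hat f^{i}$, which holds for every iterate after the first. From the starting point (2), equation (3) taken literally would return $\hat f^{0}$ unchanged, because $B\hat f_k = \hat f_k$.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <initializer_list>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>

#define CUDA_CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_), __FILE__, __LINE__); \
    std::exit(EXIT_FAILURE); } } while (0)

// Naive O(n^2) DFT, one output bin per thread (a stand-in for cuFFT).
// sign = -1: forward F; sign = +1 with scale = 1/n: inverse F^-1.
__global__ void dft_kernel(const cuFloatComplex* in, cuFloatComplex* out,
                           int n, float sign, float scale) {
  int k = blockIdx.x * blockDim.x + threadIdx.x;
  if (k >= n) return;
  cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
  for (int t = 0; t < n; ++t) {
    float s, c;  // sin/cos of sign * 2*pi*k*t/n
    sincospif(sign * 2.0f * (float)((k * t) % n) / (float)n, &s, &c);
    acc = cuCaddf(acc, cuCmulf(in[t], make_cuFloatComplex(c, s)));
  }
  out[k] = make_cuFloatComplex(scale * cuCrealf(acc), scale * cuCimagf(acc));
}

// x[i] *= mask[i]; used for both binary masks T_F and T_I.
__global__ void apply_mask(cuFloatComplex* x, const float* mask, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] = make_cuFloatComplex(mask[i] * cuCrealf(x[i]), mask[i] * cuCimagf(x[i]));
}

// tmp = f_k + (I - T_F) f: measured bins come from f_k, missing bins from f.
__global__ void data_consistency(const cuFloatComplex* fk, const float* TF,
                                 const cuFloatComplex* f, cuFloatComplex* tmp, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  float w = 1.0f - TF[i];
  tmp[i] = make_cuFloatComplex(cuCrealf(fk[i]) + w * cuCrealf(f[i]),
                               cuCimagf(fk[i]) + w * cuCimagf(f[i]));
}

// `iters` GP steps f <- C (f_k + (I - B) f) with B = T_F, C = F T_I F^-1.
void gp_iterate(cuFloatComplex* d_f, const cuFloatComplex* d_fk, const float* d_TF,
                const float* d_TI, cuFloatComplex* d_tmp, int n, int iters) {
  const int threads = 128, blocks = (n + threads - 1) / threads;
  for (int it = 0; it < iters; ++it) {
    data_consistency<<<blocks, threads>>>(d_fk, d_TF, d_f, d_tmp, n);
    dft_kernel<<<blocks, threads>>>(d_tmp, d_f, n, +1.0f, 1.0f / n);  // to image space
    apply_mask<<<blocks, threads>>>(d_f, d_TI, n);                     // support T_I
    dft_kernel<<<blocks, threads>>>(d_f, d_tmp, n, -1.0f, 1.0f);       // back to Fourier space
    CUDA_CHECK(cudaMemcpy(d_f, d_tmp, n * sizeof(cuFloatComplex), cudaMemcpyDeviceToDevice));
  }
  CUDA_CHECK(cudaGetLastError());
}

// Toy 1-D case (hypothetical sizes): object support = first 4 of 64 samples,
// bins 0..20 and 44..63 measured. Returns ||f - f_true|| / ||f_true||.
float gp_demo(int iters) {
  const int n = 64;
  const size_t cb = n * sizeof(cuFloatComplex), fb = n * sizeof(float);
  std::vector<cuFloatComplex> x(n, make_cuFloatComplex(0.0f, 0.0f));
  std::vector<float> TI(n, 0.0f), TF(n, 0.0f);
  for (int t = 0; t < 4; ++t) { x[t] = make_cuFloatComplex(t + 1.0f, 0.0f); TI[t] = 1.0f; }
  for (int k = 0; k < n; ++k) TF[k] = (k <= 20 || k >= 44) ? 1.0f : 0.0f;

  cuFloatComplex *d_x, *d_true, *d_fk, *d_f, *d_tmp;
  float *d_TF, *d_TI;
  for (cuFloatComplex** p : {&d_x, &d_true, &d_fk, &d_f, &d_tmp}) CUDA_CHECK(cudaMalloc(p, cb));
  CUDA_CHECK(cudaMalloc(&d_TF, fb));
  CUDA_CHECK(cudaMalloc(&d_TI, fb));
  CUDA_CHECK(cudaMemcpy(d_x, x.data(), cb, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_TF, TF.data(), fb, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_TI, TI.data(), fb, cudaMemcpyHostToDevice));

  dft_kernel<<<1, n>>>(d_x, d_true, n, -1.0f, 1.0f);                // complete spectrum
  CUDA_CHECK(cudaMemcpy(d_fk, d_true, cb, cudaMemcpyDeviceToDevice));
  apply_mask<<<1, n>>>(d_fk, d_TF, n);                               // f_k = T_F f_true
  CUDA_CHECK(cudaMemcpy(d_f, d_fk, cb, cudaMemcpyDeviceToDevice));   // f^0 = f_k, eq. (2)
  gp_iterate(d_f, d_fk, d_TF, d_TI, d_tmp, n, iters);

  std::vector<cuFloatComplex> f(n), ref(n);
  CUDA_CHECK(cudaMemcpy(f.data(), d_f, cb, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaMemcpy(ref.data(), d_true, cb, cudaMemcpyDeviceToHost));
  double num = 0.0, den = 0.0;
  for (int k = 0; k < n; ++k) {
    const double e = cuCabsf(cuCsubf(f[k], ref[k])), r = cuCabsf(ref[k]);
    num += e * e;
    den += r * r;
  }
  for (void* p : {(void*)d_x, (void*)d_true, (void*)d_fk, (void*)d_f, (void*)d_tmp,
                  (void*)d_TF, (void*)d_TI})
    CUDA_CHECK(cudaFree(p));
  return (float)std::sqrt(num / den);
}
```

`gp_demo(0)` returns the error of the starting estimate $\hat f^0 = \hat f_k$. With these toy parameters, a few hundred iterations should bring the error down to roughly single-precision rounding level.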

Step (2-1) specifically comprises: (2-1-1) the stream processors of the GPU receive data, i.e., the digital signal data received in the GPU's video memory are distributed to the individual stream processors; (2-1-2) one-dimensional Fourier transform; (2-1-3) two-dimensional Fourier transform; (2-1-4) two-dimensional inverse Fourier transform; (2-1-5) writing the results to shared memory.

In step (2-1):

- Before the one-dimensional Fourier transform of step (2-1-2), the kernel is designed so that the GPU satisfies the warp launch conditions when computing this transform. Threads belonging to the same warp then need no fence synchronization when they communicate, which increases execution speed.
- Before the two-dimensional Fourier transform of step (2-1-3), atomic operations are used. When several threads access the same address in global memory or shared memory at the same time, each thread performs its read-modify-write on the shared data under mutual exclusion, and no other thread can access that address until the current thread has completed its operation. This speeds up access to thread data.
- Before the two-dimensional inverse Fourier transform of step (2-1-4), the kernel is likewise designed so that the GPU satisfies the warp launch conditions for this computation. Threads belonging to the same warp need no fence synchronization when they communicate, which increases execution speed.
- Before the results are written to shared memory in step (2-1-5), a synchronization instruction is issued so that all threads in the same thread block reach the same point. Any thread that reaches the synchronization point pauses until all threads in the block have reached it, and only then does the whole thread block continue with the following statements.
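A minimal sketch of the three mechanisms named above. It shows warp-level communication without a barrier, an atomic read-modify-write, and a block-wide synchronization before shared-memory data are read, applied to a sum of squares such as the energy of a spectrum. The kernel is illustrative and is not the patent's. On GPUs since Volta, threads of a warp are no longer guaranteed to run in lockstep, so barrier-free warp communication should use the `*_sync` intrinsics such as `__shfl_down_sync`.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Sum of squares of x[0..n) accumulated into *out (which must start at 0).
// Assumes blockDim.x is a multiple of 32 and at most 1024.
__global__ void sum_squares(const float* x, int n, float* out) {
  __shared__ float warp_sums[32];                 // one slot per warp of the block
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  float v = (i < n) ? x[i] * x[i] : 0.0f;         // out-of-range threads still join the shuffles

  // Warp level: lanes exchange registers directly, so no __syncthreads() is needed.
  for (int offset = 16; offset > 0; offset >>= 1)
    v += __shfl_down_sync(0xffffffffu, v, offset);

  const int lane = threadIdx.x % 32, warp = threadIdx.x / 32;
  const int nwarps = blockDim.x / 32;
  if (lane == 0) warp_sums[warp] = v;
  __syncthreads();                                // every warp's partial is now in shared memory

  if (warp == 0) {
    v = (lane < nwarps) ? warp_sums[lane] : 0.0f;
    for (int offset = 16; offset > 0; offset >>= 1)
      v += __shfl_down_sync(0xffffffffu, v, offset);
    if (lane == 0) atomicAdd(out, v);             // blocks combine their results without a race
  }
}

// Host wrapper (error checking omitted for brevity).
float sum_squares_gpu(const std::vector<float>& h) {
  const int n = (int)h.size();
  if (n == 0) return 0.0f;
  float *d_x = nullptr, *d_out = nullptr, result = 0.0f;
  cudaMalloc(&d_x, n * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_x, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemset(d_out, 0, sizeof(float));
  const int threads = 256, blocks = (n + threads - 1) / threads;
  sum_squares<<<blocks, threads>>>(d_x, n, d_out);
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_x);
  cudaFree(d_out);
  return result;
}
```

Reducing inside each block first means each block issues only one atomic operation, which keeps contention on `out` low.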

Step (2-2) specifically comprises: (2-2-1) initializing operators B and C, which includes allocating video memory for the operator matrices and assigning initial values so that no null pointer is used; (2-2-2) calling cuBLAS library functions; (2-2-3) computing the spatial-domain eigenvalues; (2-2-4) computing the frequency-domain eigenvalues; (2-2-5) writing the results to shared memory.
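As an illustration of step (2-2-1), the sketch below allocates the frequency-space mask $T_F$ for a limited angular range, zero-initializes it, and then fills it. It assumes parallel-beam geometry, where by the Fourier slice theorem each projection supplies one central line of the 2-D spectrum. The angle convention, the unshifted FFT index layout and the names are assumptions, not taken from the patent.

```cuda
#include <cuda_runtime.h>

// T_F on an n x n DFT grid (row v, column u, unshifted FFT order):
// 1 where the direction of frequency (fu, fv) lies in [theta0, theta0 + span)
// modulo pi (angles in radians), 0 inside the missing wedge.
__global__ void limited_angle_mask(float* TF, int n, float theta0, float span) {
  const float pi = 3.14159265358979f;
  const int u = blockIdx.x * blockDim.x + threadIdx.x;
  const int v = blockIdx.y * blockDim.y + threadIdx.y;
  if (u >= n || v >= n) return;
  const int fu = (u <= n / 2) ? u : u - n;        // signed frequency indices
  const int fv = (v <= n / 2) ? v : v - n;
  float known = 1.0f;                              // DC lies on every central slice
  if (fu != 0 || fv != 0) {
    float ang = fmodf(atan2f((float)fv, (float)fu) + pi, pi);  // direction in [0, pi)
    float d = fmodf(ang - theta0, pi);
    if (d < 0.0f) d += pi;                         // offset from theta0, in [0, pi)
    known = (d < span) ? 1.0f : 0.0f;
  }
  TF[v * n + u] = known;
}

// Allocates T_F, zero-initializes it (so the operator is never null or
// uninitialized), then fills it. Returns nullptr if the allocation fails.
float* make_tf_mask(int n, float theta0, float span) {
  float* d_TF = nullptr;
  const size_t bytes = (size_t)n * n * sizeof(float);
  if (cudaMalloc(&d_TF, bytes) != cudaSuccess) return nullptr;
  cudaMemset(d_TF, 0, bytes);
  dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
  limited_angle_mask<<<grid, block>>>(d_TF, n, theta0, span);
  cudaDeviceSynchronize();
  return d_TF;
}
```

A span of about 2.09 rad would correspond to the 120-degree range mentioned in the background, under the stated geometry assumption.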

In step (2-2):

- Before the cuBLAS library functions are called in step (2-2-2), the kernel is designed so that the GPU satisfies the warp launch conditions when computing the spatial-domain eigenvalues. Threads belonging to the same warp need no fence synchronization when they communicate, which increases execution speed.
- Before the spatial-domain eigenvalues are computed in step (2-2-3), atomic operations are used. When several threads access the same address in global memory or shared memory at the same time, each thread performs its read-modify-write on the shared data under mutual exclusion, and no other thread can access that address until the current thread has completed its operation. This speeds up access to thread data.
- Before the frequency-domain eigenvalues are computed in step (2-2-4), asynchronous stream operations are used so that the host CPU thread need not wait while the GPU computes and can perform other computations. The CPU and GPU thus work simultaneously, which improves resource utilization.
- Before the results are written to shared memory in step (2-2-5), a synchronization instruction is issued so that all threads in the same thread block reach the same point. Any thread that reaches the synchronization point pauses until all threads in the block have reached it, and only then does the whole thread block continue with the following statements.
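Kernel launches and `cudaMemcpyAsync` calls return to the host without waiting for the GPU. The minimal sketch below queues a copy-in, a kernel and a copy-out on one stream, does unrelated host work, and blocks only at `cudaStreamSynchronize`. The `scale` kernel and the host-side sum are placeholders for the real computations. The pinned buffer from `cudaMallocHost` is a deliberate choice, because asynchronous copies from pageable memory are not guaranteed to overlap host execution.

```cuda
#include <numeric>
#include <vector>
#include <cuda_runtime.h>

__global__ void scale(float* x, int n, float a) {   // placeholder GPU computation
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

// Queues work on one stream, does independent host work while the GPU runs,
// and returns the host result; the GPU output is written to `out`.
double overlap_host_work(int n, float a, std::vector<float>& out) {
  float *h = nullptr, *d = nullptr;
  cudaMallocHost(&h, n * sizeof(float));      // pinned buffer for truly asynchronous copies
  cudaMalloc(&d, n * sizeof(float));
  for (int i = 0; i < n; ++i) h[i] = (float)i;

  cudaStream_t s;
  cudaStreamCreate(&s);
  cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s);
  scale<<<(n + 255) / 256, 256, 0, s>>>(d, n, a);
  cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s);

  // The asynchronous calls above return without waiting, so the CPU continues
  // here with work that touches neither h nor d.
  std::vector<double> other(n, 1.0);
  double host_result = std::accumulate(other.begin(), other.end(), 0.0);

  cudaStreamSynchronize(s);                   // block only when the GPU result is needed
  out.assign(h, h + n);
  cudaStreamDestroy(s);
  cudaFree(d);
  cudaFreeHost(h);
  return host_result;
}
```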

Step (2-3) specifically comprises: (2-3-1) reading the shared-memory variables, i.e., the variables written to shared memory after the computation of step (2-2) is completed; (2-3-2) computing the inverse matrix; (2-3-3) computing the conjugate matrix; (2-3-4) obtaining the eigenvalues of operators B and C; (2-3-5) writing the results to shared memory.

In step (2-3):

- Before the inverse matrix is computed in step (2-3-2), the kernel is designed so that the GPU satisfies the warp launch conditions during the inverse-matrix computation. Threads belonging to the same warp need no fence synchronization when they communicate, which increases execution speed.
- Before the conjugate matrix is computed in step (2-3-3), the instruction that reads the inverse-matrix result is marked with an access-priority flag. This gives the instruction the highest priority for shared-memory access, so that it obtains the required data as quickly as possible without waiting.
- Before the eigenvalues of operators B and C are obtained in step (2-3-4), asynchronous execution instructions allow computation in one stream to proceed concurrently with data transfer in another stream, which improves resource utilization.
- Before the results are written to shared memory in step (2-3-5), an offset-alignment design gives the data the 4-byte or 8-byte alignment best suited to GPU computation.
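A minimal sketch of the two-stream overlap described for step (2-3-4). The data are split into chunks that alternate between two streams, so that one chunk's transfer can run while another chunk's kernel executes. The squaring kernel and the chunk size are placeholders. Whether copies and kernels actually overlap depends on the GPU's copy engines, and overlap requires pinned host memory.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

__global__ void square(float* x, int n) {           // placeholder per-chunk computation
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= x[i];
}

// Chunks alternate between two streams: work within a stream stays ordered,
// work in different streams may overlap. `chunk` must be > 0.
std::vector<float> square_pipelined(const std::vector<float>& in, int chunk) {
  const int n = (int)in.size();
  float *h = nullptr, *d = nullptr;
  cudaMallocHost(&h, n * sizeof(float));             // pinned memory for async copies
  cudaMalloc(&d, n * sizeof(float));
  std::copy(in.begin(), in.end(), h);
  cudaStream_t streams[2];
  for (cudaStream_t& s : streams) cudaStreamCreate(&s);
  for (int off = 0, c = 0; off < n; off += chunk, ++c) {
    const int len = std::min(chunk, n - off);
    cudaStream_t s = streams[c % 2];
    cudaMemcpyAsync(d + off, h + off, len * sizeof(float), cudaMemcpyHostToDevice, s);
    square<<<(len + 255) / 256, 256, 0, s>>>(d + off, len);
    cudaMemcpyAsync(h + off, d + off, len * sizeof(float), cudaMemcpyDeviceToHost, s);
  }
  cudaDeviceSynchronize();                            // wait for both streams
  std::vector<float> out(h, h + n);
  for (cudaStream_t& s : streams) cudaStreamDestroy(s);
  cudaFree(d);
  cudaFreeHost(h);
  return out;
}
```

Each chunk touches a disjoint slice of `h` and `d`, so the two streams never race on the same memory.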

The GPU computes in parallel and is suited to large numbers of data operations that perform the same computation. In other words, the GPU can be thought of as a CPU with many (from tens up to hundreds of) stream processors that compute simultaneously. The aim in designing the CUDA algorithm is to send data that require the same computation to different stream processors, saving computation time.
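A minimal sketch of this mapping, using a grid-stride loop that applies the same operation to every element. The example operation, applying the image-space support mask $T_I$, and the choice of four blocks per multiprocessor are assumptions. The grid size is independent of the data size, so the same kernel handles any image size.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread applies the same operation (here the binary
// support mask T_I) to elements i, i + stride, i + 2*stride, ...
__global__ void apply_support(float* img, const float* TI, int n) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
    img[i] *= TI[i];
}

// Host wrapper (error checking omitted for brevity).
std::vector<float> apply_support_host(std::vector<float> img, const std::vector<float>& TI) {
  const int n = (int)img.size();
  int sms = 1;
  cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, 0);
  float *d_img = nullptr, *d_TI = nullptr;
  cudaMalloc(&d_img, n * sizeof(float));
  cudaMalloc(&d_TI, n * sizeof(float));
  cudaMemcpy(d_img, img.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_TI, TI.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  apply_support<<<4 * sms, 256>>>(d_img, d_TI, n);    // a few blocks per multiprocessor
  cudaMemcpy(img.data(), d_img, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_img);
  cudaFree(d_TI);
  return img;
}
```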

(3) A predetermined condition is used to judge whether the result of step (2) reaches the expected target. If it does, step (4) is executed; otherwise the method returns to step (2) for further iterative computation.
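The patent leaves the predetermined condition unspecified. A minimal sketch assuming a relative-change criterion, $\|\hat f^{i+1}-\hat f^{i}\| \le \text{tol}\,\|\hat f^{i+1}\|$, on real-valued data. `converged`, `diff_norms` and the tolerance are hypothetical. For brevity the kernel issues one `atomicAdd` per element; a hierarchical warp/block reduction would issue far fewer.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// sums[0] += (a[i] - b[i])^2 and sums[1] += a[i]^2, one atomicAdd per element.
__global__ void diff_norms(const float* a, const float* b, int n, float* sums) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    const float d = a[i] - b[i];
    atomicAdd(&sums[0], d * d);
    atomicAdd(&sums[1], a[i] * a[i]);
  }
}

// Hypothetical stopping rule for step (3): ||f_new - f_old|| <= tol * ||f_new||.
bool converged(const float* d_new, const float* d_old, int n, float tol) {
  float* d_sums = nullptr;
  float h_sums[2] = {0.0f, 0.0f};
  cudaMalloc(&d_sums, 2 * sizeof(float));
  cudaMemset(d_sums, 0, 2 * sizeof(float));
  diff_norms<<<(n + 255) / 256, 256>>>(d_new, d_old, n, d_sums);
  cudaMemcpy(h_sums, d_sums, 2 * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_sums);
  return std::sqrt(h_sums[0]) <= tol * std::sqrt(h_sums[1]);
}
```

In the host loop, this check would run after each GP update together with a maximum iteration count as a safeguard, for example `while (!converged(d_new, d_old, n, 1e-4f) && ++it < max_it)`.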

(4) The GPU transfers the result of the iterative computation to system memory for image post-processing.

Image post-processing may include log compression, windowing and similar operations; after post-processing, the image is output and displayed.
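A minimal sketch of the two post-processing operations named here. It assumes log compression as $\log(1+x)$, followed by a linear window given by a level and a width, mapped to 8-bit gray levels. The formula, the parameter names and the choice to run it on the GPU are assumptions rather than details from the patent.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// y = log(1 + x) compresses the dynamic range; the window
// [level - width/2, level + width/2] is then mapped linearly onto 0..255 and clamped.
__global__ void log_window(const float* in, uint8_t* out, int n, float level, float width) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  const float y = log1pf(fmaxf(in[i], 0.0f));           // clamp negatives before the log
  const float t = (y - (level - 0.5f * width)) / width;  // 0 at window bottom, 1 at top
  out[i] = (uint8_t)(255.0f * fminf(fmaxf(t, 0.0f), 1.0f) + 0.5f);
}

// Host wrapper (error checking omitted for brevity).
std::vector<uint8_t> postprocess(const std::vector<float>& img, float level, float width) {
  const int n = (int)img.size();
  float* d_in = nullptr;
  uint8_t* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, n);
  cudaMemcpy(d_in, img.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  log_window<<<(n + 255) / 256, 256>>>(d_in, d_out, n, level, width);
  std::vector<uint8_t> out(n);
  cudaMemcpy(out.data(), d_out, n, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return out;
}
```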

The present invention exploits the programmability and high-performance parallel computing of the GPU and performs the reconstruction on the GPU. This greatly increases the speed of the iterative computation and reconstructs limited-angle breast CT images quickly and accurately. Images are displayed faster, and the time saved can be spent on additional image post-processing, which improves display quality, makes images clearer and aids the detection of breast disease.

Description of the drawings

Fig. 1 is the overall flow chart of the preferred embodiment of the present invention;

Fig. 2 is the flow chart of step (1), data reception and transfer;

Fig. 3 is a schematic diagram of the GPU's parallel computing mechanism;

Fig. 4 is the flow chart of step (2-1), Fourier transform and inverse transform;

Fig. 5 is the flow chart of step (2-2), computing the eigenvalues in the spatial and frequency domains;

Fig. 6 is the flow chart of step (2-3), computing the eigenvalues of operators B and C.

Embodiment

Referring to Fig. 1, this GPU-based breast CT image reconstruction method comprises the following steps in sequence:

(1) Data reception and transfer: the analog signal detected by the scanning detection device of the breast CT equipment is converted into a digital signal by an analog-to-digital (A/D) converter (this digital signal is acquired in the limited-angle reconstruction imaging mode) and transferred to system memory over the system bus. Video memory is then allocated according to the size of the digital data, and the digital data in system memory are transferred into the video memory of the GPU. Referring to Fig. 2, in this embodiment, after the digital data have been transferred to system memory, they undergo processing such as data prediction (i.e., predicting the projection data of adjacent portions by a geometric-ratio generation method) and FIR low-pass filtering, in that order. Video memory is then allocated according to the size of the digital data, and the data are transferred into the video memory of the GPU through the PCIe x16 interface.

This embodiment uses the CUDA (Compute Unified Device Architecture) framework released by NVIDIA for coding. The GPU is first initialized, and the method checks whether the current GPU and its driver meet the requirements for running CUDA. Before data are transferred between system memory and video memory, the CUDA environment is set up through the following steps: (a) download and install the CUDA Toolkit and the CUDA SDK; (b) include the necessary CUDA header files, library files and link libraries in a new project; (c) load the CUDA nvcc compiler, which converts the GPU portion of a CUDA program into PTX code and finally into a program that can be executed on the GPU; (d) create files with the .cu suffix, so that during compilation the .cu files are passed to the CUDA nvcc compiler, while all other files are still compiled by the Visual C++ (VC) compiler.

The CUDA code for exchanging data between system memory and video memory is as follows:

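(2) Data operations are carried out in parallel on the GPU according to the parallel limited-angle reconstruction imaging algorithm, whose GP iteration is:

$$B = T_F, \qquad C = F\,T_I\,F^{-1} \qquad (1)$$

$$\hat{f}^{0} = \hat{f}_k \qquad (2)$$

$$\hat{f}^{i+1} = C\,\hat{f}_k + (I - CB)\,\hat{f}^{i} \qquad (3)$$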

Referring to Fig. 1, the data operation carried out in parallel on the GPU is an iterative computation comprising the following steps: (2-1) Fourier transform and inverse transform; (2-2) computing the eigenvalues in the spatial domain and the frequency domain; (2-3) computing the eigenvalues of operators B and C, wherein:

Referring to Fig. 4, step (2-1) (Fourier transform and inverse transform) specifically comprises: (2-1-1) the stream processors of the GPU receive data, i.e., the digital signal data received in the GPU's video memory are distributed to the individual stream processors; (2-1-2) one-dimensional Fourier transform; (2-1-3) two-dimensional Fourier transform; (2-1-4) two-dimensional inverse Fourier transform; (2-1-5) writing the results to shared memory.

The CUDA code for step (2-1), Fourier transform and inverse transform, is as follows:
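A minimal sketch of the 2-D transforms in steps (2-1-3) and (2-1-4). It is not the listing referred to above. A 2-D DFT is computed as 1-D DFTs over all rows followed by 1-D DFTs over all columns. A naive O(n²) DFT per line stands in for an FFT library such as cuFFT, and the names are hypothetical.

```cuda
#include <cmath>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>

// 1-D DFTs along `lines` independent lines of length `len`. Element t of line l
// sits at l*lineStride + t*elemStride, so the same kernel handles rows
// (elemStride 1) and columns (elemStride = row width). One thread per output bin.
__global__ void dft_lines(const cuFloatComplex* in, cuFloatComplex* out, int len, int lines,
                          int elemStride, int lineStride, float sign, float scale) {
  const int k = blockIdx.x * blockDim.x + threadIdx.x;
  const int l = blockIdx.y;
  if (k >= len || l >= lines) return;
  const cuFloatComplex* src = in + (size_t)l * lineStride;
  cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
  for (int t = 0; t < len; ++t) {
    float s, c;  // sin/cos of sign * 2*pi*k*t/len
    sincospif(sign * 2.0f * (float)((k * t) % len) / (float)len, &s, &c);
    acc = cuCaddf(acc, cuCmulf(src[(size_t)t * elemStride], make_cuFloatComplex(c, s)));
  }
  out[(size_t)l * lineStride + (size_t)k * elemStride] =
      make_cuFloatComplex(scale * cuCrealf(acc), scale * cuCimagf(acc));
}

// 2-D DFT (or its inverse, with 1/(nx*ny) scaling) of an ny x nx row-major image:
// 1-D transforms over all rows, then over all columns of the intermediate result.
std::vector<cuFloatComplex> dft2d_host(const std::vector<cuFloatComplex>& img,
                                       int nx, int ny, bool inverse) {
  const size_t bytes = img.size() * sizeof(cuFloatComplex);
  const float sign = inverse ? 1.0f : -1.0f;
  cuFloatComplex *d_in = nullptr, *d_tmp = nullptr;
  cudaMalloc(&d_in, bytes);
  cudaMalloc(&d_tmp, bytes);
  cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);
  dft_lines<<<dim3((nx + 127) / 128, ny), 128>>>(d_in, d_tmp, nx, ny, 1, nx,
                                                 sign, inverse ? 1.0f / nx : 1.0f);  // rows
  dft_lines<<<dim3((ny + 127) / 128, nx), 128>>>(d_tmp, d_in, ny, nx, nx, 1,
                                                 sign, inverse ? 1.0f / ny : 1.0f);  // columns
  std::vector<cuFloatComplex> out(img.size());
  cudaMemcpy(out.data(), d_in, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_tmp);
  return out;
}
```

The row pass and the column pass are independent 1-D problems, which is what lets the 2-D transform be built from the 1-D transform of step (2-1-2).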

Referring to Fig. 5, step (2-2) (computing the eigenvalues in the spatial and frequency domains) specifically comprises: (2-2-1) initializing operators B and C, which includes allocating video memory for the operator matrices and assigning initial values; (2-2-2) calling cuBLAS library functions; (2-2-3) computing the spatial-domain eigenvalues; (2-2-4) computing the frequency-domain eigenvalues; (2-2-5) writing the results to shared memory.

The CUDA code for step (2-2), computing the eigenvalues in the spatial and frequency domains, is as follows:
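The patent does not say how these eigenvalues are computed. One common option, shown here as an assumption, is power iteration, which estimates the largest-magnitude eigenvalue of a symmetric matrix, for instance an operator matrix built in step (2-2-1). A naive one-thread-per-row matrix-vector kernel stands in for the cuBLAS call of step (2-2-2), and all names are hypothetical.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// y = A x for a row-major n x n matrix, one thread per row.
__global__ void matvec(const float* A, const float* x, float* y, int n) {
  const int r = blockIdx.x * blockDim.x + threadIdx.x;
  if (r >= n) return;
  float acc = 0.0f;
  for (int c = 0; c < n; ++c) acc += A[(size_t)r * n + c] * x[c];
  y[r] = acc;
}

// Power iteration for a symmetric matrix: x <- A x / |A x|, with the Rayleigh
// quotient x^T A x (|x| = 1) as the eigenvalue estimate. Converges to the
// largest-magnitude eigenvalue (when it is unique) if the start vector has a
// component along its eigenvector. Error checking omitted for brevity.
float power_iteration(const std::vector<float>& A, int n, int iters) {
  float *dA = nullptr, *dx = nullptr, *dy = nullptr;
  cudaMalloc(&dA, A.size() * sizeof(float));
  cudaMalloc(&dx, n * sizeof(float));
  cudaMalloc(&dy, n * sizeof(float));
  cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
  std::vector<float> x(n, 1.0f / std::sqrt((float)n)), y(n);
  float lambda = 0.0f;
  for (int it = 0; it < iters; ++it) {
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    matvec<<<(n + 255) / 256, 256>>>(dA, dx, dy, n);
    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    double dot = 0.0, norm2 = 0.0;
    for (int i = 0; i < n; ++i) { dot += (double)x[i] * y[i]; norm2 += (double)y[i] * y[i]; }
    lambda = (float)dot;                        // Rayleigh quotient, since |x| = 1
    if (norm2 == 0.0) break;                    // A x = 0: eigenvalue 0, stop
    const float inv = (float)(1.0 / std::sqrt(norm2));
    for (int i = 0; i < n; ++i) x[i] = y[i] * inv;
  }
  cudaFree(dA);
  cudaFree(dx);
  cudaFree(dy);
  return lambda;
}
```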

Referring to Fig. 6, step (2-3) (computing the eigenvalues of operators B and C) specifically comprises: (2-3-1) reading the shared-memory variables, i.e., the variables written to shared memory after the computation of step (2-2) is completed; (2-3-2) computing the inverse matrix; (2-3-3) computing the conjugate matrix; (2-3-4) obtaining the eigenvalues of operators B and C; (2-3-5) writing the results to shared memory.

The CUDA code for step (2-3), computing the eigenvalues of operators B and C, is as follows:
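A minimal sketch for the conjugate-matrix step (2-3-3), not the patent's listing. It assumes that "conjugate matrix" means the conjugate (Hermitian) transpose $A^H$ of a complex matrix. The sketch also illustrates the 8-byte alignment mentioned for step (2-3-5): `cuFloatComplex` is a `float2` with 8-byte alignment. The inverse-matrix step is not sketched, and the names are hypothetical.

```cuda
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>

// cuFloatComplex is float2: 8 bytes and 8-byte aligned, so each element can be
// moved with a single aligned 64-bit load/store.
static_assert(sizeof(cuFloatComplex) == 8 && alignof(cuFloatComplex) == 8,
              "unexpected cuFloatComplex layout");

// B = A^H for a rows x cols row-major matrix A; B is cols x rows.
__global__ void conj_transpose(const cuFloatComplex* A, cuFloatComplex* B, int rows, int cols) {
  const int c = blockIdx.x * blockDim.x + threadIdx.x;
  const int r = blockIdx.y * blockDim.y + threadIdx.y;
  if (r < rows && c < cols) B[c * rows + r] = cuConjf(A[r * cols + c]);
}

// Host wrapper (error checking omitted for brevity).
std::vector<cuFloatComplex> conj_transpose_host(const std::vector<cuFloatComplex>& A,
                                                int rows, int cols) {
  const size_t bytes = A.size() * sizeof(cuFloatComplex);
  cuFloatComplex *d_A = nullptr, *d_B = nullptr;
  cudaMalloc(&d_A, bytes);
  cudaMalloc(&d_B, bytes);
  cudaMemcpy(d_A, A.data(), bytes, cudaMemcpyHostToDevice);
  dim3 block(16, 16), grid((cols + 15) / 16, (rows + 15) / 16);
  conj_transpose<<<grid, block>>>(d_A, d_B, rows, cols);
  std::vector<cuFloatComplex> B(A.size());
  cudaMemcpy(B.data(), d_B, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_A);
  cudaFree(d_B);
  return B;
}
```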

Referring to Fig. 3, the GPU computes in parallel and is suited to large numbers of data operations that perform the same computation. In other words, the GPU can be thought of as a CPU with many (from tens up to hundreds of) stream processors that compute simultaneously.

Claims

1. A GPU-based breast CT image reconstruction method, characterized by comprising the following steps in sequence:

(1) data reception and transfer: the analog signal detected by the scanning detection device of the breast CT equipment is converted into a digital signal by an A/D converter and transferred to system memory over the system bus; video memory is then allocated according to the size of the digital data, and the digital data in system memory are transferred into the video memory of the GPU;

the digital signal is acquired in the limited-angle reconstruction imaging mode;

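(2) data operations are carried out in parallel on the GPU according to the parallel limited-angle reconstruction imaging algorithm; with operators B and C defined, the GP iteration is:

$$B = T_F, \qquad C = F\,T_I\,F^{-1} \qquad (1)$$

$$\hat{f}^{0} = \hat{f}_k \qquad (2)$$

$$\hat{f}^{i+1} = C\,\hat{f}_k + (I - CB)\,\hat{f}^{i} \qquad (3)$$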

where $\hat{f}_k$ denotes the known data in Fourier space, $\hat{f}$ the complete data, $F$ the Fourier transform and $F^{-1}$ the inverse Fourier transform; $T_I$ and $T_F$ are binary function matrices in image space and frequency space, respectively, and $I$ is the identity matrix; the GP iteration finally converges to $\hat{f}$ at the rate $(1-\lambda_i)^n$, where $\{\lambda_i\}$ are the eigenvalues of $CB$ and $0<\lambda_i<1$;

the data operation carried out in parallel on the GPU is an iterative computation comprising the following steps: (2-1) Fourier transform and inverse transform; (2-2) computing the eigenvalues in the spatial domain and the frequency domain; (2-3) computing the eigenvalues of operators B and C;

(3) a predetermined condition is used to judge whether the result of step (2) reaches the expected target; if it does, step (4) is executed; otherwise the method returns to step (2) for further iterative computation;

2. The GPU-based breast CT image reconstruction method according to claim 1, characterized in that, in step (1), after the digital data have been transferred to system memory, the digital signal in system memory undergoes data prediction and FIR low-pass filtering, in that order; video memory is then allocated according to the size of the digital data, and the data are transferred into the video memory of the GPU through the PCIe x16 interface.

3. The GPU-based breast CT image reconstruction method according to claim 1, characterized in that, in step (1), before data are transferred between system memory and video memory, the CUDA environment is set up through the following steps: (a) download and install the CUDA Toolkit and the CUDA SDK; (b) include the necessary CUDA header files, library files and link libraries in a new project; (c) load the CUDA nvcc compiler, which converts the GPU portion of a CUDA program into PTX code and finally into a program that can be executed on the GPU; (d) create files with the .cu suffix.

4. The GPU-based breast CT image reconstruction method according to claim 1, characterized in that step (2-1) specifically comprises: (2-1-1) the stream processors of the GPU receive data, i.e., the digital signal data received in the GPU's video memory are distributed to the individual stream processors; (2-1-2) one-dimensional Fourier transform; (2-1-3) two-dimensional Fourier transform; (2-1-4) two-dimensional inverse Fourier transform; (2-1-5) writing the results to shared memory.

5. The GPU-based breast CT image reconstruction method according to claim 4, characterized in that:

in step (2-1): before the one-dimensional Fourier transform of step (2-1-2), the kernel is designed so that the GPU satisfies the warp launch conditions when computing the one-dimensional Fourier transform, and threads belonging to the same warp need no fence synchronization when they communicate; before the two-dimensional Fourier transform of step (2-1-3), atomic operations are used so that, when several threads access the same address in global memory or shared memory at the same time, each thread performs its read-modify-write on the shared data under mutual exclusion, and no other thread can access that address until the current thread has completed its operation; before the two-dimensional inverse Fourier transform of step (2-1-4), the kernel is designed so that the GPU satisfies the warp launch conditions when computing the two-dimensional inverse Fourier transform, and threads belonging to the same warp need no fence synchronization when they communicate; before the results are written to shared memory in step (2-1-5), a synchronization instruction is issued so that all threads in the same thread block reach the same point, any thread that reaches the synchronization point pauses until all threads in the block have reached it, and only then does the whole thread block continue with the following statements.

6. The GPU-based breast CT image reconstruction method according to claim 1, characterized in that step (2-2) specifically comprises: (2-2-1) initializing operators B and C, which includes allocating video memory for the operator matrices and assigning initial values; (2-2-2) calling cuBLAS library functions; (2-2-3) computing the spatial-domain eigenvalues; (2-2-4) computing the frequency-domain eigenvalues; (2-2-5) writing the results to shared memory.

7. The GPU-based breast CT image reconstruction method according to claim 6, characterized in that:

in step (2-2): before the cuBLAS library functions are called in step (2-2-2), the kernel is designed so that the GPU satisfies the warp launch conditions when computing the spatial-domain eigenvalues, and threads belonging to the same warp need no fence synchronization when they communicate; before the spatial-domain eigenvalues are computed in step (2-2-3), atomic operations are used so that, when several threads access the same address in global memory or shared memory at the same time, each thread performs its read-modify-write on the shared data under mutual exclusion, and no other thread can access that address until the current thread has completed its operation; before the frequency-domain eigenvalues are computed in step (2-2-4), asynchronous stream operations are used so that the host CPU thread need not wait while the GPU computes and can perform other computations; before the results are written to shared memory in step (2-2-5), a synchronization instruction is issued so that all threads in the same thread block reach the same point, any thread that reaches the synchronization point pauses until all threads in the block have reached it, and only then does the whole thread block continue with the following statements.

8. The GPU-based breast CT image reconstruction method according to claim 1, characterized in that step (2-3) specifically comprises: (2-3-1) reading the shared-memory variables; (2-3-2) computing the inverse matrix; (2-3-3) computing the conjugate matrix; (2-3-4) obtaining the eigenvalues of operators B and C; (2-3-5) writing the results to shared memory.

9. The GPU-based breast CT image reconstruction method according to claim 8, characterized in that:

in step (2-3): before the inverse matrix is computed in step (2-3-2), the kernel is designed so that the GPU satisfies the warp launch conditions during the inverse-matrix computation, and threads belonging to the same warp need no fence synchronization when they communicate; before the conjugate matrix is computed in step (2-3-3), the instruction that reads the inverse-matrix result is marked with an access-priority flag, so that this instruction obtains the highest priority for shared-memory access; before the eigenvalues of operators B and C are obtained in step (2-3-4), asynchronous execution instructions allow computation in one stream to proceed concurrently with data transfer in another stream; before the results are written to shared memory in step (2-3-5), an offset-alignment design gives the data the 4-byte or 8-byte alignment best suited to GPU computation.