CN102750727B

CN102750727B - Access memory method for realizing shear wave data three-dimensional visualization by aiming at parallel volume rendering

Info

Publication number: CN102750727B
Application number: CN201210231757.2A
Authority: CN
Inventors: 刘金硕; 王丽娜; 程力; 尹晓丹; 郑勇
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2012-07-05
Filing date: 2012-07-05
Publication date: 2014-11-19
Anticipated expiration: 2032-07-05
Also published as: CN102750727A

Abstract

The invention provides a memory access method for three-dimensional visualization of shear wave data with parallel volume rendering. The method provides four memory access optimization strategies: a texture memory strategy, a global memory strategy, a global + shared memory strategy, and a constant memory strategy designed for the frequently invoked view angle matrix, together with a rendering quality evaluation method and criterion. The method improves the rendering efficiency of three-dimensional visualization of shear wave data by volume rendering on CUDA (Compute Unified Device Architecture). It thereby helps researchers to understand plate tectonics, dynamic processes, earthquake generation mechanisms and the like more accurately, and supports the development of fields such as geological exploration, continental plate research and seismic interpretation.

Description

Memory access method for realizing shear wave data three-dimensional visualization aiming at parallel volume rendering

Technical Field

The invention relates to the technical field of visualization, in particular to a memory access method for realizing three-dimensional visualization of shear wave data aiming at parallel volume rendering.

Background

Seismic waves are the only waves known today that can penetrate the earth's interior. When a fault ruptures, the rock masses on its two sides move relative to each other and a large amount of energy is released. Most of this energy does work against friction and is converted into heat; the rest propagates from the seismic source into the surrounding crust as elastic waves. The compressional wave generated by the thrust of the rapidly moving rock masses on the two sides of the fault is called the P wave; the elastic wave generated by relative displacement along the fault plane is called the shear wave, or S wave for short. Seismic waves emitted by the source propagate through the medium in all directions and undergo reflection, refraction and scattering along the way. Because different underground mineral resources have different structural characteristics, the structure, depth and form of underground rock strata can be inferred by analyzing the arrival time and waveform of seismic waves recorded at the surface, which enables the exploration of mineral resources.

Because shear waves are controlled only by the shear modulus and the density structure, and the shear modulus depends exponentially on the temperature of the material, research on shear waves helps researchers to understand plate structure, dynamic processes, earthquake generation mechanisms and the like accurately. Three-dimensional visualization of shear waves can therefore provide more information on underground structure and geodynamics, and benefits fields such as geological exploration, continental plate research and seismic interpretation.

Current research on shear waves focuses mainly on data processing and inversion, and the interpretation of seismic shear waves rests mainly on qualitative results. At present, images are mostly displayed as static two-dimensional plots, typically with GMT (Generic Mapping Tools, Wessel, 2012), which shows the result for each layer or section on a plane by plotting the velocity on that layer and cannot convey the overall three-dimensional structure. This makes three-dimensional spatial presentation and understanding difficult. If three-dimensional visual images, and in particular dynamic three-dimensional images of shear wave evolution, could be produced quantitatively, the understanding of continental deformation, dynamic evolution and earthquake generation mechanisms would improve greatly.

Since the 1980s, seismic data sets have grown rapidly in scale, from tens of thousands of data points in the past to millions today. Seismic attributes are also increasingly used in research. With the wide application of visualization technology in seismic exploration, a great deal of related visualization research has appeared.

Kidd was the first to perform three-dimensional modeling of seismic data, combining stratum and depth data through a direct integration construction method so that the details of a small area are shown more finely and higher-quality seismic data interpretation is available in a short time. Castanie, L., et al. designed a volume data roaming system for oil and gas exploration: it integrates several gigabytes of volume data based on along-volume data and efficient volume scoring, uses a general multi-modal volume rendering framework to produce high-quality renderings, and displays the rendering results with an interactive roaming method. Huw James et al. studied the display quality of seismic data in depth by adding an illumination model; introducing illumination into the volume rendering model distinguishes volume data through highlights and shadows, and thereby reveals the detailed spatial relationships between internal structures.

Although three-dimensional visualization is widely applied in seismology, exploration and related fields, traditional volume rendering algorithms are complex and computationally expensive. They require researchers to have a background in visualization and to master GPU hardware characteristics, the rendering pipeline and shading languages. This makes it difficult for researchers to observe seismic data with three-dimensional visualization, so researchers in the seismic field currently still analyze seismic volume data mainly through multiple layers and sections. Although the early Cg-based GPU approach brings a significant efficiency improvement to rendering algorithms, its programming model still operates on texture data. When it is used for seismic data visualization, the seismic data must be packed into a special data structure, and the resulting visualization tool is tied to that model by the limitations of the GPU hardware architecture. Because portability is poor and the demands on developers are high, three-dimensional visualization of seismic data has not become much easier.

CUDA (Compute Unified Device Architecture) is a general-purpose GPU computing platform introduced by NVIDIA. It is simpler to program, more capable, and better suited to general-purpose GPU computing. Three-dimensional visualization algorithms implemented on CUDA overcome the shortcomings of traditional GPU acceleration and achieve high interactive speed and good rendering quality. The remaining difficulty is improving memory access efficiency on the CUDA architecture during rendering, and thereby further improving the running efficiency of the program.

Disclosure of Invention

In view of the characteristics of volume data and of the ray casting algorithm, the invention provides a memory access method for three-dimensional visualization of shear wave data with parallel volume rendering.

The technical scheme of the invention is a memory access method for realizing shear wave data three-dimensional visualization aiming at parallel volume rendering, which realizes the shear wave data three-dimensional visualization based on a CUDA architecture, wherein the CUDA architecture comprises a texture memory, a global memory, a shared memory and a constant memory, a data object of the memory access comprises three-dimensional texture of volume data, volume data and a view angle matrix,

when the three-dimensional texture of the volume data needs to be accessed, a texture memory strategy is adopted: the three-dimensional texture of the volume data is built in texture memory, and the hardware interpolation filtering performed during texture fetching provides extra floating-point processing capacity;

when the volume data needs to be accessed, a global memory strategy is adopted: the volume data is built in global memory; while traversing the volume data, the value of each sampling point is obtained by trilinear interpolation of the 8 voxels surrounding it; when the merged access condition is met, global memory is accessed with merged accesses, and when an access to global memory would be non-merged, shared memory is used as a buffer to rearrange the access order of the threads in the half-warp so that the non-merged access becomes a merged access;

when the view angle matrix needs to be accessed, a constant memory strategy is adopted: the view angle matrix is obtained from the OpenGL built-in model-view matrix and copied to constant memory, and it is read directly from constant memory whenever a ray has to be computed with it.

And the smoothness of the edge curve is measured by the variance value of the first derivative, and the smoothness is used as a judgment criterion of the picture quality of the shear wave data three-dimensional visualization.

The invention designs a texture memory strategy, a global + shared memory strategy and a constant memory strategy, and realizes the three-dimensional visual access optimization of shear wave data in parallel volume rendering.

Drawings

FIG. 1 is a flow chart of texture creation according to an embodiment of the present invention.

FIG. 2 is a flow chart of texture fetching according to an embodiment of the present invention.

Fig. 3 is a flow chart of the establishment of volume data in the global memory according to the embodiment of the present invention.

Fig. 4 is a flow chart of sampling volume data in the global memory according to the embodiment of the present invention.

FIG. 5 is a schematic diagram of float3 type data merge access according to an embodiment of the present invention.

FIG. 6 is a sampling flow diagram of a global + sharing policy according to an embodiment of the present invention.

Detailed Description

Those skilled in the art can implement the technical scheme of the invention as an automated process using computer software technology. The technical scheme is explained in detail below with reference to the drawings and the embodiment.

The invention provides a memory access method for three-dimensional visualization of shear wave data with parallel volume rendering. The visualization is implemented on the CUDA (Compute Unified Device Architecture) architecture, which comprises texture memory, global memory, shared memory and constant memory, and the data objects of the memory accesses comprise the three-dimensional texture of the volume data, the volume data and the view angle matrix. Memory access is optimized for four aspects of shear wave data rendering: building the three-dimensional texture of the volume data, building and traversing the volume data, misaligned and strided accesses to memory segments, and access to the view angle matrix.

Access to the three-dimensional texture of the volume data is implemented by building the three-dimensional texture of the volume data in texture memory and by using the hardware interpolation filtering performed during texture fetching, which provides extra floating-point processing capacity.

The texture memory strategy is adopted whenever the three-dimensional texture of the volume data has to be built and accessed in texture memory. When the volume rendering algorithm needs a large amount of fine-grained computation to obtain a higher-quality result image, the texture memory strategy supports the frequent reading of the volume data.

The texture memory strategy designed by the embodiment is as follows:

In the embodiment, the scheme for storing the three-dimensional texture of the volume data in texture memory is shown in Fig. 1. A channel type descriptor is obtained according to the type of the volume data; a device memory 3D Array pointer is obtained according to the channel type descriptor and the size of the volume data; finally, the volume data 3D Texture is obtained from the volume data host memory pointer, the channel type descriptor and the device memory 3D Array pointer.

The scheme for reading the three-dimensional texture of the volume data from texture memory is shown in Fig. 2. The tex3D() function provided by CUDA fetches the texture at floating-point texture coordinates derived from the sampling point position; the call can be written tex3D(tex, pos), where tex is the volume data 3D Texture and pos is the floating-point texture coordinate. Hardware interpolation filtering is then applied to produce the sample at the sampling point. Hardware interpolation filtering is one of the features of texture memory: it does not occupy the programmable units and so provides extra floating-point processing capacity.
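The two flows of Fig. 1 and Fig. 2 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes float voxels and uses the texture object API (cudaCreateTextureObject) in place of the texture references of early CUDA releases, and the names createVolumeTexture, sampleVolume and texelCenter are hypothetical.

```cuda
#include <cuda_runtime.h>

// With unnormalized coordinates, texel centres sit at index + 0.5.
__host__ __device__ inline float texelCenter(int index) { return index + 0.5f; }

// Hypothetical helper: uploads a float volume (width x height x depth voxels)
// into a 3D CUDA array and wraps it in a texture object with linear filtering.
cudaError_t createVolumeTexture(const float* h_volume, int width, int height, int depth,
                                cudaArray_t* d_array, cudaTextureObject_t* tex)
{
    // 1. Channel type descriptor derived from the voxel type.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    // 2. Device 3D array from the descriptor and the volume size (in elements).
    cudaExtent extent = make_cudaExtent(width, height, depth);
    cudaError_t err = cudaMalloc3DArray(d_array, &desc, extent);
    if (err != cudaSuccess) return err;

    // 3. Copy the host volume into the 3D array.
    cudaMemcpy3DParms copy = {};
    copy.srcPtr = make_cudaPitchedPtr(const_cast<float*>(h_volume),
                                      width * sizeof(float), width, height);
    copy.dstArray = *d_array;
    copy.extent = extent;
    copy.kind = cudaMemcpyHostToDevice;
    err = cudaMemcpy3D(&copy);
    if (err != cudaSuccess) return err;

    // 4. Texture object over the array; linear filtering enables the
    //    hardware interpolation described in the text.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *d_array;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.addressMode[2] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    return cudaCreateTextureObject(tex, &res, &td, nullptr);
}

// Fetches one hardware-interpolated sample per thread at pos[i].
__global__ void sampleVolume(cudaTextureObject_t tex, const float3* pos,
                             float* samples, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) samples[i] = tex3D<float>(tex, pos[i].x, pos[i].y, pos[i].z);
}
```

Sampling at texelCenter(i) in each dimension returns voxel i unchanged; positions between texel centres return the hardware-interpolated value without any interpolation code in the kernel.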

In the invention, the realization mode of accessing and storing the volume data comprises the steps of establishing the volume data by utilizing a global memory; in the process of traversing volume data, obtaining values of sampling points through trilinear interpolation of 8 voxels around the sampling point as the center; and when the global memory is accessed, the shared memory is used as a buffer area, the access sequence of the threads in the Half-warp is adjusted, and the non-merged access to the global memory is optimized into the merged access.

And when the three-dimensional data needs to be established and the three-dimensional data in the global memory is traversed to access the global memory, adopting a global memory strategy. The global memory strategy designed by the embodiment is as follows:

Global memory occupies most of the video memory of the display device and offers a large storage capacity. Any thread in the grid can read and write any location in global memory, and global memory can provide very high bandwidth.

The scheme for storing the volume data in global memory in the embodiment is shown in Fig. 3, which covers the process of building the volume data in global memory. The volume data device-side array pointer d_PitchedPtr is an instance of the existing cudaPitchedPtr structure of the CUDA architecture; it contains the pointer ptr to the volume data in global memory and the pitch of the volume data after alignment. With the volume data size set to volumeSize, space of size volumeSize is allocated for d_PitchedPtr, and the data is then copied in the host-to-device (HostToDevice) direction from the volume data host memory pointer.

The scheme for reading the volume data from global memory in the embodiment is shown in Fig. 4, which covers the process of traversing the volume data in global memory. The three dimensions of the volume data are the depth dimension (depth), the height dimension (height) and the width dimension (width). With the voxel position set to pos(x, y, z), the depth dimension, height dimension and width dimension of the volume data are resolved in turn from the device-side array pointer d_PitchedPtr and the voxel position, that is, first the slice, then the row, then the element within the row, which yields the voxel.
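The allocation of Fig. 3 and the slice, row, element lookup of Fig. 4 might look like the sketch below. It assumes float voxels; uploadVolume and voxelAddress are hypothetical names, while cudaMalloc3D, cudaMemcpy3D and cudaPitchedPtr are the standard CUDA runtime facilities.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Resolves a voxel address in pitched memory: depth (slice), then height (row),
// then width (element within the row).
__host__ __device__ inline float* voxelAddress(cudaPitchedPtr p, size_t height,
                                               int x, int y, int z)
{
    char* base = static_cast<char*>(p.ptr);
    size_t slicePitch = p.pitch * height;            // bytes per z-slice
    char* slice = base + z * slicePitch;             // depth dimension
    float* row = reinterpret_cast<float*>(slice + y * p.pitch);  // height dimension
    return row + x;                                  // width dimension
}

// Hypothetical helper: allocates pitched global memory for a float volume
// and copies the host data into it (host-to-device direction).
cudaError_t uploadVolume(const float* h_volume, int width, int height, int depth,
                         cudaPitchedPtr* d_pitched)
{
    // For linear (non-array) memory the extent width is given in bytes.
    cudaExtent extent = make_cudaExtent(width * sizeof(float), height, depth);
    cudaError_t err = cudaMalloc3D(d_pitched, extent);
    if (err != cudaSuccess) return err;

    cudaMemcpy3DParms copy = {};
    copy.srcPtr = make_cudaPitchedPtr(const_cast<float*>(h_volume),
                                      width * sizeof(float), width, height);
    copy.dstPtr = *d_pitched;
    copy.extent = extent;
    copy.kind = cudaMemcpyHostToDevice;
    return cudaMemcpy3D(&copy);
}
```

Inside a kernel, `*voxelAddress(d_pitched, height, x, y, z)` then reads the voxel at pos(x, y, z). The pitch returned by cudaMalloc3D may be larger than width * sizeof(float), which is why the row stride is taken from the structure rather than computed from the width.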

The value of a sampling point is obtained by trilinear interpolation of the 8 voxels surrounding it, so the 8 voxels are first fetched by the method shown in Fig. 4 and the final sample is then obtained by interpolation.
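A software version of this interpolation can be sketched as below. For brevity the sketch assumes a densely packed float volume (no pitch) and a sample position inside the volume, with 0 <= x <= w - 1 and likewise for y and z; the function names are illustrative only.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Linear interpolation between a and b with weight t in [0, 1].
__host__ __device__ inline float mixf(float a, float b, float t)
{
    return a + t * (b - a);
}

// Voxel lookup in a densely packed volume: slice, then row, then element.
__host__ __device__ inline float voxelAt(const float* vol, int w, int h,
                                         int x, int y, int z)
{
    return vol[(z * h + y) * w + x];
}

// Trilinear interpolation of the 8 voxels surrounding (x, y, z).
__host__ __device__ inline float sampleTrilinear(const float* vol, int w, int h, int d,
                                                 float x, float y, float z)
{
    int x0 = (int)floorf(x), y0 = (int)floorf(y), z0 = (int)floorf(z);
    // Clamp the upper neighbours so samples on the last voxel stay in range.
    int x1 = (x0 + 1 < w) ? x0 + 1 : w - 1;
    int y1 = (y0 + 1 < h) ? y0 + 1 : h - 1;
    int z1 = (z0 + 1 < d) ? z0 + 1 : d - 1;
    float fx = x - x0, fy = y - y0, fz = z - z0;

    // Interpolate along x on the four edges, then along y, then along z.
    float c00 = mixf(voxelAt(vol, w, h, x0, y0, z0), voxelAt(vol, w, h, x1, y0, z0), fx);
    float c10 = mixf(voxelAt(vol, w, h, x0, y1, z0), voxelAt(vol, w, h, x1, y1, z0), fx);
    float c01 = mixf(voxelAt(vol, w, h, x0, y0, z1), voxelAt(vol, w, h, x1, y0, z1), fx);
    float c11 = mixf(voxelAt(vol, w, h, x0, y1, z1), voxelAt(vol, w, h, x1, y1, z1), fx);
    return mixf(mixf(c00, c10, fy), mixf(c01, c11, fy), fz);
}
```

Each sample costs 8 global memory reads and 7 interpolations in the programmable units, which is the work that the texture memory strategy hands to the filtering hardware.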

A half-warp is a group of 16 threads, and in the CUDA architecture memory requests are issued in units of one half-warp. When a certain condition is met, the requests of all threads in the half-warp are served by a single transaction, which improves the access efficiency of global memory. This condition is the merged access (coalesced access) condition. CUDA devices of different compute capabilities impose different requirements for merged access.

On devices of compute capability 1.0 and 1.1, the access of a half-warp to a segment must be linear and contiguous, i.e. the i-th thread must access the i-th word in the segment; the start address of the segment accessed by the half-warp must be aligned to 16 times the word size accessed by each thread; and merged access is supported only for word sizes of 32, 64 and 128 bits. If the merged access condition is met but some threads do not access memory, the access still counts as one merged access; if the condition is not met, the access of the half-warp is broken down into 16 serial accesses.

On devices of compute capability 1.2 and above, the restrictions are greatly relaxed. First, the supported word sizes are extended to 8, 16, 32, 64 and 128 bits. Second, accesses within a segment no longer have to be sequential or aligned: the merged access condition is met as long as the accesses fall within one segment. Even when an access spans two segments, it is split into only two transactions that each meet the merged access condition, rather than into 16 as on 1.0 and 1.1 devices, which greatly improves the access efficiency of global memory.
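The compute capability 1.0/1.1 rule can be made concrete with a small checker. The sketch below encodes only the rule as stated above and assumes that all 16 threads of the half-warp take part in the access; isMergedCC10 and stridedAddresses are hypothetical helper names, not part of CUDA.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Returns true if the byte addresses requested by the 16 threads of a half-warp
// satisfy the 1.0/1.1 rule: word size of 4, 8 or 16 bytes, start address aligned
// to 16 * wordBytes, and thread i accessing word i.
__host__ __device__ inline bool isMergedCC10(const size_t addr[16], size_t wordBytes)
{
    if (wordBytes != 4 && wordBytes != 8 && wordBytes != 16) return false;
    size_t base = addr[0];
    if (base % (16 * wordBytes) != 0) return false;        // segment alignment
    for (int i = 0; i < 16; ++i)
        if (addr[i] != base + i * wordBytes) return false; // linear, contiguous
    return true;
}

// Fills addr with the addresses of 16 threads reading with a fixed byte stride.
__host__ __device__ inline void stridedAddresses(size_t base, size_t strideBytes,
                                                 size_t addr[16])
{
    for (int i = 0; i < 16; ++i) addr[i] = base + i * strideBytes;
}
```

With this checker, 16 threads reading consecutive floats from an aligned base pass, while the same threads each reading the first float of consecutive float3 values (a 12-byte stride) fail. The latter is the situation that the shared memory buffer is meant to repair.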

To address the loss of access efficiency and the waste of global memory bandwidth caused by misaligned and strided accesses to memory segments when sampling the volume data, the invention provides a strategy that combines shared memory with global memory. When an access to global memory would be non-merged, shared memory is used as a buffer to rearrange the access order of the threads in the half-warp, turning it into a merged access. The global + shared strategy designed in the embodiment is as follows:

as shown in FIG. 5, a non-merged access to a one-time read of float3 type data may be converted to a merged access using the shared memory and thread synchronization mechanism provided by the CUDA.

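Taking 256 threads/block as an example, the access is converted as in the following pseudocode:

// threadId: index of this thread's first float in global memory,
// assumed here to be 3 * blockIdx.x * blockDim.x + threadIdx.x

// step 1: stage the data in shared memory with merged reads
__shared__ float smem_data[256*3];
smem_data[threadIdx.x] = gmem_data[threadId];
smem_data[threadIdx.x+256] = gmem_data[threadId+256];
smem_data[threadIdx.x+512] = gmem_data[threadId+512];
__syncthreads();

// step 2: compute on the data held in shared memory
Calc();
__syncthreads();  // results of all threads must be in place before write-back

// step 3: write the results back with merged writes
gmem_data[threadId] = smem_data[threadIdx.x];
gmem_data[threadId+256] = smem_data[threadIdx.x+256];
gmem_data[threadId+512] = smem_data[threadIdx.x+512];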

Here 256 threads/block denotes a thread block of 256 threads; __shared__ float smem_data[256*3] defines an array of 256 × 3 floats in shared memory; threadId is the ID of the thread; and gmem_data[] is the data array in global memory.

Step 1 fetches the data into shared memory. The float3 data is fetched from global memory into shared memory as 3 separate floats, and the threads are synchronized with the existing CUDA function __syncthreads(). This step converts the non-merged access into a merged access. As shown in Fig. 5, each thread reads three floats at offsets 0, 256 and 512 from its own thread ID, so that in each of the three reads the threads t0, t1, t2, t3 … t255 together read 256 consecutive floats, which satisfies the merged access condition.

Step 2 is the data calculation. The calculation is the same as in the original Calc(), except that the data is read from the shared memory array smem_data according to the thread ID. The function Calc() stands for any processing applied to the data.

Step 3 writes the data back. As in step 1, each thread writes three floats from shared memory back to global memory at offsets 0, 256 and 512 from its own thread ID.
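Put together as a complete kernel, the three steps might look like the following sketch. Scaling each voxel by a factor stands in for Calc(), the bounds checks for a partially filled last block are an addition, and the names scaleVoxels and stagedIndex are hypothetical.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block, as in the example above

// Global index of the k-th float (k = 0, 1, 2) staged by a thread. Consecutive
// threads map to consecutive floats, so each of the three passes is contiguous.
__host__ __device__ inline int stagedIndex(int block, int thread, int k)
{
    return 3 * BLOCK * block + BLOCK * k + thread;
}

// Scales every float3 voxel in place; gmem holds 3 * numVoxels floats.
// Must be launched with BLOCK threads per block.
__global__ void scaleVoxels(float* gmem, int numVoxels, float factor)
{
    __shared__ float smem[3 * BLOCK];
    const int total = 3 * numVoxels;

    // Step 1: merged reads from global memory into shared memory.
    for (int k = 0; k < 3; ++k) {
        int g = stagedIndex(blockIdx.x, threadIdx.x, k);
        if (g < total) smem[BLOCK * k + threadIdx.x] = gmem[g];
    }
    __syncthreads();

    // Step 2: each thread processes its own float3 out of shared memory.
    int voxel = BLOCK * blockIdx.x + threadIdx.x;
    if (voxel < numVoxels) {
        float3* v = reinterpret_cast<float3*>(smem) + threadIdx.x;
        v->x *= factor;
        v->y *= factor;
        v->z *= factor;
    }
    __syncthreads();  // all results must be written before the write-back

    // Step 3: merged writes from shared memory back to global memory.
    for (int k = 0; k < 3; ++k) {
        int g = stagedIndex(blockIdx.x, threadIdx.x, k);
        if (g < total) gmem[g] = smem[BLOCK * k + threadIdx.x];
    }
}
```

A launch such as `scaleVoxels<<<(numVoxels + BLOCK - 1) / BLOCK, BLOCK>>>(d_data, numVoxels, 2.0f)` covers the whole array. The second __syncthreads() matters because in step 3 a thread writes back floats that were produced by other threads in step 2.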

Under this strategy, the volume data is built in global memory in the same way as under the global memory strategy. For traversal, see Fig. 6: after the depth dimension, height dimension and width dimension of the volume data have been resolved in turn from the device-side array pointer d_PitchedPtr and the voxel position pos(x, y, z), the float offsets 0, 256 and 512 are applied, i.e. each thread writes the three floats of a float3 voxel from shared memory to global memory at offsets 0, 256 and 512 from its own thread ID.

Following the above analysis of converting accesses into merged accesses through shared memory, sampling under the global + shared strategy decomposes the read of a single float3 voxel into 3 concurrent float reads. The sampling process is shown in Fig. 6.

Unlike the global memory strategy, once shared memory is used, each float3 voxel read from global memory is decomposed into 3 floats, which are read concurrently by the 256 threads/block in a way that meets the merged access condition; after the data has been transferred from global memory to shared memory, it is read from shared memory in float3 format.

Access to the view angle matrix is implemented by obtaining the view angle matrix from the OpenGL built-in model-view matrix and copying it to constant memory; the view angle matrix is then read directly whenever a ray has to be computed with it. The constant memory strategy is adopted whenever a computation uses the view angle matrix.

The constant memory strategy designed in the embodiment targets the view angle matrix that controls the transformation of the image viewing angle. The view angle matrix is stored in constant memory as follows:

The view angle matrix is a float matrix of size 3 × 4. In the ray casting algorithm, each ray computes the eye position and the normalized direction vector of the ray from the current view angle matrix whenever it computes a sampling point.

The CUDA architecture provides 64 KB of constant memory for read-only data that is accessed frequently. The threads of a half-warp can obtain the required data from constant memory in a single cycle, which is far more efficient than reading the data from video memory.

The model-view matrix built into OpenGL (an existing graphics programming interface) is a float matrix of size 4 × 4. The current active matrix is set to the model-view matrix with the existing OpenGL function glMatrixMode(GL_MODELVIEW); the model-view matrix is then copied into a previously declared float[16] array modelView with the existing OpenGL function glGetFloatv(GL_MODELVIEW_MATRIX, modelView), and the view angle matrix is obtained by inverting modelView.

The data is copied from the host into constant memory with the cudaMemcpyToSymbol() function provided by the CUDA architecture.

When a ray needs to be computed with the view angle matrix, the matrix is read directly from constant memory.
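The constant memory path can be sketched as follows. The sketch assumes the host has already produced the 3 × 4 view angle matrix in row-major order (the OpenGL query and inversion are omitted), and the image-plane convention in setupRays (a plane at z = -2 in eye space) is an illustrative assumption rather than something the source specifies; all function and variable names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// 3 x 4 view angle matrix in constant memory, row-major (assumed layout).
__constant__ float c_viewMatrix[12];

// Transforms a point (w = 1) by a row-major 3 x 4 matrix.
__host__ __device__ inline float3 transformPoint(const float* m, float3 p)
{
    return make_float3(m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
                       m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
                       m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]);
}

// Transforms a direction (w = 0): the translation column is ignored.
__host__ __device__ inline float3 transformDir(const float* m, float3 v)
{
    return make_float3(m[0] * v.x + m[1] * v.y + m[2]  * v.z,
                       m[4] * v.x + m[5] * v.y + m[6]  * v.z,
                       m[8] * v.x + m[9] * v.y + m[10] * v.z);
}

// Copies the host-side matrix into constant memory.
cudaError_t uploadViewMatrix(const float h_view[12])
{
    return cudaMemcpyToSymbol(c_viewMatrix, h_view, 12 * sizeof(float));
}

// One thread per pixel: eye position and normalized ray direction,
// both computed from the matrix held in constant memory.
__global__ void setupRays(float3* origins, float3* dirs, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float u = (x + 0.5f) / width * 2.0f - 1.0f;
    float v = (y + 0.5f) / height * 2.0f - 1.0f;

    float3 d = transformDir(c_viewMatrix, make_float3(u, v, -2.0f));
    float invLen = rsqrtf(d.x * d.x + d.y * d.y + d.z * d.z);

    origins[y * width + x] = transformPoint(c_viewMatrix, make_float3(0.0f, 0.0f, 0.0f));
    dirs[y * width + x] = make_float3(d.x * invLen, d.y * invLen, d.z * invLen);
}
```

Every thread reads the same 12 floats, which is the access pattern constant memory is designed for: the value is broadcast to the threads instead of being fetched separately for each of them.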

In order to compare the rendering efficiency, the embodiment also designs a model based on the variance of the first derivative of the image edge as a measure of volume rendering under different optimization strategies. Variance measures how far a random variable deviates from its mathematical expectation (i.e., its mean), and the first derivative of the image edge gives the slope at every point of the image edge curve. If the image quality is high, the edge curve is smoother, with fewer large undulations and breakpoints, i.e. fewer abrupt changes in slope, so the variance of the first derivative of the edge curve is lower. The invention therefore uses the variance of the first derivative to measure the smoothness of the edge curve and thereby to evaluate the rendering quality.

The evaluation criteria model of the example was designed as follows:

① Extract the image edge with the Canny operator. Determine a critical value k and read the pixel values of each row from left to right; when a pixel value is smaller than k, the scan is considered to have entered the image area, and the coordinates of that boundary point are recorded. This yields the boundary coordinate array.

② From the boundary coordinate array obtained in ①, compute the first derivative $d = \mathrm{diff}(s_i, s_j)$ (the slope of the edge line) between every two pixels $s_i$ and $s_j$, which yields the slope array. Here $(x_i, y_i)$ are the coordinates of pixel $s_i$ and $(x_j, y_j)$ are the coordinates of pixel $s_j$. The image used for the statistics is a very small part of the whole image, about 100 × 100 pixels in size, and the first derivative of the image edge is computed within this 100 × 100 region.

③ From the slope array obtained in ② (the array storing the first derivatives of the pixels on the edge curve, of length $n$), compute the variance of the slope,

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i,$$

where $d_i$ is the first derivative at the $i$-th pixel on the edge curve, $n$ is the number of pixels on the edge curve, and $\bar{d}$ is the mean of the slope array.
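The variance criterion can be illustrated with a short routine. Because the source does not preserve the exact slope formula, the sketch makes two labeled assumptions: one boundary point is recorded per image row, so the slope is taken as the change in boundary column between consecutive rows, and the variance is normalized by the number of slopes. The name slopeVariance is hypothetical.

```cuda
#include <cuda_runtime.h>

// boundaryX[i] is the column of the boundary point found in row i.
// Returns the variance of the first differences (slopes) of the boundary.
__host__ __device__ inline float slopeVariance(const float* boundaryX, int rows)
{
    int n = rows - 1;                 // number of slopes
    if (n <= 0) return 0.0f;

    // Mean of the slope array.
    float mean = 0.0f;
    for (int i = 0; i < n; ++i)
        mean += boundaryX[i + 1] - boundaryX[i];
    mean /= n;

    // Mean squared deviation of the slopes from their mean.
    float var = 0.0f;
    for (int i = 0; i < n; ++i) {
        float dev = (boundaryX[i + 1] - boundaryX[i]) - mean;
        var += dev * dev;
    }
    return var / n;
}
```

A perfectly straight edge gives a variance of zero, and a jagged edge gives a larger value, so under this criterion a lower variance indicates a smoother edge and a better rendering.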

Using the above evaluation method, the rendering efficiencies of the first three strategies were compared on the same volume data under the same conditions. Comparing the number of frames rendered per second (fps) when reading 128x128x128 volume data shows that the rendering efficiency of the global memory strategy is too low, which makes the algorithm performance unsatisfactory, while the texture memory strategy and the global + shared strategy both render efficiently; when the volume data is read infrequently, the texture memory strategy is slightly more efficient than the global + shared strategy. However, as the sampling interval of the volume data is reduced, i.e. as the number of memory reads increases, the rendering efficiency of the texture strategy drops very quickly, whereas that of the global + shared strategy changes little. This experiment shows that the global + shared strategy is superior to the texture strategy when the algorithm needs to read the volume data frequently.

Similarly, the fps when reading volume data of size 128x128x128 was compared with and without the constant memory strategy. The experimental results show that storing the view angle matrix in constant memory almost doubles the rendering efficiency, so the constant memory strategy is very effective in improving the performance of the ray casting algorithm.

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims

1. A memory access method for realizing shear wave data three-dimensional visualization aiming at parallel volume rendering is characterized in that shear wave data three-dimensional visualization is realized based on a CUDA (compute unified device architecture) which comprises a texture memory, a global memory, a shared memory and a constant memory, and the memory access method is characterized in that: the accessed data objects comprise three-dimensional textures of volume data, the volume data and a view angle matrix,

the implementation mode of accessing and storing the three-dimensional texture of the volume data provides a texture memory strategy, which comprises the steps of utilizing a texture register to build the three-dimensional texture of the volume data, and providing additional floating point processing capacity through a hardware interpolation filtering process in the process of picking up texture coordinates;

wherein,

in the process of establishing the three-dimensional Texture of the volume data by using the Texture register, obtaining a channel type descriptor according to the volume data type, obtaining a 3D Array pointer of an equipment memory according to the channel type descriptor and the volume data size, and finally obtaining a 3D Texture of the volume data according to a host memory pointer of the volume data, the channel type descriptor and the 3D Array pointer of the equipment memory;

in the process of picking up the texture coordinates, the tex3D() function provided by CUDA is used to fetch at floating-point texture coordinates according to the volume data 3D Texture and the sampling point position, denoted tex3D(tex, pos), wherein tex is the volume data 3D Texture and pos is the floating-point texture coordinate; hardware interpolation filtering is then performed to finally obtain the sampling point sample;

the implementation mode of accessing the volume data provides a global memory strategy, which comprises the steps of establishing the volume data by using a global memory; in the process of traversing volume data, obtaining values of sampling points through trilinear interpolation of 8 voxels around the sampling point as the center; or providing a strategy of combining a shared memory and a global memory, and when the global memory is accessed, adjusting the access sequence of threads in the Half-warp by using the shared memory as a buffer area, and optimizing non-merged access of the global memory into merged access;

when the global memory strategy is adopted, in the process of building the volume data by using the global memory, the volume data size is set as volumeSize, space of size volumeSize is allocated for the volume data device-side array pointer d_PitchedPtr, and data is then copied in the host-to-device direction according to the volume data host memory pointer; in the process of traversing the volume data, the voxel position is set as pos(x, y, z), and the volume data depth dimension, height dimension and width dimension are obtained in turn according to the volume data device-side array pointer d_PitchedPtr and the voxel position pos(x, y, z), thereby obtaining the voxel; the depth dimension, height dimension and width dimension are denoted depth, height and width respectively;

when the strategy combining shared memory with global memory is adopted, the volume data are established in global memory in the same way as under the global memory strategy; in the process of traversing the volume data, the depth dimension, the height dimension and the width dimension of the volume data are obtained in sequence according to the device-side array pointer d_PitchedPtr and the voxel position pos(x, y, z), and after the voxel is obtained, each thread, taking its own thread ID as the starting position, writes three float values from shared memory to global memory at offsets 0, 256 and 512;

for access to the view matrix, a constant memory strategy is provided, which comprises obtaining the view matrix from the OpenGL built-in model matrix and copying the view matrix to constant memory; the view matrix is read directly whenever a ray needs to be computed with the view matrix;

wherein,

when the view matrix is copied to constant memory, the view matrix is stored in constant memory as follows: the current active matrix is set to the model matrix through the OpenGL function glMatrixMode(GL_MODELVIEW); the model matrix is copied into a previously declared float[16] array modelView through the OpenGL function glGetFloatv(GL_MODELVIEW_MATRIX, modelView); the view matrix is obtained by inverting modelView; and the data are copied from the host into constant memory through the cudaMemcpyToSymbol() function provided by the CUDA architecture.
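As an illustration of the global memory strategy and of the shared memory access pattern recited in this claim, the following CPU-side Python sketch may help. It is not the patented CUDA implementation. The function names `voxel_index`, `sample_trilinear` and `coalesced_offsets` are hypothetical, the flat layout (width varying fastest, then height, then depth) is an assumed reading of the depth/height/width traversal, and the block size of 256 threads is inferred from the offsets 0, 256 and 512.

```python
import math


def voxel_index(x, y, z, width, height):
    """Flat offset of voxel (x, y, z); assumed layout: depth, then height, then width."""
    return (z * height + y) * width + x


def sample_trilinear(volume, dims, pos):
    """Interpolate the 8 voxels surrounding pos = (x, y, z).

    volume is a flat list of floats, dims = (width, height, depth).
    Each dimension is assumed to hold at least 2 voxels.
    """
    width, height, depth = dims
    base, frac = [], []
    for p, size in zip(pos, dims):
        p = min(max(p, 0.0), size - 1.0)        # clamp the sample to the volume
        i = min(int(math.floor(p)), size - 2)   # lower corner of the enclosing cell
        base.append(i)
        frac.append(p - i)
    x0, y0, z0 = base
    fx, fy, fz = frac
    value = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                # Weight of each corner is the product of the per-axis weights.
                weight = ((fx if dx else 1.0 - fx) *
                          (fy if dy else 1.0 - fy) *
                          (fz if dz else 1.0 - fz))
                idx = voxel_index(x0 + dx, y0 + dy, z0 + dz, width, height)
                value += weight * volume[idx]
    return value


def coalesced_offsets(thread_id, block_size=256):
    """Offsets one thread touches when moving three floats (assumed block size)."""
    return [thread_id + k * block_size for k in range(3)]


if __name__ == "__main__":
    # A 2x2x2 volume whose voxel value equals its flat index.
    volume = [float(i) for i in range(8)]
    print(sample_trilinear(volume, (2, 2, 2), (0.5, 0.5, 0.5)))
    # Neighbouring threads touch neighbouring addresses in each of the three passes.
    print(coalesced_offsets(0), coalesced_offsets(1))
```

In this sketch, thread t and thread t + 1 always touch adjacent addresses in each pass, which is the property that lets the hardware merge the accesses of a half-warp. If each thread instead handled its own three consecutive floats, neighbouring threads would be three elements apart.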

2. The memory access method for realizing shear wave data three-dimensional visualization aiming at parallel volume rendering according to claim 1, characterized in that: the smoothness of the edge curve, measured by the variance of its first derivative, is used as the criterion for judging the quality of the three-dimensional visualization image of the shear wave data, so as to obtain the rendering efficiency of each strategy,

the evaluation criterion model is designed as follows:

firstly, the image edge is extracted using the Canny operator to obtain a boundary coordinate value array;

secondly, according to the boundary coordinate value array determined in step one, the first derivative between every two pixel points $s_i$ and $s_j$ is computed as follows to obtain a slope array,

$$d_i = \frac{y_j - y_i}{x_j - x_i}$$

wherein $(x_i, y_i)$ is the coordinate value of pixel point $s_i$ and $(x_j, y_j)$ is the coordinate value of pixel point $s_j$; the statistical image is a partial region taken from the whole image, and the first derivative of the image edge is calculated within this statistical region;

thirdly, for the slope array obtained in step two, with the length of the array set as $n$, the variance of the slope is calculated, wherein $d_i$ is the first derivative at any pixel point on the edge curve and $\bar{d}$ is the average of the slope array,

$$\bar{d} = \sum_{i=1}^{n} d_i / n.$$
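The evaluation criterion of this claim can be sketched in Python as follows, as a minimal illustration and not the patented implementation. Several points are assumptions: the function names `edge_slopes` and `slope_variance` are hypothetical, "every two pixels" is read as each pair of consecutive boundary points, vertical steps are skipped to avoid division by zero, and the variance is taken as the population variance (divided by n, matching the divisor of the mean). The Canny edge extraction of step one is not shown; the boundary coordinate array is taken as given.

```python
def edge_slopes(boundary):
    """First derivative between consecutive boundary points (x, y).

    Assumption: pairs with equal x are skipped to avoid division by zero.
    """
    slopes = []
    for (xi, yi), (xj, yj) in zip(boundary, boundary[1:]):
        if xj != xi:
            slopes.append((yj - yi) / (xj - xi))
    return slopes


def slope_variance(slopes):
    """Population variance of the slope array (assumed divisor: n)."""
    n = len(slopes)
    if n == 0:
        raise ValueError("slope array is empty")
    mean = sum(slopes) / n                      # average of the slope array
    return sum((d - mean) ** 2 for d in slopes) / n


if __name__ == "__main__":
    smooth = [(0, 0), (1, 1), (2, 2), (3, 3)]   # straight edge
    jagged = [(0, 0), (1, 1), (2, 0), (3, 1)]   # zigzag edge
    print(slope_variance(edge_slopes(smooth)))
    print(slope_variance(edge_slopes(jagged)))
```

Under these assumptions a smoother edge yields a smaller variance, so a lower value would indicate better image quality for the strategy being evaluated.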