CN111239745A - Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium - Google Patents

Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium

Info

Publication number
CN111239745A
Authority
CN
China
Prior art keywords
receiving
received data
lines
channels
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911412201.1A
Other languages
Chinese (zh)
Inventor
郭震
周鄂林
凌涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vinno Technology Suzhou Co Ltd
Original Assignee
Vinno Technology Suzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vinno Technology Suzhou Co Ltd filed Critical Vinno Technology Suzhou Co Ltd
Priority to CN201911412201.1A priority Critical patent/CN111239745A/en
Publication of CN111239745A publication Critical patent/CN111239745A/en
Priority to PCT/CN2020/126530 priority patent/WO2021135629A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S15/00 Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems
    • G01S15/88 Sonar systems specially adapted for specific applications
    • G01S15/89 Sonar systems specially adapted for specific applications for mapping or imaging
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00 Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/04 Analysing solids
    • G01N29/06 Visualisation of the interior, e.g. acoustic microscopy
    • G01N29/0654 Imaging
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00 Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/44 Processing the detected response signal, e.g. electronic circuits specially adapted therefor
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/523 Details of pulse systems
    • G01S7/526 Receivers
    • G01S7/527 Extracting wanted echo signals
    • G01S7/5273 Extracting wanted echo signals using digital techniques
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/534 Details of non-pulse systems
    • G01S7/536 Extracting wanted echo signals
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/539 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2291/00 Indexing codes associated with group G01N29/00
    • G01N2291/04 Wave modes and trajectories
    • G01N2291/044 Internal reflections (echoes), e.g. on walls or defects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a beamforming processing method and apparatus, a computer device and a storage medium. The method includes: receiving front-end received data corresponding to a plurality of receiving channels; receiving a control instruction transmitted by a CPU, wherein the control instruction carries position information of the plurality of receiving channels, coordinate information of a plurality of receiving lines and information of the sampling points contained in each receiving line; and synthesizing the front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line, to obtain the received data of the plurality of receiving lines. With this method, the multi-channel front-end received data is transmitted directly to the GPU, which improves data transmission efficiency, meets the real-time requirements of beamforming, and provides intermediate image data of the quality required for subsequent processing.

Description

Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of ultrasound data processing technologies, and in particular, to a beam forming processing method and apparatus, a computer device, and a storage medium.
Background
Digital beamforming is a core technology of ultrasonic detection and of the entire signal receiving and processing system. A traditional multi-channel ultrasonic parallel data acquisition and processing system is based on an FPGA (Field Programmable Gate Array): it processes signals in parallel through various FPGA-based modules, making full use of the FPGA's ability to work in parallel to achieve parallel sampling and digital processing of multi-channel signals as well as accurate delays and fast weighted summation, after which the FPGA writes the weighted-summation data to the CPU through a PCIE bus for subsequent processing.
In the related art, the beamforming process combines a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit): the ultrasound front-end data is transmitted to the CPU through an FPGA, and the CPU performs the beamforming. The processed data is then transmitted to the GPU, where further processing such as compounding and demodulation is performed to realize ultra-high-speed imaging. However, in the related art, transmitting the ultrasound front-end data to the CPU introduces a certain time delay, resulting in inefficient data transmission.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a beamforming processing method, apparatus, computer device and storage medium that can improve the efficiency of transmitting large amounts of ultrasound data between multiple devices and that facilitate debugging and migration.
In order to achieve the above object, in one aspect, an embodiment of the present application provides a beamforming processing method applied to a GPU, where the method includes:
receiving front-end received data corresponding to a plurality of receiving channels;
receiving a control instruction transmitted by a CPU, wherein the control instruction carries position information of a plurality of receiving channels, coordinate information of a plurality of receiving lines and information of positions of sampling points contained in each receiving line;
and synthesizing front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines.
In one embodiment, the thread structure provided in the graphics processor is a two-dimensional grid structure, the horizontal dimension of the two-dimensional grid structure corresponds to the number of the plurality of receiving lines, and the vertical dimension corresponds to the number of sampling points in each receiving line.
In one embodiment, a plurality of thread blocks are arranged in the graphics processor, and each thread block processes the received data of a corresponding receiving line; synthesizing the front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines includes:
determining the position information of each sampling point in each receiving line according to the coordinate information of each receiving line and the information of the sampling point contained in each receiving line;
calculating the received data of each sampling point in a corresponding receiving line according to the position information of each sampling point in the corresponding receiving line, the position information of a plurality of receiving channels and the front-end received data corresponding to the plurality of receiving channels by each thread block;
and carrying out weighted sum on the received data of the sampling points in each receiving line to obtain the received data of each receiving line.
In one embodiment, calculating, by each thread block, received data of each sampling point in a corresponding one of the receiving lines according to position information of each sampling point in the corresponding one of the receiving lines, position information of a plurality of receiving channels, and front-end received data corresponding to the plurality of receiving channels includes:
obtaining the distance value between each sampling point and each receiving channel according to the position information of each sampling point in the receiving line;
determining the time delay of each sampling point in the receiving line relative to each receiving channel according to the distance value;
and generating the received data of each sampling point according to the time delay of each sampling point relative to each receiving channel and the front-end received data corresponding to each receiving channel.
In one embodiment, after synthesizing the front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of sampling points to obtain the received data of the plurality of receiving lines, the method further includes:
storing the received data of a plurality of receiving lines into a preset shared data buffer area;
and transmitting the received data of the plurality of receiving lines to the CPU through the shared data buffer.
In one embodiment, the method further comprises:
and determining the number of sampling points per unit distance in each receiving line according to the acquired sampling frequency and the tissue speed.
In one embodiment, receiving front-end received data corresponding to a plurality of receiving channels includes:
and receiving front-end receiving data corresponding to the plurality of receiving channels through a high-speed serial computer expansion bus standard PCIe.
On the other hand, an embodiment of the present application further provides a beam forming processing apparatus, where the apparatus includes:
the receiving module is used for receiving front-end receiving data corresponding to the plurality of receiving channels;
the receiving module is also used for receiving a control instruction transmitted by the CPU, wherein the control instruction carries position information of a plurality of receiving channels, coordinate information of a plurality of receiving lines and information of sampling points contained in each receiving line;
and the beam synthesis module is used for synthesizing front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines.
In yet another aspect, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the beam forming processing method described in any one of the above when executing the computer program.
In yet another aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the beam forming processing method described in any one of the above.
According to the beam forming processing method, the beam forming processing device, the computer equipment and the storage medium, the data received by the front ends of the multiple channels are directly transmitted to the GPU, so that the data transmission efficiency is improved; the advantages of short development period, convenient debugging and convenient transplantation are achieved by using the CPU and the GPU platform, and the cost of beam synthesis can be reduced; by utilizing the high-speed parallel computing capability and the parallel multi-task processing capability of the GPU, the data processing of multiple receiving lines is realized at the same time, so that the real-time performance of beam forming is improved, and the data quality of an intermediate image required by subsequent processing can be met.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a beamforming process;
FIG. 2 is a flow diagram illustrating a method for beamforming in one embodiment;
FIG. 2a is a graph illustrating delay comparison between GPU and CPU beamforming in one embodiment;
FIG. 3 is a flow diagram that illustrates processing of receive data for a receive line by a thread block in one embodiment;
FIG. 4 is a flow diagram that illustrates processing of receive data for a receive line by a thread block in one embodiment;
FIG. 4a is a diagram illustrating obtaining a delay value according to the position information of a sampling point in one embodiment;
FIG. 5 is a flow diagram that illustrates a method for beamforming in one embodiment;
FIG. 6 is a block diagram showing the structure of a beam forming apparatus according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The beamforming processing method provided by the present application can be applied to the application environment shown in fig. 1. The application environment includes a GPU 110, a CPU 120, and an AFE 130 (Analog Front End). GPU stands for Graphics Processing Unit, the core unit of a graphics card. Since its introduction, the GPU was mainly used for graphics rendering tasks; after NVIDIA introduced the Compute Unified Device Architecture (CUDA) and AMD (Advanced Micro Devices) adopted OpenCL (Open Computing Language) as a development language, GPU programming became convenient and simple, which also promoted the wide use of GPUs in data computing fields other than graphics, such as general high-performance computing. Structurally, a CPU is basically a controller plus cache registers, whereas a GPU contains a large number of logical operation units, which makes the GPU better suited to processing large amounts of data in parallel; its high-performance computing capability far exceeds that of the CPU. Specifically, the AFE 130 transmits the front-end received data of the multiple receiving channels, obtained by processing the analog echo signals, to the GPU 110 through a preset protocol. The preset protocol may be, for example, the PCI (Peripheral Component Interconnect) bus protocol or PCIe (Peripheral Component Interconnect Express). The GPU 110 receives the front-end received data corresponding to the multiple receiving channels. When the beamforming processing is executed, the CPU 120 transmits the control instruction and the related data to the GPU 110, so that the GPU 110 executes the beamforming processing method according to the control instruction using its thread-grid organization. The GPU 110 includes a plurality of thread grids, each thread grid may contain a plurality of thread blocks, and each thread block may contain a plurality of threads. When a task is to be executed, the thread grid divides the task among the thread blocks, and the thread blocks and the threads within them complete the task. Specifically, the GPU 110 receives front-end received data corresponding to a plurality of receiving channels; the GPU 110 receives a control instruction transmitted by the CPU 120, where the control instruction carries position information of the plurality of receiving channels, coordinate information of a plurality of receiving lines, and information of the sampling points included in each receiving line; the GPU 110 synthesizes the front-end received data corresponding to the plurality of receiving channels based on the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines, and the information of the sampling points included in each receiving line, to obtain the received data of the plurality of receiving lines. The GPU 110 then transmits the obtained received data of the plurality of receiving lines to the CPU 120 for subsequent processing.
In one embodiment, as shown in fig. 2, a beamforming processing method is provided, which is exemplified by the method applied to the GPU 110 in fig. 1, and includes the following steps:
step 210, receiving front-end receiving data corresponding to a plurality of receiving channels.
The front-end received data refers to the digital echo signals obtained by processing the received ultrasonic echoes. Specifically, the hardware platform issues an instruction so that the probe transmits ultrasonic waves according to certain requirements, and the probe then receives the ultrasonic echoes. Because ultrasound suffers losses such as scattering along the transmitting and receiving paths, the attenuation of the received ultrasonic echo signal increases with receiving time and the relative echo intensity becomes smaller; if such a signal were used directly for subsequent processing, the resulting ultrasound image would show different brightness at different detection depths, which does not truly reflect the detected tissue structure. Therefore, time gain compensation is applied to the received ultrasonic echo signals to mitigate the subsequent processing problems caused by the reduction of signal strength with depth. The compensated signal is still an analog signal, so, to improve signal processing efficiency and reduce the complexity of the hardware platform, an analog-to-digital converter (ADC) converts the analog echo signal into a digital echo signal, i.e., the front-end received data corresponding to each receiving channel. The front-end received data corresponding to the multiple receiving channels is transmitted directly to the GPU for beamforming processing. The GPU (Graphics Processing Unit) is the core unit of a graphics card; structurally it has a large number of logical operation units and can accommodate thousands of independent numerical calculation threads.
Step 220, receiving a control instruction transmitted by the CPU, where the control instruction carries position information of the multiple receiving channels, coordinate information of the multiple receiving lines, and information of the sampling points contained in each receiving line.
The sampling points are points obtained by sampling along a receiving line according to a certain sampling rule. The sampling information may be, but is not limited to, information such as the number of sampling points in each receiving line. Specifically, in this embodiment a CPU + GPU architecture is adopted for the beamforming processing: the CPU completes task organization and transmission, and whenever the CPU encounters a task requiring parallel computation, it organizes the operations to be performed into corresponding control instructions. The CPU then transmits the control instructions to the GPU, and the GPU completes the parallel computation according to the received control instructions. Every time a new beamforming processing task is received, the CPU updates the corresponding parameters for that task so that the GPU can perform the parallel computation accurately. In this embodiment, the CPU may transmit a control instruction to the shared data area in the GPU, where the control instruction includes information such as the position information of the multiple receiving channels used for beamforming, the position coordinates of the multiple receiving lines, and the number and positions of the sampling points in each receiving line, so that the GPU can calculate the received data of the sampling points in each receiving line from this information.
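Purely as an illustration of the kind of information such a control instruction carries, the parameters could be grouped on the host into a structure like the following before being handed to the GPU; every field name and type here is an assumption made for this sketch, not something specified by the patent.

```cuda
// Illustrative grouping of the parameters carried by the control instruction
// (field names and types are assumptions, not taken from the patent).
struct BeamformParams {
    int          numChannels;     // number of receiving channels
    int          numLines;        // number of receiving lines
    int          samplesPerLine;  // number of sampling points contained in each receiving line
    const float *channelPos;      // position information of each receiving channel
    const float *lineCoords;      // coordinate information of each receiving line
    float        samplePitch;     // spacing between adjacent sampling points on a line
};
```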
Step 230, synthesizing the front-end received data corresponding to the plurality of receiving channels to obtain the received data of the plurality of receiving lines according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points included in each receiving line.
Specifically, after acquiring information such as the position information of the plurality of receiving channels, the position coordinates of the plurality of receiving lines, and the number of sampling points in each receiving line, the GPU may determine the distance of each sampling point relative to each channel according to the position coordinates of each receiving line and the number and positions of the sampling points in it, and then perform digital beamforming according to the delays caused by the differences in these distances to form the received data of the receiving lines.
In the beamforming processing method described above, the multi-channel front-end received data is transmitted directly to the GPU, which improves data transmission efficiency; using a CPU + GPU platform brings the advantages of a short development period, convenient debugging and easy migration, and can reduce the cost of beamforming; and by utilizing the high-speed parallel computing capability and the parallel multi-task processing capability of the GPU, the data of multiple receiving lines is processed at the same time, so that the real-time performance of beamforming is improved and the intermediate-image data quality required for subsequent processing can be achieved. Compared with performing the beamforming on a CPU as in the related art, a CPU is basically a controller plus cache registers, whereas a GPU structurally has a large number of logical operation units, so the GPU is better suited to processing a large amount of data in parallel and its high-performance computing capability far exceeds that of the CPU. Fig. 2a shows, for one embodiment, the time taken to process 30 receiving lines using the CPU and the GPU respectively; as can be seen from fig. 2a, the time delay of beamforming on the GPU is much smaller than that of the CPU.
In one embodiment, the thread structure arranged in the GPU is a two-dimensional grid structure, the horizontal dimension of the two-dimensional grid structure corresponds to the number of lines of the plurality of receiving lines, and the vertical dimension corresponds to the number of sampling points in each receiving line.
Specifically, the thread structure of the GPU is set to a two-dimensional grid structure, the X dimension of the grid is the number of multiple receiving lines, and the Y dimension of the grid is the number of all sampling points on one line. Calculating the received data obtained by beam synthesis and summation of each sampling point in Y dimension by utilizing the computing capability of GPU multithreading parallel processing; and processing the received data of the plurality of receiving lines in parallel according to the received data of each sampling point by utilizing the multi-dimensional parallel processing capability of the GPU. In the embodiment, the advantage of GPU parallel processing is utilized, so that the GPU can perform high-speed parallel calculation to obtain the received data of a plurality of receiving lines, and the real-time requirement of beam forming is met.
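As a rough CUDA sketch of this two-dimensional thread organization, the launch configuration might look like the following; the variable names are assumptions, and the beamforming kernel itself is sketched further below.

```cuda
// Sketch of a possible launch configuration for the thread layout described above:
// the X dimension covers the receiving lines and the other dimension covers the
// sampling points of one line (realized here as one block per line, one thread per
// sampling point). numReceiveLines and samplesPerLine are assumed names.
dim3 grid(numReceiveLines);   // X dimension: one thread block per receiving line
dim3 block(samplesPerLine);   // Y direction: one thread per sampling point in the line
// Note: if samplesPerLine exceeded the hardware limit on threads per block,
// each thread would have to loop over several sampling points instead.
```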
In one embodiment, a plurality of thread blocks are arranged in the graphics processor, and each thread block processes the received data of a corresponding receiving line; as shown in fig. 3, synthesizing the front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines, and the information of the sampling points contained in each receiving line, to obtain the received data of the plurality of receiving lines, includes the following steps:
Step 231, determining the position information of each sampling point in each receiving line according to the coordinate information of each receiving line and the information of the sampling points contained in each receiving line.
Step 232, calculating, through each thread block, the received data of each sampling point in the corresponding receiving line according to the position information of each sampling point in the corresponding receiving line, the position information of the plurality of receiving channels and the front-end received data corresponding to the plurality of receiving channels.
Specifically, the GPU thread structure is composed of Grids, thread Blocks and Threads, which amounts to dividing the computing units on the GPU into a plurality of grids, where each grid includes a plurality of thread blocks and each thread block includes a plurality of threads. A thread is the minimum execution unit in GPU computation and completes one minimal logical operation. In this embodiment, the thread structure of the GPU is set to a two-dimensional grid structure, with one thread block in each grid responsible for processing the received data of one receiving line, and one thread in each thread block responsible for processing the received data of one sampling point in that receiving line. That is, if there are N (N ≧ 4) receiving lines and each receiving line has M (M > 1000) sampling points, the GPU thread structure in this embodiment includes N thread blocks, each thread block includes M threads, there are N × M threads in total, and these N × M threads can process the received data of the sampling points in parallel. For example, for a certain sampling point in a certain receiving line, the corresponding thread in the corresponding thread block computes the received data of that sampling point. The thread can determine the delay of the sampling point relative to each receiving channel according to the distance of the sampling point from each receiving channel, and then delay the front-end received data of each receiving channel accordingly. The distance of the sampling point from each receiving channel can be determined from the received position information of each receiving channel, the position coordinate information of each receiving line and the position information of the sampling points in each receiving line.
Step 233, the received data of the sampling points in each receiving line is weighted and summed to obtain the received data of each receiving line.
Specifically, a plurality of computing units in the GPU may be used, with the delayed front-end received data of the plurality of receiving channels superimposed on each computing unit to obtain the received data of a sampling point. Then, the received data of the plurality of sampling points obtained by all the computing units are weighted and summed to obtain the received data of the corresponding receiving line, thereby completing the beamforming processing of that receiving line. Similarly, for a plurality of receiving lines, the received data of these receiving lines can be processed in the above manner. In this embodiment, the thread structure of the GPU is optimized so that each thread block processes the received data of one receiving line; by exploiting the advantage of the GPU in parallel data processing, the optimized GPU can calculate the received data of multiple receiving lines in parallel at high speed, meeting the real-time requirement of beamforming.
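A minimal, self-contained CUDA kernel corresponding to steps 231-233 might look roughly like the following. It assumes a simple linear array and linear receive-line geometry, reads the weighted summation as a per-sample weighting over channels (one possible interpretation), and all identifiers (array layouts, the delay coefficient, the apodization weights) are illustrative assumptions rather than details taken from the patent.

```cuda
// Illustrative kernel: blockIdx.x selects the receiving line, threadIdx.x selects
// the sampling point within that line, so N x M threads run in parallel.
__global__ void beamformKernel(const float *frontEnd,   // [numChannels * numSamplesRaw] front-end received data
                               const float *channelX,   // lateral position of each receiving channel
                               const float *lineX,      // lateral coordinate of each receiving line
                               const float *apod,       // per-channel apodization (weighting) coefficients
                               float *lineOut,          // [numLines * samplesPerLine] beamformed output
                               int numChannels, int samplesPerLine,
                               int numSamplesRaw, float samplePitch, float delayCoeff)
{
    int line   = blockIdx.x;    // which receiving line this thread block handles
    int sample = threadIdx.x;   // which sampling point this thread handles
    if (sample >= samplesPerLine) return;

    // Step 231: position of this sampling point, from the line coordinate and the sample index.
    float px = lineX[line];
    float pz = sample * samplePitch;

    // Steps 232-233: weighted delay-and-sum over all receiving channels for this point.
    float acc = 0.0f;
    for (int ch = 0; ch < numChannels; ++ch) {
        float dx   = px - channelX[ch];
        float dist = sqrtf(dx * dx + pz * pz);                  // distance from sampling point to channel
        int idelay = __float2int_rn((pz + dist) * delayCoeff);  // delay index, cf. idelay = (d + di) x X below
        if (idelay >= 0 && idelay < numSamplesRaw)
            acc += apod[ch] * frontEnd[ch * numSamplesRaw + idelay];
    }
    lineOut[line * samplesPerLine + sample] = acc;
}
```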
In one embodiment, as shown in fig. 4, calculating, by each thread block, received data of each sampling point in a corresponding one of the receiving lines according to position information of each sampling point in the corresponding one of the receiving lines and front-end received data corresponding to the plurality of receiving channels includes:
step 2311, obtaining distance values between each sampling point and each receiving channel according to the position information of each sampling point in the receiving line and the position information of each receiving channel.
Step 2312, determining the time delay of each sampling point in the receiving line relative to each receiving channel according to the distance value.
The purpose of calculating the time delay is to obtain front-end received data of the same phase for the same sampling point relative to each receiving channel. Specifically, the GPU may determine the distance value from each sampling point to each channel according to the position coordinates of each sampling point in the receiving line. Then, a delay value used to delay the front-end received data of each channel is determined according to the distance value for that channel. Fig. 4a is a schematic diagram of determining the delay of each receiving channel according to the position information of a sampling point in one embodiment. In fig. 4a, a total of a receiving channels are included; d represents the vertical distance from the sampling point to the receiving channels, d1 represents the distance from the sampling point to receiving channel a-1, and d2 represents the distance from the sampling point to receiving channel a-2. The delay value of the sampling point with respect to receiving channel a-1 can be determined as idelay1 = (d + d1) × X1, where the coefficient X1 is determined according to actual conditions. Similarly, the delay value of the sampling point with respect to receiving channel a-2 is idelay2 = (d + d2) × X2. It will be appreciated that the delay value increases as the distance between the sampling point and the receiving channel increases.
Step 2313, generating the received data of each sampling point according to the time delay of each sampling point relative to each receiving channel and the front-end received data corresponding to each receiving channel.
Specifically, after determining the delay value of the sampling point relative to each receiving channel, the front-end received data of the same phase of each receiving channel is obtained according to the delay value of the sampling point relative to each receiving channel. Then, the front-end received data of the plurality of receiving channels with the same phase are added to obtain the received data of the sampling point. Further, for a certain receiving line, after a plurality of threads of the same thread block process a plurality of sampling points on the receiving line in parallel to obtain sampling data of the plurality of sampling points, the received data of the plurality of sampling points are weighted and summed, so that the received data of the receiving line can be obtained.
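For concreteness, the delay values of fig. 4a could be computed as in the small host-side program below; the numerical values and the choice of the coefficient X (here taken as sampling frequency divided by sound speed) are assumptions made for illustration, since the patent only states that the coefficient is determined according to actual conditions.

```cuda
#include <cmath>
#include <cstdio>

// Toy illustration of the delay values of fig. 4a (all numbers are assumed).
int main()
{
    float d  = 0.030f;   // vertical distance from the sampling point to the channels, in metres (assumed)
    float d1 = 0.0315f;  // distance from the sampling point to receiving channel a-1 (assumed)
    float d2 = 0.0332f;  // distance from the sampling point to receiving channel a-2 (assumed)

    // One plausible choice of the coefficient X: sampling frequency / sound speed,
    // which converts a path length into a delay expressed in raw-sample indices.
    float fs = 40.0e6f, c = 1540.0f;
    float X  = fs / c;

    int idelay1 = (int)std::round((d + d1) * X);   // delay index relative to channel a-1
    int idelay2 = (int)std::round((d + d2) * X);   // delay index relative to channel a-2
    std::printf("idelay1 = %d, idelay2 = %d\n", idelay1, idelay2);
    // idelay2 > idelay1, matching the observation that the delay grows with channel distance.
    return 0;
}
```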
In one embodiment, after synthesizing the front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of sampling points to obtain the received data of the plurality of receiving lines, the method further includes: storing the received data of a plurality of receiving lines into a preset shared data buffer area; and transmitting the received data of the plurality of receiving lines to the CPU through the shared data buffer.
Specifically, a shared data buffer for output is created in the GPU in advance, and the beamformed received data is placed in this shared data buffer, which serves as the output buffer of the parallel computation. The finally obtained received data of the plurality of receiving lines can be transmitted from the shared data buffer to the CPU memory, so that the CPU can perform further processing on the received data of the plurality of receiving lines, where the further processing refers to operations such as decoding and filtering of the received data in preparation for the final image display. In this embodiment, creating a shared data buffer for output in the GPU allows the data obtained by the GPU's parallel computation to be transmitted to the CPU for processing in an orderly manner, improving data transmission efficiency.
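On the host side, collecting the results from the output buffer might be handled roughly as follows using standard CUDA runtime calls (cudaDeviceSynchronize, cudaMemcpy); whether the patent's shared data buffer is ordinary device memory, pinned memory or mapped memory is not specified, so this is just one plausible realization with assumed names.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Illustrative host-side handling of the shared output buffer (names are assumptions).
// d_lineOut is the pre-created output buffer on the GPU that the beamforming kernel
// sketched above has already filled, one row per receiving line.
std::vector<float> collectBeamformedLines(const float *d_lineOut, int numLines, int samplesPerLine)
{
    size_t outBytes = (size_t)numLines * samplesPerLine * sizeof(float);
    cudaDeviceSynchronize();   // make sure all receiving lines have been computed

    std::vector<float> h_lineOut((size_t)numLines * samplesPerLine);
    cudaMemcpy(h_lineOut.data(), d_lineOut, outBytes, cudaMemcpyDeviceToHost);
    // h_lineOut now holds the received data of all receiving lines for further
    // CPU-side processing (decoding, filtering, etc.).
    return h_lineOut;
}
```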
In one embodiment, the sampling points in the receive line may be determined by: and determining the number of sampling points per unit distance in each receiving line according to the acquired sampling frequency and the tissue speed.
Here, the tissue velocity refers to the ultrasonic echo speed. Specifically, the number of sampling points per unit distance in a receiving line changes with the sampling depth: the deeper the sampling depth, the more sampling points there are. The number of sampling points per unit distance in the receiving line can be determined from the sampling frequency and the tissue velocity using the following formula: number of sampling points per unit distance = sampling frequency / tissue velocity. It will be appreciated that the deeper the sampling depth, the lower the tissue velocity will be. In this embodiment, the number of sampling points changes with the sampling depth, so that the sampling points can be selected more comprehensively, improving the accuracy of beamforming.
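As a small numerical illustration of the relation stated above (the figures are assumptions, not values from the patent):

```cuda
// Example of the relation "sampling points per unit distance = sampling frequency / tissue velocity".
float samplesPerUnitDistance(float samplingFrequency, float tissueVelocity)
{
    // e.g. samplingFrequency = 40 MHz and tissueVelocity = 1540 m/s (typical soft tissue)
    // give roughly 25974 sampling points per metre, i.e. about 26 per millimetre.
    return samplingFrequency / tissueVelocity;
}
```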
In one embodiment, receiving front-end received data corresponding to a plurality of receive channels includes: and receiving front-end receiving data corresponding to the plurality of receiving channels through a high-speed serial computer expansion bus standard PCIe. In this embodiment, front-end received data corresponding to the multiple receiving channels is directly transmitted to the GPU through PCIe, so that the beam forming processing process can meet the requirement of ultra-high speed imaging on the data transmission rate and can meet the intermediate image data quality required for subsequent processing.
In one embodiment, as shown in fig. 5, a beamforming processing method is described by a specific embodiment, including the following steps:
step 501, the GPU receives data from the front end corresponding to the multiple receive channels through the PCIe bus standard. The thread structure arranged in the GPU is a two-dimensional grid structure, the horizontal dimension of the two-dimensional grid structure corresponds to the number of the receiving lines, and the vertical dimension direction corresponds to the number of sampling points in each receiving line. The GPU is provided with a plurality of thread blocks, and each thread block processes the received data of a corresponding receiving line.
Step 502, receiving a control instruction transmitted by the CPU, where the control instruction carries position information of the multiple receiving channels, coordinate information of the multiple receiving lines, and information of the sampling points contained in each receiving line.
Step 503, obtaining a distance value between each sampling point and each receiving channel through the thread in each thread block according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines, and the information of the sampling point in each receiving line.
Step 504, determining the time delay of each sampling point in the receiving line relative to each receiving channel according to the distance value.
Step 505, adding the front-end received data corresponding to each delayed receiving channel in parallel using each computing unit in the GPU to obtain the received data of each sampling point.
Step 506, performing weighted sum on the received data of the sampling points in one receiving line obtained by all the calculating units, and completing the received data processing of the corresponding receiving line. Similarly, the received data of all the receiving lines can be obtained.
Step 507, storing the received data of the plurality of receiving lines in a preset shared data buffer.
Step 508, transmitting the received data of the plurality of receiving lines to the CPU through the shared data buffer.
It should be understood that although the various steps in the flow charts of figs. 1-5 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 1-5 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided a beam-forming processing apparatus 600, including: a receiving module 601 and a beam forming module 602, wherein:
a receiving module 601, configured to receive front-end received data corresponding to multiple receiving channels;
the receiving module 601 is further configured to receive a control instruction transmitted by the CPU, where the control instruction carries position information of the multiple receiving channels, coordinate information of the multiple receiving lines, and information of the sampling points contained in each receiving line;
a beam synthesis module 602, configured to synthesize front-end received data corresponding to the multiple receiving channels according to the position information of the multiple receiving channels, the coordinate information of the multiple receiving lines, and the information of the sampling points included in each receiving line, so as to obtain received data of the multiple receiving lines.
In one embodiment, the thread structure provided in the graphics processor is a two-dimensional grid structure, the horizontal dimension of the two-dimensional grid structure corresponds to the number of the plurality of receiving lines, and the vertical dimension corresponds to the number of sampling points in each receiving line.
In one embodiment, a plurality of thread blocks are arranged in a graphics processor, and each thread block processes received data of a corresponding receiving line; a beam synthesis module 602, configured to determine, according to the coordinate information of each receiving line and the information of the sampling points included in each receiving line, position information of each sampling point in each receiving line; calculating the received data of each sampling point in a corresponding receiving line through each thread block according to the position information of each sampling point in the corresponding receiving line and the position information of a plurality of receiving channels and the front-end received data corresponding to the plurality of receiving channels; and carrying out weighted sum on the received data of the sampling points in each receiving line to obtain the received data of each receiving line.
In an embodiment, the beam forming module 602 is specifically configured to obtain a distance value between each sampling point and each receiving channel according to the position information of each sampling point in the receiving line and the position information of each receiving channel; determining the time delay of each sampling point in the receiving line relative to each receiving channel according to the distance value; and generating the received data of each sampling point according to the time delay of each sampling point relative to each receiving channel and the front-end received data corresponding to each receiving channel.
In one embodiment, the system further comprises a data transmission module (not shown in fig. 6) for storing the received data of the plurality of receiving lines into a preset shared data buffer; and transmitting the received data of the plurality of receiving lines to the CPU through the shared data buffer.
In one embodiment, a sampling point determination module (not shown in fig. 6) is further included for determining the number of unit distance sampling points in each receiving line according to the acquired sampling frequency and the tissue velocity.
In an embodiment, the receiving module 601 is specifically configured to receive the front-end received data corresponding to the plurality of receiving channels through PCIe, the high-speed serial computer expansion bus standard.
For specific limitations of the beamforming processing apparatus, reference may be made to the above limitations of the beamforming processing method, which is not described herein again. The modules in the beam forming processing device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a beamforming processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of part of the structure associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects may be applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
receiving front-end received data corresponding to a plurality of receiving channels; receiving a control instruction transmitted by a CPU, wherein the control instruction carries position information of a plurality of receiving channels, coordinate information of a plurality of receiving lines and information of sampling points contained in each receiving line; and synthesizing front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines.
In one embodiment, the thread structure provided in the graphics processor is a two-dimensional grid structure, the horizontal dimension of the two-dimensional grid structure corresponds to the number of the plurality of receiving lines, and the vertical dimension corresponds to the number of sampling points in each receiving line.
In one embodiment, a plurality of thread blocks are arranged in a graphics processor, and each thread block processes received data of a corresponding receiving line; the processor, when executing the computer program, further performs the steps of:
determining the position information of each sampling point in each receiving line according to the coordinate information of each receiving line and the information of the sampling point contained in each receiving line; calculating the received data of each sampling point in a corresponding receiving line according to the position information of each sampling point in the corresponding receiving line, the position information of a plurality of receiving channels and the front-end received data corresponding to the plurality of receiving channels by each thread block; and carrying out weighted sum on the received data of the sampling points in each receiving line to obtain the received data of each receiving line.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
obtaining the distance value between each sampling point and each receiving channel according to the position information of each sampling point in the receiving line and the position information of each receiving channel; determining the time delay of each sampling point in the receiving line relative to each receiving channel according to the distance value; and generating the received data of each sampling point according to the time delay of each sampling point relative to each receiving channel and the front-end received data corresponding to each receiving channel.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
storing the received data of a plurality of receiving lines into a preset shared data buffer area; and transmitting the received data of the plurality of receiving lines to the CPU through the shared data buffer.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and determining the number of sampling points per unit distance in each receiving line according to the acquired sampling frequency and the tissue speed.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and receiving front-end receiving data corresponding to the plurality of receiving channels through a high-speed serial computer expansion bus standard PCIe.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
receiving front-end received data corresponding to a plurality of receiving channels; receiving a control instruction transmitted by a CPU, wherein the control instruction carries position information of the plurality of receiving channels, coordinate information of a plurality of receiving lines and information of the sampling points contained in each receiving line; and synthesizing front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines.
In one embodiment, the thread structure provided in the graphics processor is a two-dimensional grid structure, the horizontal dimension of the two-dimensional grid structure corresponds to the number of the plurality of receiving lines, and the vertical dimension corresponds to the number of sampling points in each receiving line.
In one embodiment, a plurality of thread blocks are arranged in a graphics processor, and each thread block processes received data of a corresponding receiving line; the computer program when executed by the processor further realizes the steps of:
determining the position information of each sampling point in each receiving line according to the coordinate information of each receiving line and the information of the sampling point contained in each receiving line; calculating the received data of each sampling point in a corresponding receiving line according to the position information of each sampling point in the corresponding receiving line, the position information of a plurality of receiving channels and the front-end received data corresponding to the plurality of receiving channels by each thread block; and carrying out weighted sum on the received data of the sampling points in each receiving line to obtain the received data of each receiving line.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining the distance value between each sampling point and each receiving channel according to the position information of each sampling point in the receiving line and the position information of each receiving channel; determining the time delay of each sampling point in the receiving line relative to each receiving channel according to the distance value; and generating the received data of each sampling point according to the time delay of each sampling point relative to each receiving channel and the front-end received data corresponding to each receiving channel.
In one embodiment, the computer program when executed by the processor further performs the steps of:
storing the received data of a plurality of receiving lines into a preset shared data buffer area; and transmitting the received data of the plurality of receiving lines to the CPU through the shared data buffer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and determining the number of sampling points per unit distance in each receiving line according to the acquired sampling frequency and the tissue speed.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and receiving front-end receiving data corresponding to the plurality of receiving channels through a high-speed serial computer expansion bus standard PCIe.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A beamforming processing method applied to a Graphics Processing Unit (GPU), the method comprising:
receiving front-end received data corresponding to a plurality of receiving channels;
receiving a control instruction transmitted by a CPU, wherein the control instruction carries position information of the plurality of receiving channels, coordinate information of a plurality of receiving lines, and information of a plurality of sampling points contained in each receiving line;
and synthesizing front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines.
2. The method according to claim 1, wherein the thread structure provided in the graphics processor is a two-dimensional grid structure in which the horizontal dimension corresponds to the number of the plurality of receiving lines and the vertical dimension corresponds to the number of sampling points in each receiving line.
3. The method according to claim 2, wherein a plurality of thread blocks are provided in the graphics processor, each thread block processing the received data of a corresponding one of the receiving lines; and the synthesizing of the front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines includes:
determining the position information of each sampling point in each receiving line according to the coordinate information of each receiving line and the information of the sampling point contained in each receiving line;
calculating, by each thread block, the received data of each sampling point in a corresponding receiving line according to the position information of each sampling point in the corresponding receiving line, the position information of the plurality of receiving channels, and the front-end received data corresponding to the plurality of receiving channels;
and performing a weighted summation on the received data of the sampling points in each receiving line to obtain the received data of each receiving line.
4. The method according to claim 3, wherein the calculating, by each thread block, the received data of each sampling point in a corresponding one of the receiving lines according to the position information of each sampling point in the corresponding one of the receiving lines, the position information of the plurality of receiving channels, and the front-end received data corresponding to the plurality of receiving channels comprises:
obtaining the distance value between each sampling point and each receiving channel according to the position information of each sampling point in the receiving line and the position information of each receiving channel;
determining the time delay of each sampling point in the receiving line relative to each receiving channel according to the distance value;
and generating the received data of each sampling point according to the time delay of each sampling point relative to each receiving channel and the front-end received data corresponding to each receiving channel.
5. The method according to any one of claims 1 to 4, wherein after synthesizing the received data of the plurality of receiving lines according to the position information of the plurality of sampling points and the front-end received data corresponding to the plurality of receiving channels, the method further comprises:
storing the received data of the plurality of receiving lines to a preset shared data buffer area;
and transmitting the received data of the plurality of receiving lines to a central processing unit (CPU) through the shared data buffer area for subsequent processing.
6. The method of claim 1, further comprising:
determining the number of sampling points per unit distance in each receiving line according to the acquired sampling frequency and the speed of sound in tissue.
7. The method of claim 1, wherein receiving the front-end received data corresponding to the plurality of receive channels comprises:
receiving the front-end received data corresponding to the plurality of receiving channels through the high-speed serial computer expansion bus standard PCIe.
8. A beamforming processing apparatus, the apparatus comprising:
a receiving module, which is used for receiving front-end received data corresponding to a plurality of receiving channels;
the receiving module is further configured to receive a control instruction transmitted by a CPU, wherein the control instruction carries position information of the plurality of receiving channels, coordinate information of a plurality of receiving lines, and information of sampling points contained in each receiving line;
and a beam synthesis module, which is used for synthesizing the front-end received data corresponding to the plurality of receiving channels according to the position information of the plurality of receiving channels, the coordinate information of the plurality of receiving lines and the information of the sampling points contained in each receiving line to obtain the received data of the plurality of receiving lines.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201911412201.1A 2019-12-31 2019-12-31 Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium Pending CN111239745A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911412201.1A CN111239745A (en) 2019-12-31 2019-12-31 Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium
PCT/CN2020/126530 WO2021135629A1 (en) 2019-12-31 2020-11-04 Beam forming processing method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911412201.1A CN111239745A (en) 2019-12-31 2019-12-31 Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111239745A (en) 2020-06-05

Family

ID=70879618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911412201.1A Pending CN111239745A (en) 2019-12-31 2019-12-31 Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111239745A (en)
WO (1) WO2021135629A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101771242B1 * (en) 2014-08-29 2017-08-24 Sogang University Industry-Academic Cooperation Foundation High-speed parallel processing method of ultrasonics wave signal using smart device
CN105559819A (en) * 2014-11-11 2016-05-11 谭伟 Ultrasonic imaging system based on simple hardware
CN111239745A (en) * 2019-12-31 2020-06-05 Vinno Technology Suzhou Co Ltd Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110074792A1 (en) * 2009-09-30 2011-03-31 Pai-Chi Li Ultrasonic image processing system and ultrasonic image processing method thereof
CN103592650A * 2013-11-22 2014-02-19 No. 726 Research Institute of China Shipbuilding Industry Corporation Three-dimensional sonar imaging system based on graph processor and three-dimensional image method thereof
KR20160029803A * 2014-03-14 2016-03-15 Alpinion Medical Systems Co., Ltd. Software-based ultrasound imaging system
US20160019881A1 (en) * 2014-07-16 2016-01-21 Samsung Electronics Co., Ltd. Beamforming apparatus, beamforming method, and ultrasonic imaging apparatus
CN104688273A * 2015-03-16 2015-06-10 Harbin Institute of Technology Ultra high speed ultrasonic imaging device and method based on central processing unit (CPU) + graphic processing unit (GPU) isomeric framework
CN107305250A (en) * 2016-04-22 2017-10-31 刘衍芹 A kind of ultrasonic three-dimensional imaging system
CN107550517A * 2016-06-30 2018-01-09 Esaote Group The method and system of backtracking dynamic emission focusing Wave beam forming is performed to ultrasonic signal
US20190238207A1 (en) * 2018-01-31 2019-08-01 Hewlett Packard Enterprise Development Lp Selecting beams based on channel measurements
CN108354626A * 2018-03-31 2018-08-03 South China University of Technology A variety of MV high clearing systems fast medical ultrasonic image systems based on GPU

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
吴胜强: "Research on GPU-Based Synthetic Aperture Focusing Ultrasound Imaging Algorithms", China Masters' Theses Full-text Database, Information Science and Technology Series *
夏春兰 et al.: "Ultrasound B-Mode Imaging Based on CUDA", 《计算机应用基础》 *
孙进平 et al.: "DSP/FPGA Embedded Real-Time Processing Technology and Applications", 30 September 2011, Beihang University Press *
邵真天: "Research on a Real-Time Ultrasound Imaging System Based on GPU Parallel Acceleration", China Masters' Theses Full-text Database, Information Science and Technology Series *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021135629A1 * 2019-12-31 2021-07-08 Vinno Technology Suzhou Co Ltd Beam forming processing method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
WO2021135629A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
CN105534546B (en) A kind of ultrasonic imaging method based on ZYNQ Series FPGAs
Chee et al. A GPU-parallelized eigen-based clutter filter framework for ultrasound color flow imaging
US20210272339A1 (en) Systems and Methods for Generating and Estimating Unknown and Unacquired Ultrasound Data
CN106651740B (en) A kind of full focus data fast imaging method of ultrasound based on FPGA and system
Göbl et al. SUPRA: open-source software-defined ultrasound processing for real-time applications: A 2D and 3D pipeline from beamforming to B-mode
CN108107428B (en) Phase shift offset imaging method and device for MIMO array
CN111239745A (en) Beam synthesis processing method, beam synthesis processing device, computer equipment and storage medium
Walczak et al. Optimization of real-time ultrasound PCIe data streaming and OpenCL processing for SAFT imaging
Pagliari et al. Acceleration of microwave imaging algorithms for breast cancer detection via high-level synthesis
CN110840483B (en) Real-time logarithmic compression method and system for digital ultrasonic diagnostic apparatus
Cruza et al. Ultrafast hardware-based focal law calculator for automatic focusing
WO2020024255A1 (en) Examination mode switching method and ultrasound device
Cosarinsky et al. Optimized auto-focusing method for 3D ultrasound imaging in NDT
Wang et al. An easily-achieved time-domain beamformer for ultrafast ultrasound imaging based on compressive sensing
CN113208710B (en) Puncture needle development enhancement method and device, ultrasonic equipment and storage medium
Phuong et al. Design space exploration of SW beamformer on GPU
CN103371849A (en) Ultrasound imaging system and method
Dziewierz et al. A design methodology for 2D sparse NDE arrays using an efficient implementation of refracted-ray TFM
Romero-Laorden et al. Strategies for hardware reduction on the design of portable ultrasound imaging systems
CN116157821A (en) Fetal face volume image restoration method and ultrasonic imaging system
Chen et al. Implementation of parallel medical ultrasound imaging algorithm on CAPI-enabled FPGA
Wall et al. Modern implementation of a realtime 3D beamformer and scan converter system
Boonleelakul et al. Ultrasound beamforming and image reconstruction using CPU and GPU
Ibrahim et al. Apodization scheme for hardware-efficient beamformer
CN112401934A (en) Ultrasound imaging method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200605)