CN118069969B - GPU-based hierarchical media green function rapid calculation method and device - Google Patents

GPU-based method and device for rapid calculation of the layered-media Green's function

Info

Publication number: CN118069969B
Authority: CN (China)
Legal status: Active (as listed; not a legal conclusion)
Application number: CN202410503575.9A
Other languages: Chinese (zh)
Other versions: CN118069969A (en)
Inventors: 吴比翼 (Wu Biyi), 袁馨 (Yuan Xin), 闫超泽 (Yan Chaoze), 杨明林 (Yang Minglin), 盛新庆 (Sheng Xinqing)
Current assignee: Beijing Institute of Technology (BIT)
Original assignee: Beijing Institute of Technology (BIT)
Events:
Application CN202410503575.9A filed by Beijing Institute of Technology (BIT)
Priority to CN202410503575.9A
Publication of CN118069969A
Application granted
Publication of CN118069969B
Current legal status: Active


Abstract

The application provides a method and a device for rapid calculation of the layered-media Green's function based on a graphics processing unit (GPU), and relates to the technical field of computational electromagnetics. The method comprises: initializing the three-dimensional grid of the GPU and the number of threads in each thread block; filling a matrix with the computation tasks of the Sommerfeld integrals (SI) for a plurality of parameter points and a plurality of spatial points contained in the initialized GPU, thereby recasting the numerical integration of the SI as a matrix product; and distributing the computation tasks of the matrix entries evenly across the thread blocks for parallel execution, so that the SI results for all parameter points and spatial points are obtained in a single pass. The SI results comprise the head-integral and tail-integral results. Within each thread block, the matrix product is computed with the CUDA Tensor Core matrix units to evaluate the segment integrals of the head and the tail, and, for the tail integral, the convergence of the segment-integral results is accelerated with the Euler transformation. The method and device thereby achieve rapid calculation of the layered-media Green's function.

Description

Technical Field
The application relates to the technical field of computational electromagnetics, and in particular to a method and a device for rapid calculation of the layered-media Green's function based on a graphics processing unit (GPU).
Background
The integral-equation method for planar layered media is one of the most successful models in computational electromagnetics and has been widely used in the analysis of microstrip structures, radio-frequency circuits, and chips. The main difficulty in solving for the electromagnetic response of a target in layered media with the method of moments is the fast calculation of the Green's function. Unlike in free space, where the Green's function has a closed-form expression, the layered-media Green's function cannot be obtained analytically: the Sommerfeld integral (SI) must be evaluated to transform the spectral-domain Green's function into the spatial domain. However, the strong oscillation and slow decay of the Bessel function in the integrand, together with the singularities inherent in the integral kernel, make this computation very difficult, and its efficiency directly determines the matrix-filling time of the method of moments. Efficient evaluation of the Sommerfeld integral is therefore key to accelerating electromagnetic simulation of planar layered media.
The wide application of the Sommerfeld integral has attracted the attention of many researchers, and a number of methods have been proposed to accelerate its calculation. These methods generally fall into two categories: closed-form approximation methods and numerical integration methods.
A closed-form approximation method computes the Sommerfeld integral as follows:
(1) fit the spectral-domain Green's function;
(2) transform the fitted expression to the spatial domain using closed-form integral identities, obtaining a superposition of spherical and/or cylindrical waves.
The most representative closed-form approximation methods are the discrete complex image method (DCIM) and the rational-function fitting method. Although closed-form approximation avoids evaluating the infinite oscillatory integral and greatly reduces computation time, its accuracy cannot be controlled, and the locations of the surface-wave poles in a layered medium are difficult to determine accurately.
A numerical integration method solves the Sommerfeld integral as follows:
(1) determine the integration path;
(2) integrate numerically along that path.
Numerical integration methods mainly include the steepest descent path (SDP) method, the fast Hankel transform method, and a family of techniques that address the slow decay of the integral kernel. Taking the steepest descent path method as an example, it treats the exponential term of the integral kernel and chooses the integration path through the saddle point so that the exponential function decays as rapidly as possible away from it. Its drawback is that the Sommerfeld integral kernel may contain several different exponential terms and the path must pass through every saddle point; as the number of layers grows, the number of saddle points increases rapidly, so the method is not suitable for general multilayer structures.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
Accordingly, a first object of the present application is to provide a GPU-based method for rapid calculation of the layered-media Green's function, which addresses the difficulty of applying existing methods to multilayer structures, computes multi-parameter Sommerfeld integrals in parallel in a single pass, and can greatly improve the efficiency of evaluating the Sommerfeld integrals in layered-media Green's functions.
A second object of the present application is to provide a GPU-based device for rapid calculation of the layered-media Green's function.
To achieve the above objects, an embodiment of the first aspect of the present application provides a GPU-based method for rapid calculation of the layered-media Green's function, comprising: initializing the three-dimensional grid of the GPU and the number of threads in each thread block; filling a matrix with the SI computation tasks of a plurality of parameter points and a plurality of spatial points contained in the initialized GPU, recasting the numerical integration of the SI as a matrix product, distributing the computation tasks of the matrix entries evenly across the thread blocks for parallel execution, and obtaining the SI results for the plurality of parameter points and spatial points in a single pass, wherein the SI results comprise the head-integral and tail-integral results, and the computation in each thread block comprises: computing the matrix product with the CUDA Tensor Core matrix units to evaluate the segment integrals of the head and the tail, and, for the tail integral, accelerating the convergence of the segment-integral results with the Euler transformation.
In the GPU-based method for rapid calculation of the layered-media Green's function according to the embodiments of the present application, the computing architecture is optimized by reusing the spectral-domain Green's function and the Bessel function during parameter sweeps, which converts the Sommerfeld integral into the product of two matrices. Combining the parallel computing power of the GPU and its dedicated Tensor Core matrix units with a derived Euler-transformation expression that accelerates the tail integral yields a parallel scheme that computes the Sommerfeld integrals for multiple frequencies or multiple planar layered-medium parameters in a single pass.
Optionally, in one embodiment of the present application, filling a matrix with the SI computation tasks of the plurality of parameter points and the plurality of spatial points contained in the initialized GPU, and recasting the numerical integration of the SI as a matrix product, comprises:
configuring the initialized GPU to contain the Sommerfeld integral computation tasks of a plurality of parameter points and a plurality of spatial points, and determining that each thread block computes the SI for M parameter points and N spatial points;
arranging the SI values computed by each thread block into an M × N matrix, in which each row corresponds to a different parameter point and each column to a different spatial point, so that the numerical integration of the SI is recast as the product of a first matrix and a second matrix.
Optionally, in one embodiment of the present application, the matrix product is C = A·B, where the first matrix A is an M × K matrix whose entries are the spectral-domain Green's function evaluated at the M parameter points and the K integration sampling points, and the second matrix B is a K × N matrix whose entries are the products of the Bessel function and the integration weight coefficients.
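The matrix recasting can be illustrated in plain Python. This is a minimal sketch, not the patent's kernel: `toy_spectral_green` is a hypothetical stand-in for the layered-media spectral Green's function (one parameter point = one frequency), the Bessel order is n = 0, composite Simpson replaces the patent's Gauss-Kronrod rules, and the factor $k_\rho^{n+1}$ of the standard Sommerfeld integrand is folded into the second matrix, which is an assumption about where it belongs. The point is only that each entry of the M × N product equals the corresponding direct quadrature.

```python
import math


def bessel_j0(x: float, terms: int = 96) -> float:
    """J0(x) via the trapezoidal rule on (1/pi) * integral_0^pi cos(x sin t) dt."""
    return sum(math.cos(x * math.sin(math.pi * m / terms)) for m in range(terms)) / terms


def toy_spectral_green(k: float, freq: float) -> complex:
    """Hypothetical stand-in for the spectral-domain Green's function at one frequency."""
    return complex(math.exp(-k / freq), 0.1 * freq) / (1.0 + k * k)


def simpson_rule(a: float, b: float, n: int):
    """Composite Simpson nodes and weights on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    nodes = [a + i * h for i in range(n + 1)]
    weights = [h / 3 * (1 if i in (0, n) else 4 if i % 2 else 2) for i in range(n + 1)]
    return nodes, weights


def matmul(A, B):
    """Plain row-by-column product (the operation a GPU would hand to Tensor Cores)."""
    return [[sum(a_ik * B[k][j] for k, a_ik in enumerate(row)) for j in range(len(B[0]))]
            for row in A]


freqs = [1.0, 2.0, 3.0]                        # M = 3 parameter points
rhos = [0.5, 1.0, 1.5, 2.0]                    # N = 4 spatial points (lateral distances)
nodes, weights = simpson_rule(0.0, 10.0, 40)   # K = 41 sampling points, order n = 0

# First matrix A (M x K): spectral Green's function only, independent of rho.
A = [[toy_spectral_green(k, f) for k in nodes] for f in freqs]
# Second matrix B (K x N): weight * J0(k rho) * k, independent of frequency.
B = [[w * bessel_j0(k * r) * k for r in rhos] for k, w in zip(nodes, weights)]
C = matmul(A, B)                               # M x N: row = parameter, column = spatial point

# Reference: the same quadrature evaluated entry by entry, without reuse.
direct = [[sum(w * toy_spectral_green(k, f) * bessel_j0(k * r) * k
               for k, w in zip(nodes, weights)) for r in rhos] for f in freqs]
```

Because A depends only on the parameter points and B only on the spatial points, each is filled once and the whole batch of integrals reduces to a single matrix multiplication.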
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both of which are piecewise (segmented) integrals. The head integral is evaluated as a quadrature sum over the sampling points on the semi-elliptical path, where $\tilde G$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the given field and source positions; $\rho$ is the lateral distance between the field point and the source point; $J_n$ is the Bessel function of the first kind and $n$ its order; $a$ is the major axis of the elliptical path; $w_i$ and $k_{\rho,i}$ are the quadrature weights and sampling points, respectively; $I_i$ is the SI contribution of the $i$-th sampling point along the elliptical path; and $N$ is the number of integration sampling points.
The tail integral is evaluated by dividing the tail interval on the real axis into $L$ subintervals, applying an $N$-point quadrature on each, and combining the subinterval results with the Euler transformation, where $\tilde G$, $z$, $z'$, $k_\rho$, $J_n$, $n$, $\rho$ and $a$ are as defined for the head integral; $w_i$ and $k_{\rho,i}$ are the quadrature weights and sampling points, respectively; $L$ is the number of subintervals into which the tail integration interval is divided; and $N$ is the number of sampling points in each subinterval.
Optionally, in one embodiment of the present application, the SI tail integral after the $K$-th recursion of the Euler transformation is expressed as a weighted sum of the segment integrals, where $N$ denotes the number of points dividing the tail integral into subintervals, $c_i$ denotes the coefficient applied to the corresponding segment integral, and $u_i$ denotes the integral over each tail segment.
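One way to see why the $K$-th Euler recursion collapses to fixed coefficients on the segment integrals is to write the transformation as repeated averaging of partial sums $S_j = u_0 + \dots + u_j$: after $K$ rounds the estimate is $2^{-K}\sum_{j=0}^{K}\binom{K}{j}S_j$, so segment $u_i$ carries the weight $c_i = 2^{-K}\sum_{j=i}^{K}\binom{K}{j}$. The sketch below checks this on the alternating series for $\ln 2$. It assumes the averaging form of the Euler transformation; the patent's own simplified expression is not preserved in this text, so the names and indexing here are illustrative.

```python
import math


def euler_by_averaging(segments):
    """Euler transformation as K rounds of averaging neighbouring partial sums."""
    sums, total = [], 0.0
    for u in segments:
        total += u
        sums.append(total)
    while len(sums) > 1:
        sums = [(s0 + s1) / 2 for s0, s1 in zip(sums, sums[1:])]
    return sums[0]


def euler_coefficients(K):
    """Closed form c_i = 2^-K * sum_{j=i..K} binom(K, j); depends on K only."""
    return [sum(math.comb(K, j) for j in range(i, K + 1)) / 2 ** K for i in range(K + 1)]


def euler_closed_form(segments):
    """Same estimate as a single weighted sum sum_i c_i u_i over the segment integrals."""
    K = len(segments) - 1
    return sum(c * u for c, u in zip(euler_coefficients(K), segments))


# Alternating model series: sum_{i>=0} (-1)^i / (i + 1) = ln 2, truncated at 21 terms.
u = [(-1) ** i / (i + 1) for i in range(21)]
print(sum(u), euler_by_averaging(u), euler_closed_form(u), math.log(2))
```

Because the coefficients depend only on K, they can be precomputed once and applied to every matrix entry's segment integrals, which reduces the per-entry extrapolation to a short dot product.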
To achieve the above objects, an embodiment of the second aspect of the present application provides a GPU-based device for rapid calculation of the layered-media Green's function, comprising a CPU and a GPU, the CPU including a memory, wherein:
the CPU is configured to initialize the GPU, fill a matrix with the SI computation tasks of a plurality of parameter points and a plurality of spatial points contained in the initialized GPU, recast the numerical integration of the SI as a matrix product, and store the resulting data in the memory;
the GPU is configured to distribute the computation tasks of the matrix entries evenly across the thread blocks for parallel execution, obtain the SI results for the plurality of parameter points and spatial points in a single pass, and transfer the integration results to the memory of the CPU over the PCIe bus, wherein the SI results comprise the head-integral and tail-integral results, and the computation in each thread block comprises:
computing the matrix product with the CUDA Tensor Core matrix units to evaluate the segment integrals of the head and the tail, and, for the tail integral, accelerating the convergence of the segment-integral results with the Euler transformation.
Optionally, in one embodiment of the present application, filling a matrix with the SI computation tasks of the plurality of parameter points and the plurality of spatial points contained in the initialized GPU, and recasting the numerical integration of the SI as a matrix product, comprises:
configuring the initialized GPU to contain the Sommerfeld integral computation tasks of a plurality of parameter points and a plurality of spatial points, and determining that each thread block computes the SI for M parameter points and N spatial points;
arranging the SI values computed by each thread block into an M × N matrix, in which each row corresponds to a different parameter point and each column to a different spatial point, so that the numerical integration of the SI is recast as the product of a first matrix and a second matrix.
Optionally, in one embodiment of the present application, the matrix product is C = A·B, where the first matrix A is an M × K matrix whose entries are the spectral-domain Green's function evaluated at the M parameter points and the K integration sampling points, and the second matrix B is a K × N matrix whose entries are the products of the Bessel function and the integration weight coefficients.
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both of which are piecewise (segmented) integrals. The head integral is evaluated as a quadrature sum over the sampling points on the semi-elliptical path, where $\tilde G$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the given field and source positions; $\rho$ is the lateral distance between the field point and the source point; $J_n$ is the Bessel function of the first kind and $n$ its order; $a$ is the major axis of the elliptical path; $w_i$ and $k_{\rho,i}$ are the quadrature weights and sampling points, respectively; $I_i$ is the SI contribution of the $i$-th sampling point along the elliptical path; and $N$ is the number of integration sampling points.
The tail integral is evaluated by dividing the tail interval on the real axis into $L$ subintervals, applying an $N$-point quadrature on each, and combining the subinterval results with the Euler transformation, where $\tilde G$, $z$, $z'$, $k_\rho$, $J_n$, $n$, $\rho$ and $a$ are as defined for the head integral; $w_i$ and $k_{\rho,i}$ are the quadrature weights and sampling points, respectively; $L$ is the number of subintervals into which the tail integration interval is divided; and $N$ is the number of sampling points in each subinterval.
Optionally, in one embodiment of the present application, the Euler extrapolation is simplified through a formula derivation, whereby the SI tail integral after the $K$-th recursion is expressed as a weighted sum of the segment integrals, where $N$ denotes the number of points dividing the tail integral into subintervals, $c_i$ denotes the coefficient applied to the corresponding segment integral, and $u_i$ denotes the integral over each tail segment.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a GPU-based method for rapid calculation of the layered-media Green's function according to an embodiment of the present application;
FIG. 2 is a geometric representation of a planar layered medium according to an embodiment of the present application;
FIG. 3 is a diagram of the Sommerfeld numerical integration path according to an embodiment of the present application;
FIG. 4 is a diagram of an example of computing a matrix product with Tensor Cores according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the scheme for computing multi-parameter Sommerfeld integrals in parallel in a single pass according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a three-layer medium model according to an embodiment of the present application;
FIG. 7 is a first exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different frequencies according to an embodiment of the present application;
FIG. 8 is a second exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different frequencies according to an embodiment of the present application;
FIG. 9 is a third exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different frequencies according to an embodiment of the present application;
FIG. 10 is a fourth exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different frequencies according to an embodiment of the present application;
FIG. 11 is a first exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different permittivities according to an embodiment of the present application;
FIG. 12 is a second exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different permittivities according to an embodiment of the present application;
FIG. 13 is a third exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different permittivities according to an embodiment of the present application;
FIG. 14 is a fourth exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different permittivities according to an embodiment of the present application;
FIG. 15 is a first exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different layer thicknesses according to an embodiment of the present application;
FIG. 16 is a second exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different layer thicknesses according to an embodiment of the present application;
FIG. 17 is a third exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different layer thicknesses according to an embodiment of the present application;
FIG. 18 is a fourth exemplary plot of the magnitude and relative error of two components of the layered-media Green's function of the three-layer medium at different layer thicknesses according to an embodiment of the present application;
FIG. 19 is a schematic structural diagram of a GPU-based device for rapid calculation of the layered-media Green's function according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
In related studies, many different forms of integral equations and Green's functions have been used to analyze specific layered-media problems, such as the electric field integral equation (EFIE) and the mixed potential integral equation (MPIE). The MPIE has been applied successfully to planar microstrip antennas with high accuracy. The most important step in solving the MPIE for layered media is the calculation of the Green's function. Starting from Maxwell's equations, each spatial-domain field quantity is transformed into the spectral domain by a Fourier transform; the resulting equations take the same form as the transmission-line equations, so the layered medium is equivalent to a transmission-line structure. Transmission-line theory then gives a general expression for the spectral-domain Green's function, and the spatial-domain Green's function of the layered medium is obtained by evaluating the Sommerfeld integral. When the electromagnetic characteristics of a layered medium are analyzed with the MPIE-based method of moments, the spatial Green's functions are inserted into the mixed potential integral equation, the current distribution is expanded in RWG basis functions, Galerkin testing yields a matrix equation, and solving this matrix equation gives the current distribution and hence the electromagnetic response of the target in the layered medium. The most commonly used MPIE can be expressed as:
where $\bar{\mathbf G}_A$ and $G_\phi$ denote the magnetic vector potential and electric scalar potential Green's functions, respectively; $\mathbf J$ is the known current source; $\omega$ is the angular frequency; $\varepsilon_0$ and $\mu_0$ are the permittivity and permeability of free space, respectively; and the dyadic Green's function $\bar{\mathbf G}_A$ is expressed in matrix form as:
FIG. 1 is a flowchart of a GPU-based method for rapid calculation of the layered-media Green's function according to an embodiment of the present application.
As shown in FIG. 1, the GPU-based method for rapid calculation of the layered-media Green's function comprises the following steps:
Step 101: initialize the three-dimensional grid of the GPU and the number of threads in each thread block;
Step 102: fill a matrix with the SI computation tasks of a plurality of parameter points and a plurality of spatial points contained in the initialized GPU, recast the numerical integration of the SI as a matrix product, distribute the computation tasks of the matrix entries evenly across the thread blocks for parallel execution, and obtain the SI results for the plurality of parameter points and spatial points in a single pass, wherein the SI results comprise the head-integral and tail-integral results, and the computation in each thread block comprises:
computing the matrix product with the CUDA Tensor Core matrix units to evaluate the segment integrals of the head and the tail, and, for the tail integral, accelerating the convergence of the segment-integral results with the Euler transformation.
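Distributing the M × N entries evenly over thread blocks can be sketched as a 2-D tiling. This assumes hypothetical 16 × 16 tiles (the patent does not state block dimensions here): the grid size comes from ceiling division, as in a CUDA launch configuration, and each block owns one tile, clipped at the matrix edge. In a CUDA kernel the pair (bx, by) would come from `blockIdx.x` and `blockIdx.y`; the third grid dimension mentioned in the patent is not modelled.

```python
import math

TILE_M, TILE_N = 16, 16   # hypothetical number of matrix entries per thread block (rows x cols)


def grid_shape(m_total, n_total):
    """Ceiling division so the grid covers every entry, as in a CUDA launch configuration."""
    return math.ceil(m_total / TILE_M), math.ceil(n_total / TILE_N)


def block_entries(bx, by, m_total, n_total):
    """Entries owned by block (bx, by): rows are parameter points, columns spatial points."""
    rows = range(bx * TILE_M, min((bx + 1) * TILE_M, m_total))   # clipped at the matrix edge
    cols = range(by * TILE_N, min((by + 1) * TILE_N, n_total))
    return [(i, j) for i in rows for j in cols]


def assignment(m_total, n_total):
    """Map every block of the grid to the list of (row, column) entries it computes."""
    gx, gy = grid_shape(m_total, n_total)
    return {(bx, by): block_entries(bx, by, m_total, n_total)
            for bx in range(gx) for by in range(gy)}


blocks = assignment(40, 100)   # e.g. 40 frequencies x 100 field-source pairs
print(grid_shape(40, 100), sum(len(v) for v in blocks.values()))
```

Every entry is owned by exactly one block, and all blocks except those on the matrix edge receive the same amount of work.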
In some embodiments, multiple values of other layered-medium structural parameters contained in the GPU may also be obtained; filling the matrix with the SI computation tasks of these parameter values and multiple spatial points enables rapid sweeps of the layered-medium structural parameters.
In some embodiments, computing the matrix product with the Tensor Cores in CUDA greatly improves computational efficiency compared with evaluating the numerical integrals one by one.
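Spectral-domain Green's functions are complex, while Tensor Core matrix-multiply instructions operate on real-valued tiles. One common way to bridge this, assumed here because the patent does not say how complex data are handled, is to split the complex product into real products: $\mathrm{Re}\,C = A_rB_r - A_iB_i$ and $\mathrm{Im}\,C = A_rB_i + A_iB_r$. In the sketch, `real_matmul` is a hypothetical stand-in for one real-valued GEMM.

```python
def real_matmul(A, B):
    """Stand-in for one real-valued GEMM, the unit of work a Tensor Core executes."""
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]


def split(M):
    """Separate a complex matrix into its real and imaginary parts."""
    return [[z.real for z in row] for row in M], [[z.imag for z in row] for row in M]


def complex_matmul_via_real(A, B):
    """Complex product assembled from four real products."""
    Ar, Ai = split(A)
    Br, Bi = split(B)
    rr, ii = real_matmul(Ar, Br), real_matmul(Ai, Bi)
    ri, ir = real_matmul(Ar, Bi), real_matmul(Ai, Br)
    return [[complex(rr[i][j] - ii[i][j], ri[i][j] + ir[i][j])
             for j in range(len(rr[0]))] for i in range(len(rr))]


A = [[1 + 2j, 0.5 - 1j, 3j], [2 - 1j, 1 + 0j, -1 + 1j]]       # M x K = 2 x 3
B = [[1 + 0j, 2 - 1j], [0.5j, 1 + 1j], [-1 + 0j, 0.25 + 0j]]   # K x N = 3 x 2
C = complex_matmul_via_real(A, B)
```

When the second matrix is real (for example, real-axis tail samples with real weights), its imaginary part vanishes and only two real products are needed; on the complex elliptical head path both factors are complex and all four are required.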
In some embodiments, each thread block first computes the head integral and then the tail integral; the SI value is the sum of the two results, and the two steps run serially. The head and tail segment-integral computations are similar, except that the tail integral has one additional step: the segment-integral results are passed through the Euler transformation.
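The per-entry sequence (head integral, then tail segment integrals, then Euler acceleration of the tail) can be sketched on a model integrand with a known answer. Assumptions for illustration only: the integrand is $J_0(k_\rho\rho)$, whose integral over $[0,\infty)$ is exactly $1/\rho$; because it has no poles, the head is taken on the real axis instead of the patent's semi-elliptical path; composite Simpson replaces Gauss-Kronrod; and tail breakpoints use the asymptotic zeros $(s - 1/4)\pi/\rho$ of $J_0(k_\rho\rho)$, spaced $\pi/\rho$ apart.

```python
import math


def bessel_j0(x, terms=96):
    """J0(x) via the periodic trapezoidal rule on (1/pi) * integral_0^pi cos(x sin t) dt."""
    return sum(math.cos(x * math.sin(math.pi * m / terms)) for m in range(terms)) / terms


def simpson(f, a, b, n=128):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3


def euler_average(segments):
    """Euler transformation of the segment series by repeated averaging of partial sums."""
    sums, total = [], 0.0
    for u in segments:
        total += u
        sums.append(total)
    while len(sums) > 1:
        sums = [(p + q) / 2 for p, q in zip(sums, sums[1:])]
    return sums[0]


def sommerfeld_entry(rho, n_segments=21):
    """Model SI entry: head + Euler-accelerated tail of integral_0^inf J0(k rho) dk."""
    def integrand(k):
        return bessel_j0(k * rho)

    q = math.pi / rho                        # tail subinterval length (zero spacing)
    k_start = 0.75 * q                       # first asymptotic zero of J0(k rho)
    head = simpson(integrand, 0.0, k_start)  # step 1: head integral
    tail = [simpson(integrand, k_start + l * q, k_start + (l + 1) * q)
            for l in range(n_segments)]      # step 2: L segment integrals
    return head + euler_average(tail)        # step 3: Euler-accelerated tail


for rho in (1.0, 2.0):
    print(rho, sommerfeld_entry(rho), 1.0 / rho)
```

Without the final averaging step the truncated tail would still carry an error comparable to half the last segment integral; with it, 21 segments already give several more correct digits.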
In some embodiments, tail-integral calculations generally require an extrapolation algorithm to accelerate the convergence of the segment-integral results; common extrapolation algorithms include the Euler transformation, the weighted-averages transformation, the Levin transformation, and the Shanks transformation.
In the GPU-based method for rapid calculation of the layered-media Green's function according to the embodiments of the present application, the computing architecture is optimized by reusing the spectral-domain Green's function and the Bessel function during parameter sweeps, which converts the Sommerfeld integral into the product of two matrices. Combining the parallel computing power of the GPU and its dedicated Tensor Core matrix units with a derived Euler-transformation expression that accelerates the tail integral yields a parallel scheme that computes the Sommerfeld integrals for multiple frequencies or multiple planar layered-medium parameters in a single pass.
Optionally, in one embodiment of the present application, filling a matrix with the SI computation tasks of the plurality of parameter points and the plurality of spatial points contained in the initialized GPU, and recasting the numerical integration of the SI as a matrix product, comprises:
configuring the initialized GPU to contain the Sommerfeld integral computation tasks of a plurality of parameter points and a plurality of spatial points, and determining that each thread block computes the SI for M parameter points and N spatial points;
arranging the SI values computed by each thread block into an M × N matrix, in which each row corresponds to a different parameter point and each column to a different spatial point, so that the numerical integration of the SI is recast as the product of a first matrix and a second matrix.
Optionally, in one embodiment of the present application, the matrix product is C = A·B, where the first matrix A is an M × K matrix whose entries are the spectral-domain Green's function evaluated at the M parameter points and the K integration sampling points, and the second matrix B is a K × N matrix whose entries are the products of the Bessel function and the integration weight coefficients.
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both of which are piecewise (segmented) integrals. The head integral is evaluated as a quadrature sum over the sampling points on the semi-elliptical path, where $\tilde G$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the given field and source positions; $\rho$ is the lateral distance between the field point and the source point; $J_n$ is the Bessel function of the first kind and $n$ its order; $a$ is the major axis of the elliptical path; $w_i$ and $k_{\rho,i}$ are the quadrature weights and sampling points, respectively; $I_i$ is the SI contribution of the $i$-th sampling point along the elliptical path; and $N$ is the number of integration sampling points.
The tail integral is evaluated by dividing the tail interval on the real axis into $L$ subintervals, applying an $N$-point quadrature on each, and combining the subinterval results with the Euler transformation, where $\tilde G$, $z$, $z'$, $k_\rho$, $J_n$, $n$, $\rho$ and $a$ are as defined for the head integral; $w_i$ and $k_{\rho,i}$ are the quadrature weights and sampling points, respectively; $L$ is the number of subintervals into which the tail integration interval is divided; and $N$ is the number of sampling points in each subinterval.
Optionally, in one embodiment of the present application, the Euler extrapolation is simplified through a formula derivation, whereby the SI tail integral after the $K$-th recursion is expressed as a weighted sum of the segment integrals, where $N$ denotes the number of points dividing the tail integral into subintervals, $c_i$ denotes the coefficient applied to the corresponding segment integral, and $u_i$ denotes the integral over each tail segment.
The GPU-based method for rapid calculation of the layered-media Green's function of the present application is described in detail below with reference to specific embodiments.
A planar layered medium is one in which the medium is discontinuous along only one of the three spatial directions and uniform along the two directions orthogonal to it. Such a medium can generally be represented as shown in FIG. 2, with the media interfaces normal to the $z$ axis; the relative permittivity and relative permeability of the $i$-th layer are $\varepsilon_{ri}$ and $\mu_{ri}$, respectively, and the permittivity and permeability of the top medium are defined likewise. A time-harmonic dependence is assumed throughout.
In the spectral domain, the spectral-domain Green's function is obtained by solving the transmission-line equations; the spatial-domain Green's function is then obtained by a two-dimensional inverse Fourier transform, which takes the form of the Sommerfeld integral:
$$S_n = \frac{1}{2\pi}\int_0^{\infty} \tilde G(k_\rho; z, z')\, J_n(k_\rho \rho)\, k_\rho^{n+1}\, dk_\rho \qquad (1)$$
where $\tilde G$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point, respectively; $k_\rho$ is the transverse wavenumber, and $\tilde G$ is obtained from transmission-line theory for the given field and source positions ($z$ and $z'$); $J_n$ is the Bessel function of the first kind and $n$ its order; and $\rho$ is the lateral distance between the field point and the source point.
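For the zeroth-order case, the step from the two-dimensional inverse Fourier transform to a Sommerfeld integral can be checked directly. The derivation assumes a spectral function that depends only on $k_\rho = |\mathbf k_\rho|$, the kernel $e^{j\mathbf k_\rho\cdot\boldsymbol\rho}$ and the normalization $1/(2\pi)^2$ (other conventions change only constants), and uses polar coordinates $d^2k_\rho = k_\rho\,dk_\rho\,d\alpha$ with the standard identity $\int_0^{2\pi} e^{jx\cos\alpha}\,d\alpha = 2\pi J_0(x)$:

$$
G(\rho) = \frac{1}{(2\pi)^2}\int_0^{\infty}\!\!\int_0^{2\pi} \tilde G(k_\rho)\, e^{j k_\rho \rho \cos\alpha}\, k_\rho\, d\alpha\, dk_\rho
= \frac{1}{2\pi}\int_0^{\infty} \tilde G(k_\rho)\, J_0(k_\rho\rho)\, k_\rho\, dk_\rho ,
$$

that is, a Sommerfeld integral of order zero whose integrand carries the extra factor $k_\rho$ contributed by the polar area element.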
As shown in FIG. 3, the integration path is divided into a head integral and a tail integral. The head integral follows a semi-elliptical path whose major axis $a$ is typically chosen larger than the largest wavenumber in the layered medium. The remaining ellipse parameter is selected as follows:
(2)
where $k_0$ is the free-space wavenumber. Evaluating the tail integral along the real axis, the integral in equation (1) can be rewritten as the sum of a head integral along the semi-elliptical path and a tail integral along the real axis:
(3)
The SI head integral follows an elliptical path in the complex plane, and a piecewise integration strategy is adopted to ensure accuracy. The head integral is expressed with the Gauss-Kronrod integration rule as a weighted sum over sampling points along the elliptical path. The weights and sampling points are those of the Gauss-Kronrod rule. Each summand is the Sommerfeld integrand evaluated at the i-th sampling point along the path.
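The head-integral step can be illustrated with a short Python sketch. It is not the patent's implementation. It assumes the semi-ellipse is parameterized as $k_\rho(t) = a(1-\cos t) + jb\sin t$ for $t \in [0, \pi]$. The path therefore runs from $0$ to $2a$ through the upper half of the complex plane. The sign convention is an assumption, $a$ is treated as the semi-major axis, and $b$ is a hypothetical minor semi-axis.

The $t$-range is split into equal segments. Each segment is integrated with the standard 15-point Kronrod rule, using the QUADPACK G7-K15 nodes. The Gauss-7 error estimate is omitted for brevity. The function names are illustrative.

```python
import math

# G7-K15 Kronrod nodes and weights on [-1, 1] (QUADPACK qk15 values). Only the
# non-negative half is stored because the rule is symmetric; the last entry is the centre.
XGK = [0.991455371120812639, 0.949107912342758525, 0.864864423359769073,
       0.741531185599394440, 0.586087235467691130, 0.405845151377397167,
       0.207784955007898468, 0.0]
WGK = [0.022935322010529225, 0.063092092629978553, 0.104790010322250184,
       0.140653259715525919, 0.169004726639267903, 0.190350578064785410,
       0.204432940075298892, 0.209482141084727828]


def kronrod15(g, lo, hi):
    """15-point Kronrod estimate of the integral of g over the real interval [lo, hi]."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    total = WGK[-1] * g(mid)
    for x, w in zip(XGK[:-1], WGK[:-1]):
        total += w * (g(mid - half * x) + g(mid + half * x))
    return half * total


def head_integral(f, a, b, segments=8):
    """Integrate f(k) along the hypothetical semi-ellipse k(t) = a(1 - cos t) + j*b*sin t.

    t runs over [0, pi], so the path starts at k = 0 and ends at k = 2a. The t-range
    is split into equal segments (piecewise strategy), each handled by kronrod15.
    """
    def integrand(t):
        k = a * (1.0 - math.cos(t)) + 1j * b * math.sin(t)
        dk_dt = a * math.sin(t) + 1j * b * math.cos(t)  # path Jacobian dk/dt
        return f(k) * dk_dt

    edges = [math.pi * s / segments for s in range(segments + 1)]
    return sum(kronrod15(integrand, edges[s], edges[s + 1]) for s in range(segments))
```

For an entire integrand, the path integral must equal the real-axis value from $0$ to $2a$, which gives a simple correctness check. In an actual solver, `f` would be the Sommerfeld integrand.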
The Sommerfeld tail integral is a semi-infinite integral along the real axis. To accelerate convergence and improve computational efficiency, the classical Euler transformation is adopted, which reduces the truncated infinite integral to a sum of L segment integrals. The piecewise integral can be written as:
(5)
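where the two quantities denote the quadrature weights and sampling points, respectively. The length of each subinterval equals the spacing between two adjacent zeros of the Bessel function of the first kind $J_n(k_\rho \rho)$. For large arguments, this spacing approaches $\pi/\rho$.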
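The subdivision of the tail can be sketched in the same spirit. This hypothetical Python example uses $\sin(k_\rho\rho)/k_\rho$ as a stand-in for the Bessel-function kernel. Its zeros are exactly $\pi/\rho$ apart, and its integral from $0$ to $\infty$ is known to be $\pi/2$. For brevity, each subinterval of length $\pi/\rho$ is integrated with a 7-point Gauss-Legendre rule, the Gauss half of the G7-K15 pair.

```python
import math

# 7-point Gauss-Legendre rule on [-1, 1] (the Gauss half of the G7-K15 pair);
# symmetric, so only the non-negative nodes are stored, the centre last.
XG = [0.949107912342758525, 0.741531185599394440, 0.405845151377397167, 0.0]
WG = [0.129484966168869693, 0.279705391489276668, 0.381830050505118945,
      0.417959183673469388]


def gauss7(g, lo, hi):
    """7-point Gauss-Legendre estimate of the integral of g over [lo, hi]."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    total = WG[-1] * g(mid)
    for x, w in zip(XG[:-1], WG[:-1]):
        total += w * (g(mid - half * x) + g(mid + half * x))
    return half * total


def tail_segments(f, start, rho, num_segments):
    """Split [start, inf) into num_segments pieces of length pi/rho and integrate each.

    pi/rho is the asymptotic spacing between adjacent zeros of J_n(k*rho), so each
    piece spans roughly one half-period and the segment values alternate in sign.
    """
    q = math.pi / rho
    return [gauss7(f, start + l * q, start + (l + 1) * q) for l in range(num_segments)]
```

The segment values alternate in sign. Their plain partial sums converge only like $1/(L\pi)$, and this slow convergence is what the Euler transformation in part (3) is meant to remedy.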
(1) Acceleration by optimizing the computing architecture
In the Sommerfeld integral of equation (1), the integrand is the product of the spectral-domain Green's function and a Bessel function. The spectral-domain Green's function depends on the frequency and the layered-medium parameters but not on the lateral source-field distance. The Bessel function, by contrast, depends only on the lateral source-field distance.

Consequently, in a multi-parameter computation of the Sommerfeld integral, the Bessel function needs to be computed only once for a given source-field configuration, regardless of the frequency or layered-medium parameters. Likewise, the spectral-domain Green's function needs to be computed only once for a given frequency or set of layered-medium parameters, regardless of the source-field configuration. Reusing the spectral-domain Green's function and the Bessel function in this way greatly improves the efficiency of multi-parameter Sommerfeld integral computation.
To quantify this speed-up, the following notation is introduced, taking a frequency sweep as an example:
1) $T_G$ denotes the time to compute the spectral-domain Green's function of one frequency point at one integration sampling point.
2) $T_J$ denotes the time to compute the Bessel function of the first kind $J_n$ of one spatial point at one integration sampling point.
3) $T_S$ denotes the time of the weighted summation of these results at one integration sampling point, and $K$ denotes the total number of integration points. $T_S$ is generally much smaller than $T_G$ or $T_J$.

Consider $P$ parameter points and $Q$ spatial points. The conventional parallel scheme computes the SI of one parameter point at multiple spatial points per pass and requires a total computation time of
$$T_{\text{conv}} = PQK\,(T_G + T_J + T_S) \qquad (6)$$
If the intermediate data are reused, the computation time of the architecture-optimized method of the present application becomes
$$T_{\text{opt}} = PK\,T_G + QK\,T_J + PQK\,T_S \qquad (7)$$
Therefore, compared with the scheme that computes the SI one parameter point at a time in a loop, the theoretical speed-up ratio is:
$$\eta = \frac{T_{\text{conv}}}{T_{\text{opt}}} = \frac{PQ\,(T_G + T_J + T_S)}{P\,T_G + Q\,T_J + PQ\,T_S} \qquad (8)$$
The speed-up ratio is a function of the number of parameter points and the number of spatial points, and it increases as either number increases. In practical applications both numbers are usually large, especially the number of spatial points. Reusing the spectral-domain Green's function and the Bessel function therefore yields a very high speed-up ratio.
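The reuse argument can be checked numerically with a small cost model. The sketch below is a hedged reading of assumptions 1) to 3), with all names and cost values hypothetical. It rests on three assumptions:
- `t_g`, `t_j` and `t_s` are per-sample costs of the spectral-domain Green's function, the Bessel function and the weighted summation.
- The conventional scheme recomputes all three for every parameter/spatial pair.
- The reuse scheme computes the Green's function once per parameter point and the Bessel function once per spatial point.

```python
def sweep_times(P, Q, K, t_g, t_j, t_s):
    """Hypothetical cost model: P parameter points, Q spatial points, K samples per SI.

    t_g, t_j, t_s are assumed per-sample costs of the spectral-domain Green's
    function, the Bessel function and the weighted summation.
    """
    # Conventional: every (parameter, spatial) pair recomputes all three terms.
    t_conv = P * Q * K * (t_g + t_j + t_s)
    # Reuse: Green's function once per parameter point, Bessel once per spatial
    # point, weighted summation still once per pair.
    t_reuse = P * K * t_g + Q * K * t_j + P * Q * K * t_s
    return t_conv, t_reuse


def speedup(P, Q, K, t_g, t_j, t_s):
    """Theoretical speed-up t_conv / t_reuse of the reuse scheme."""
    t_conv, t_reuse = sweep_times(P, Q, K, t_g, t_j, t_s)
    return t_conv / t_reuse
```

Dividing numerator and denominator by $PQ$ shows that the ratio is bounded by $(t_g + t_j + t_s)/t_s$. The small weighted-summation cost therefore ultimately limits the attainable speed-up.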
(2) Accelerating the integration with matrix operations
On the GPU, the Sommerfeld integral computation tasks for $P$ parameter points and $Q$ spatial points are distributed evenly among the thread blocks. Each thread block computes the Sommerfeld integrals of M parameter points and N spatial points, so that $(P/M)\times(Q/N)$ thread blocks perform the integration simultaneously.

For example, in an SI frequency sweep over $P$ frequency points and $Q$ lateral distances, the SI handled by one block can be arranged in an M×N matrix. Each row of this matrix corresponds to a different parameter point and each column to a different spatial point. In this way, the numerical integration of the SI, for both the head and the tail integrals, is recast as a matrix product, and the SI results of multiple parameter points and multiple spatial points are obtained at once:
$$\mathbf{C} = \mathbf{A}\,\mathbf{B}, \qquad \mathbf{A} \in \mathbb{C}^{M\times K},\ \mathbf{B} \in \mathbb{C}^{K\times N},\ \mathbf{C} \in \mathbb{C}^{M\times N} \qquad (9)$$
In equation (9), matrix $\mathbf{A}$ consists of the spectral-domain Green's function at M parameter points and K integration sampling points. Each entry of matrix $\mathbf{B}$ is the product of the Bessel function and the corresponding integration weight coefficient.
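The recasting of a block's SI sweep as one matrix product can be sketched in plain Python. This is an illustration under stated assumptions, not the patent's kernel:
- It uses order $n = 0$.
- A hypothetical toy function `green(p, xi)` stands in for the transmission-line spectral-domain Green's function.
- The power-series $J_0$ is only accurate for moderate arguments.

Any $\rho$-independent factor of the integrand, such as $k_\rho^{n+1}$, could be folded into the spectral-domain matrix.

```python
def bessel_j0(x, terms=40):
    """Power-series J0(x) = sum_m (-1)^m (x/2)^(2m) / (m!)^2; adequate only for moderate |x|."""
    total, term = 0.0, 1.0
    for m in range(terms):
        if m > 0:
            term *= -(x * x) / (4.0 * m * m)  # ratio of consecutive series terms
        total += term
    return total


def matmul(A, B):
    """Plain triple-loop matrix product C = A B for lists of lists."""
    K, N = len(B), len(B[0])
    return [[sum(row[k] * B[k][n] for k in range(K)) for n in range(N)] for row in A]


def sweep_as_matrix_product(green, params, samples, weights, rhos):
    """SI for every (parameter, spatial point) pair of one block, computed as C = A B.

    green(p, xi) is a hypothetical stand-in for the spectral-domain Green's function.
    """
    # A: M x K, one spectral-domain evaluation per (parameter point, sample)
    A = [[green(p, xi) for xi in samples] for p in params]
    # B: K x N, one Bessel-times-weight evaluation per (sample, spatial point)
    B = [[w * bessel_j0(xi * rho) for rho in rhos] for xi, w in zip(samples, weights)]
    return matmul(A, B)  # C[m][n]: SI of parameter point m at spatial point n
```

Each spectral-domain value is computed once per parameter point and each Bessel value once per spatial point, which is the reuse described in part (1).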
Modern NVIDIA GPUs provide two hardware units that can perform the matrix product (9): CUDA cores and Tensor Cores. A CUDA core is the basic processing unit of the GPU. It performs simple floating-point operations and is optimized for general parallel workloads. Tensor Cores are newer processing units designed specifically to accelerate the tensor operations widely used in deep learning and artificial intelligence. Compared with CUDA cores, which target general parallel workloads, Tensor Cores perform matrix multiply-and-add operations more efficiently.

In CUDA 12.2, a Tensor Core can only multiply one matrix fragment of a fixed small size by another fixed-size fragment at a time. The product of the M×K spectral-domain matrix and the K×N Bessel-weight matrix in each thread block therefore requires block matrix multiplication. As shown in FIG. 4, both matrices are divided into small submatrices. The products of these submatrices are computed separately on the Tensor Cores, and the partial results are combined to obtain the M×N result matrix.
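Block matrix multiplication can be sketched as follows. The tile sizes `tm`, `tn` and `tk` are hypothetical parameters. On real hardware they are fixed by the Tensor Core fragment shape for the chosen data type and architecture, for example 16×16×16 for half precision in the CUDA WMMA API. Each innermost tile product below stands in for one such fragment operation. Edge tiles are clipped here, whereas a GPU kernel would typically pad them.

```python
def tiled_matmul(A, B, tm, tn, tk):
    """Block matrix product C = A B built from (tm x tk) by (tk x tn) tile products.

    A is M x K and B is K x N (lists of lists, real or complex entries). Each
    innermost tile product plays the role of one fixed-size Tensor Core operation;
    partial products along the k dimension are accumulated into C.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tm):
        for j0 in range(0, N, tn):
            for k0 in range(0, K, tk):
                # One "fragment" product: A[i0:i0+tm, k0:k0+tk] @ B[k0:k0+tk, j0:j0+tn]
                for i in range(i0, min(i0 + tm, M)):
                    for j in range(j0, min(j0 + tn, N)):
                        acc = 0.0
                        for k in range(k0, min(k0 + tk, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] += acc  # accumulate this tile's contribution
    return C
```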
(3) Acceleration by Euler extrapolation
In the tail-integral computation of the SI, the Euler extrapolation algorithm is needed to accelerate the convergence of the infinite integral. As described in Algorithm 1, the input to the Euler transformation is the sequence of L segment integration values, which is stored in the shared memory of the thread block. Specifically:
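Input of Algorithm 1: the segment integration values. Output of Algorithm 1: the tail integral value.
Procedure of Algorithm 1: initialize; set k = 0; while the stopping condition is not met, perform the next Euler recursion and increment k; end while; return the tail integral value.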
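The following Python sketch shows one common form of the Euler transformation. It assumes, though the text does not state this, that Algorithm 1 uses the classical repeated averaging of partial sums. In this form, the partial sums of the segment integrals are replaced, recursion by recursion, by averages of neighbouring entries. The optional tolerance `tol` is a hypothetical stopping criterion standing in for the while-loop condition.

```python
def euler_transform(segments, tol=0.0):
    """Repeated-averaging Euler transformation of the partial sums of `segments`.

    Returns (value, k): the extrapolated tail value and the number of recursions.
    With the default tol=0.0 the recursion runs until a single value is left.
    """
    if not segments:
        return 0.0, 0
    # Level 0: partial sums S_0, ..., S_{L-1} of the segment integrals.
    sums, running = [], 0.0
    for u in segments:
        running += u
        sums.append(running)
    k = 0
    while len(sums) > 1:
        # One recursion: average neighbouring entries; the sequence shrinks by one.
        averaged = [0.5 * (sums[i] + sums[i + 1]) for i in range(len(sums) - 1)]
        k += 1
        change = abs(averaged[-1] - sums[-1])
        sums = averaged
        if change < tol:  # hypothetical adaptive stop (stand-in for the while condition)
            break
    return sums[-1], k
```

The test uses the alternating series $\ln 2 = 1 - 1/2 + 1/3 - \cdots$. Its 20-term partial sum is off by more than $0.02$, while the transformed value agrees with $\ln 2$ to better than $10^{-7}$.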
However, for multi-parameter Sommerfeld tail integrals, the shared memory can hardly provide the storage required by the Euler transformation. Frequent shared-memory reads and writes may also cause bank conflicts. To solve these problems, the Euler transformation is derived in explicit form, which simplifies the numerical computation. According to Algorithm 1, the SI tail after the k-th recursion can be written explicitly as:
(10)
whose coefficients follow the arrangement rule of the Yang Hui (Pascal) triangle:
(11)
From equations (10) and (11):
(12)
In this way, the adaptive loop of Algorithm 1 is avoided, and the segment integration values no longer need to occupy the shared memory of the thread block. This avoids the loss of computational efficiency caused by high memory occupancy.
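The explicit form can be sketched as follows, under the same assumption that Algorithm 1 is the repeated-averaging Euler transformation. After $k$ recursions on $k+1$ partial sums $S_i$, the result is $2^{-k}\sum_{i=0}^{k}\binom{k}{i}S_i$. Substituting $S_i = \sum_{l \le i} u_l$, where $u_l$ are the segment integrals, turns this into $\sum_{l=0}^{k} c_l u_l$ with $c_l = 2^{-k}\sum_{i=l}^{k}\binom{k}{i}$. The binomial row is built with the Yang Hui (Pascal) additive rule. These symbols and function names are illustrative and need not match the notation of equations (10) to (12).

```python
def euler_coefficients(k):
    """Weights c_0..c_k such that k Euler recursions equal sum(c_l * u_l).

    Row k of the Yang Hui (Pascal) triangle is built with the additive rule
    C(k, i) = C(k-1, i-1) + C(k-1, i); c_l is the sum of that row from i = l
    to i = k, divided by 2**k.
    """
    row = [1]
    for _ in range(k):
        row = [1] + [row[i - 1] + row[i] for i in range(1, len(row))] + [1]
    coeffs, tail = [0.0] * (k + 1), 0
    for l in range(k, -1, -1):
        tail += row[l]  # running sum C(k, l) + ... + C(k, k)
        coeffs[l] = tail / 2 ** k
    return coeffs


def euler_tail(segments):
    """Closed-form Euler transformation: a single weighted sum, no adaptive loop."""
    c = euler_coefficients(len(segments) - 1)
    return sum(cl * u for cl, u in zip(c, segments))
```

The coefficients depend only on $k$, so they can be computed once and shared by all threads. Each thread then needs a single dot product instead of a buffer of partial sums.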
(4) The one-pass parallel computing scheme for multi-parameter Sommerfeld integrals is shown in FIG. 5 and proceeds as follows:
1) Initialize the thread hierarchy. The three-dimensional grid of the GPU is set to (32,32,1), and each thread block contains (32,1,1) threads.
2) Fill the two factor matrices, namely the spectral-domain Green's function matrix and the Bessel-weight matrix. The computation of their entries is divided equally among the 1024 thread blocks, and within each block the work is executed in parallel by 32 threads. The filled matrices are stored in registers.
3) Compute the head and tail segment integrals by matrix multiplication. The SI results of multiple parameter points and multiple spatial points are obtained at once by performing the matrix multiplication on the CUDA cores and Tensor Cores.
4) For the tail integral, apply the Euler transformation to the segment-integration results to accelerate convergence. For the head integral, store the resulting product matrix directly in global memory.
5) Transfer the integration results from the GPU to the CPU over the PCIe bus.
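The text states that the entry computations are divided equally among the 1024 blocks of the (32, 32, 1) grid. The following Python model is one hedged way to picture this division. It assumes, though the source does not state this, that the work is cut into tiles of M parameter points by N spatial points and that tiles are dealt to blocks in a grid-stride fashion. The function name and tiling are hypothetical.

```python
def assign_tiles(P, Q, M, N, grid=(32, 32, 1)):
    """Deal (parameter-tile, spatial-tile) work items to the blocks of a fixed grid.

    Each tile covers up to M parameter points and N spatial points and is
    identified by its first parameter index and first spatial index. Tile t goes
    to block t % num_blocks (grid-stride assignment), so block loads differ by at
    most one tile.
    """
    num_blocks = grid[0] * grid[1] * grid[2]
    p_tiles = -(-P // M)  # ceiling division
    q_tiles = -(-Q // N)
    work = {blk: [] for blk in range(num_blocks)}
    for t in range(p_tiles * q_tiles):
        pt, qt = divmod(t, q_tiles)
        work[t % num_blocks].append((pt * M, qt * N))
    return work
```

The model uses the frequency-sweep numbers of the embodiment: 64 parameter points, 500,000 spatial points, M = 64 and N = 96. With these numbers it yields 5209 tiles, so each block handles five or six of them.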
The present application targets high-performance computation of the layered-medium Green's function and provides a fast parameter-sweep method for it. It reuses the integrand components across multiple parameters and exploits the efficient parallel computing capability of the GPU. In this way, it computes the layered-medium Green's function accurately and efficiently at multiple parameter points or for multiple sets of layered-medium parameters.
The beneficial effects of the application are demonstrated on two components of the layered-medium Green's function of a three-layer microstrip structure. Multi-parameter computations are carried out by sweeping, in turn, the frequency, the relative permittivity of the layered medium, and the height of the layered medium. The layered-medium parameters of the example model are shown in FIG. 6. The computing platform comprises GPU1, an NVIDIA RTX 6000 Ada; GPU2, an NVIDIA GeForce RTX 4090; and an Intel Xeon Platinum 8280 CPU.
(1) Frequency sweep
Sixty-four parameter points are sampled at equal intervals of 0.1 GHz in the 2 GHz to 8 GHz band. The field and source points both lie on the interface between free space and the medium, so both vertical coordinates are 0. Along the lateral source-field distance, 500,000 spatial points are sampled at equal intervals. The task division M = 64 and N = 96 is used.

The computed results and relative errors are shown in FIGS. 7, 8, 9 and 10. These figures give the results at three of the 64 frequencies: 2 GHz, 4 GHz and 6 GHz. As FIGS. 7 to 10 show, the relative errors of both Green's function components remain small, which demonstrates the good computational accuracy of the proposed method.
To demonstrate the time advantage of the proposed method, Table 1 lists the times the two methods require to compute the 64 parameter points and 500,000 spatial points. The comparison uses the task division M = 64 and N = 96 against the conventional OpenMP-parallel method. Under these conditions, the proposed method achieves a 1914-fold speed-up for the head integral and a 1226-fold speed-up for the tail integral. It reduces the total time from 24960.62 s to under 14.84 s, an overall speed-up of more than 1600.
Table 1: Times for computing the Green's function components of the three-layer medium with different methods (M = 64, N = 96)
(2) Relative permittivity scan
At a frequency of 8 GHz and a medium thickness of 0.254 mm, 64 relative permittivities are sampled at equal intervals of 0.1 in the range 3.2 to 9.6. The field and source points lie on the interface between free space and the medium, with both vertical coordinates equal to 0. Along the lateral source-field distance, 500,000 spatial points are sampled at equal intervals, and the two Green's function components are computed. The results and relative errors are shown in FIGS. 11, 12, 13 and 14.

These figures show only the results for relative permittivities of 3.6, 6.6 and 9.6. The relative errors of the proposed method all remain small. This demonstrates that the method also applies to permittivity sweeps, with a computation time essentially the same as that of the frequency-sweep case.
(3) Layered-medium height sweep
At a frequency of 8 GHz and a relative permittivity of 9.6, 64 medium thicknesses are sampled at equal intervals of 0.02 mm in the range 0.254 mm to 1.534 mm. The field and source points lie on the interface between free space and the medium, with both vertical coordinates equal to 0. Along the lateral source-field distance, 500,000 spatial points are sampled at equal intervals, and the two Green's function components are computed. The results and relative errors are shown in FIGS. 15, 16, 17 and 18.

These figures show only the results for medium thicknesses of 0.254 mm, 0.854 mm and 1.454 mm. The relative errors remain small. This demonstrates that the method also applies to computing several layered-medium thicknesses simultaneously, with a computation time essentially the same as in the frequency-sweep case.
Numerical experiments show that the relative error of the proposed method with respect to the conventional method remains small, and that the method also has a large time advantage. The comparison baseline is a single-parameter-point method accelerated with OpenMP on a high-end CPU. In the three-layer medium example, the time is reduced from 24960.62 s to under 14.84 s, a 1682-fold speed-up. The proposed method therefore significantly accelerates the simulation and parameter optimization of microstrip circuits and microwave integrated circuits.
To implement the above embodiments, the present application further provides a GPU-based fast computing device for the layered-medium Green's function.
FIG. 19 is a schematic structural diagram of a GPU-based fast computing device for the layered-medium Green's function according to an embodiment of the present application.
As shown in FIG. 19, the GPU-based fast computing device for the layered-medium Green's function comprises a CPU and a GPU, the CPU comprising a memory, wherein:
The CPU is configured to initialize the GPU, to fill matrices with the Sommerfeld integral (SI) computation tasks of multiple parameter points and multiple spatial points contained in the initialized GPU, to recast the numerical integration of the SI as a matrix product, and to store the resulting data in the memory.
The GPU is configured to distribute the computation tasks of the matrix entries evenly among the thread blocks for parallel execution, to obtain the SI results of the multiple parameter points and multiple spatial points at once, and to transfer the integration results to the memory of the CPU over the PCIe bus. The SI results comprise the SI head-integral results and tail-integral results, and the computation in each thread block comprises:
performing the matrix product with the CUDA matrix-operation units (Tensor Cores) to compute the head and tail segment integrals, and, when computing the tail integral, accelerating the convergence of the segment-integration results with the Euler transformation.
Optionally, in one embodiment of the present application, filling the matrices with the SI computation tasks of the multiple parameter points and multiple spatial points contained in the initialized GPU, and recasting the numerical integration of the SI as a matrix product, comprises:
setting the Sommerfeld integral computation tasks for all parameter points and spatial points contained in the initialized GPU, and determining the SI of the M parameter points and N spatial points computed by each thread block;
arranging the SI computed by each thread block in an M×N matrix in which each row corresponds to a different parameter point and each column to a different spatial point, so that the numerical integration of the SI is recast as a matrix product, thereby obtaining a first matrix and a second matrix.
Optionally, in one embodiment of the application, the matrix product is $\mathbf{C} = \mathbf{A}\mathbf{B}$, where the first matrix $\mathbf{A}$ is an M×K matrix composed of the spectral-domain Green's function at M parameter points and K integration sampling points, and each entry of the second matrix $\mathbf{B}$ is the product of the Bessel function and the corresponding integration weight coefficient.
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both of which are computed piecewise; the head integral is expressed as:
where the quantities involved are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber, with the spectral-domain Green's function for the given field and source positions computed by transmission-line theory; the lateral distance between the field point and the source point; the Bessel function of the first kind; the order of the Bessel function; the major axis a; the weights and sampling points; the SI integration result at the i-th sampling point along the elliptical path; and N, the number of integration sampling points;
The tail integral is expressed as:
where the quantities involved are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber, with the spectral-domain Green's function for the given field and source positions computed by transmission-line theory; the Bessel function of the first kind; the order of the Bessel function; the lateral distance between the field point and the source point; the major axis a; the weights and sampling points; L, the number of subintervals into which the tail integration interval is divided; N, the number of sampling points in each subinterval; and the result obtained after the Euler transformation of the tail-integral subintervals.
Optionally, in one embodiment of the present application, a simplified Euler extrapolation is obtained by explicit derivation, and the SI tail integral after the k-th recursion is expressed as:
where N is the number of sampling points in each tail subinterval, each segment integration value of the tail integral is multiplied by a corresponding coefficient, and the weighted segment values are summed.
It should be noted that the foregoing explanation of the embodiments of the GPU-based fast computation method for the layered-medium Green's function also applies to the GPU-based fast computing device of this embodiment and is not repeated here.
In the description of the present specification, references to the terms "one embodiment," "some embodiments," "examples," "specific examples," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, provided they do not contradict each other.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device. Such a system may be a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium include the following:
- an electrical connection (electronic device) having one or more wires
- a portable computer diskette (magnetic device)
- a random access memory (RAM)
- a read-only memory (ROM)
- an erasable programmable read-only memory (EPROM or flash memory)
- an optical fiber device
- a portable compact disc read-only memory (CDROM)

The computer-readable medium may even be paper or another suitable medium on which the program is printed. In that case, the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that they are illustrative and are not to be construed as limiting the application. Those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the application.

Claims (6)

1. A GPU-based fast computation method for the layered-medium Green's function, characterized in that the method is applied to electromagnetic simulation of radio-frequency integrated circuits or devices in planar layered media and comprises the following steps:
initializing the three-dimensional grid of the GPU and the number of threads in each thread block;
filling matrices with the Sommerfeld integral (SI) computation tasks of a plurality of parameter points and a plurality of spatial points contained in the initialized GPU, recasting the numerical integration of the SI as a matrix product, distributing the computation tasks of the matrix entries evenly among the thread blocks for parallel execution, and obtaining the SI results of the plurality of parameter points and the plurality of spatial points at once, wherein the SI results comprise SI head-integral results and tail-integral results, and the computation in each thread block comprises:
performing the matrix product with the CUDA matrix-operation units (Tensor Cores) to compute the head and tail segment integrals, and, when computing the tail integral, accelerating the convergence of the segment-integration results with the Euler transformation;
wherein filling matrices with the SI computation tasks of the plurality of parameter points and the plurality of spatial points contained in the initialized GPU, and recasting the numerical integration of the SI as a matrix product, comprises:
setting the Sommerfeld integral computation tasks for all parameter points and spatial points contained in the initialized GPU, and determining the SI of the M parameter points and N spatial points computed by each thread block;
arranging the SI computed by each thread block in an M×N matrix, in which each row corresponds to a different parameter point and each column to a different spatial point, so that the numerical integration of the SI is recast as a matrix product, thereby obtaining a first matrix and a second matrix;
the matrix product is $\mathbf{C} = \mathbf{A}\mathbf{B}$, wherein the first matrix $\mathbf{A}$ is an M×K matrix composed of the spectral-domain Green's function at M parameter points and K integration sampling points, and each entry of the second matrix $\mathbf{B}$ is the product of the Bessel function and the corresponding integration weight coefficient.
2. The GPU-based fast computation method for the layered-medium Green's function according to claim 1, wherein the Sommerfeld integral comprises a head integral and a tail integral, both of which are computed piecewise, and the head integral is expressed as:
where the quantities involved are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber, with the spectral-domain Green's function for the given field and source positions computed by transmission-line theory; the lateral distance between the field point and the source point; the Bessel function of the first kind; the order of the Bessel function; the major axis a; the weights and sampling points; the SI integration result at the i-th sampling point along the elliptical path; and N, the number of integration sampling points;
The tail integral is expressed as:
where the quantities involved are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber, with the spectral-domain Green's function for the given field and source positions computed by transmission-line theory; the Bessel function of the first kind; the order of the Bessel function; the lateral distance between the field point and the source point; the major axis a; the weights and sampling points; L, the number of subintervals into which the tail integration interval is divided; N, the number of sampling points in each subinterval; and the result obtained after the Euler transformation of the tail-integral subintervals.
3. The GPU-based fast computation method for the layered-medium Green's function according to claim 1, wherein a simplified Euler extrapolation is obtained by explicit derivation, and the SI tail integral after the k-th recursion is expressed as:
where N is the number of sampling points in each tail subinterval, each segment integration value of the tail integral is multiplied by a corresponding coefficient, and the weighted segment values are summed.
4. A GPU-based fast computing device for the layered-medium Green's function, characterized in that it is applied to electromagnetic simulation of radio-frequency integrated circuits or devices in planar layered media and comprises a CPU and a GPU, the CPU comprising a memory, wherein:
the CPU is configured to initialize the GPU, fill matrices with the Sommerfeld integral (SI) computation tasks of a plurality of parameter points and a plurality of spatial points contained in the initialized GPU, recast the numerical integration of the SI as a matrix product, and store the resulting data in the memory;
the GPU is configured to distribute the computation tasks of the matrix entries evenly among the thread blocks for parallel execution, obtain the SI results of the plurality of parameter points and the plurality of spatial points at once, and transfer the integration results to the memory of the CPU over a PCIe bus, wherein the SI results comprise SI head-integral results and tail-integral results, and the computation in each thread block comprises:
performing the matrix product with the CUDA matrix-operation units (Tensor Cores) to compute the head and tail segment integrals, and, when computing the tail integral, accelerating the convergence of the segment-integration results with the Euler transformation;
wherein filling matrices with the SI computation tasks of the plurality of parameter points and the plurality of spatial points contained in the initialized GPU, and recasting the numerical integration of the SI as a matrix product, comprises:
setting the Sommerfeld integral computation tasks for all parameter points and spatial points contained in the initialized GPU, and determining the SI of the M parameter points and N spatial points computed by each thread block;
arranging the SI computed by each thread block in an M×N matrix, in which each row corresponds to a different parameter point and each column to a different spatial point, so that the numerical integration of the SI is recast as a matrix product, thereby obtaining a first matrix and a second matrix;
the matrix product is $\mathbf{C} = \mathbf{A}\mathbf{B}$, wherein the first matrix $\mathbf{A}$ is an M×K matrix composed of the spectral-domain Green's function at M parameter points and K integration sampling points, and each entry of the second matrix $\mathbf{B}$ is the product of the Bessel function and the corresponding integration weight coefficient.
5. A GPU-based hierarchical media green's function fast computing device according to claim 4, wherein the somofil integral comprises a head integral and a tail integral, both of which are piecewise integral, the head integral expressed as:
wherein, Representing the green's function of the spectral domain,The vertical coordinates of the field point and the source point,For the transverse wave number based on the field source position calculated by transmission line theory,For the lateral distance between the field point and the source point,As a first class of Bessel functions,The order of the Bessel function, a is the long axis,AndThe weights and the samples are respectively given,SI integration results of the ith sampling point along the elliptical path are represented, and N represents the number of integrated sampling points;
The tail integral is expressed as:
wherein the symbols denote, in order: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber, computed from the field and source positions by transmission-line theory; the Bessel function of the first kind; the order of the Bessel function; the lateral distance between the field point and the source point; a, the major axis; the weights and sampling points, respectively; L, the number of points dividing the tail integration interval into subintervals; N, the number of sampling points in each subinterval; and the result of each tail-integral subinterval after the Euler transformation.
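The tail step can be illustrated with the partition-extrapolation idea: split the tail into subintervals, integrate each with an N-point rule, and Euler-accelerate the alternating sequence of partial sums. In this sketch the oscillatory Bessel integrand is replaced by sin(x)/x, whose integral over [0, ∞) is π/2, so the gain from the acceleration can be checked; the subinterval length π matches the half-period of this test integrand (for a Bessel integrand, a spacing of π/ρ, with ρ the lateral distance, is a common choice). The Euler transformation is implemented as repeated pairwise averaging of partial sums, which is one standard form; all names are hypothetical.

```python
import math

def legendre(n, x):
    """Return P_n(x) and P_n'(x) via the three-term recurrence."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p, n * (x * p - p_prev) / (x * x - 1.0)

def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))
        for _ in range(50):
            p, dp = legendre(n, x)
            x -= p / dp
            if abs(p / dp) < 1e-15:
                break
        _, dp = legendre(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def segment_integral(f, lo, hi, xs, ws):
    """Gauss-Legendre estimate of the integral of f over [lo, hi]."""
    half, mid = 0.5 * (hi - lo), 0.5 * (hi + lo)
    return half * sum(w * f(mid + half * x) for x, w in zip(xs, ws))

def euler_average(partial_sums):
    """Euler transformation as repeated pairwise averaging of partial sums."""
    s = list(partial_sums)
    while len(s) > 1:
        s = [(s[i] + s[i + 1]) / 2 for i in range(len(s) - 1)]
    return s[0]

def tail_integral(f, start, spacing, L, n=16):
    """Integrate f over L equal subintervals beginning at `start`.
    Returns (plain partial sum, Euler-accelerated estimate of the infinite tail)."""
    xs, ws = gauss_legendre(n)
    partial, acc = [], 0.0
    for i in range(L):
        lo = start + i * spacing
        acc += segment_integral(f, lo, lo + spacing, xs, ws)
        partial.append(acc)
    return partial[-1], euler_average(partial)

def f(x):
    """Stand-in for the Bessel-type integrand (alternating, slowly decaying)."""
    return math.sin(x) / x

head = segment_integral(f, 0.0, math.pi, *gauss_legendre(16))
raw_tail, euler_tail = tail_integral(f, math.pi, math.pi, L=20)
exact = math.pi / 2
```

With 20 subintervals the plain partial sum is still off by roughly 1/(21π), while the averaged estimate converges far faster, which is why the tail can be truncated after only a few subintervals.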
6. The GPU-based layered-media Green's function fast computing device according to claim 4, wherein the simplified Euler extrapolation is obtained by formula derivation, and the SI tail integral after the k-th recursion is expressed as:
wherein N is the number of sampling points in each tail-integration subinterval, the coefficients are those applied to the corresponding segment integral values, and the segment values are the integrals of the tail over each segment.
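One common way to make Euler extrapolation "simplified" is to collapse the k averaging rounds into fixed weights on the segment integrals: after k rounds applied to the partial sums S_(L-k), ..., S_L, every segment i ≤ L - k keeps weight 1, and segment L - k + l receives weight 2^(-k) · Σ_{j=l..k} C(k, j). The sketch below checks that this closed form agrees with explicit recursion, using the alternating series for ln 2 as test data. It is offered as one standard reading of the claim; the patent's own coefficient formula may differ, and the function names are hypothetical.

```python
import math

def euler_recursive(partial_sums, k):
    """k rounds of pairwise averaging applied to the last k + 1 partial sums."""
    s = list(partial_sums[-(k + 1):])
    for _ in range(k):
        s = [(s[i] + s[i + 1]) / 2 for i in range(len(s) - 1)]
    return s[0]

def euler_coefficients(num_segments, k):
    """Closed-form weights c_1..c_L on segment values u_1..u_L that reproduce
    euler_recursive(partial sums of u, k). Requires 0 <= k <= L - 1."""
    L = num_segments
    if not 0 <= k <= L - 1:
        raise ValueError("need 0 <= k <= num_segments - 1")
    coeffs = [1.0] * (L - k)                  # early segments keep full weight
    for l in range(1, k + 1):                 # the last k segments are down-weighted
        coeffs.append(sum(math.comb(k, j) for j in range(l, k + 1)) / 2 ** k)
    return coeffs

# Test data: segment values of the alternating series ln 2 = 1 - 1/2 + 1/3 - ...
L, k = 20, 19
u = [(-1) ** (i + 1) / i for i in range(1, L + 1)]
partial = []
running = 0.0
for term in u:
    running += term
    partial.append(running)

via_recursion = euler_recursive(partial, k)
via_weights = sum(c * x for c, x in zip(euler_coefficients(L, k), u))
```

Precomputing the weights once means each tail evaluation reduces to a single weighted sum of segment values, which fits naturally into the same matrix-product formulation used for the head integral.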
CN202410503575.9A 2024-04-25 GPU-based hierarchical media green function rapid calculation method and device Active CN118069969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410503575.9A CN118069969B (en) 2024-04-25 GPU-based hierarchical media green function rapid calculation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410503575.9A CN118069969B (en) 2024-04-25 GPU-based hierarchical media green function rapid calculation method and device

Publications (2)

Publication Number Publication Date
CN118069969A (en) 2024-05-24
CN118069969B (en) 2024-07-09

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368454A (en) * 2017-06-22 2017-11-21 东南大学 GPU-accelerated forward-substitution method for a large number of isomorphic sparse lower-triangular systems of equations
CN107368368A (en) * 2017-06-22 2017-11-21 东南大学 GPU-accelerated back-substitution method for a large number of isomorphic sparse upper-triangular systems of equations

Similar Documents

Publication Publication Date Title
EP3179415B1 (en) Systems and methods for a multi-core optimized recurrent neural network
US20180157969A1 (en) Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
Ergul et al. A hierarchical partitioning strategy for an efficient parallelization of the multilevel fast multipole algorithm
WO2020142193A1 (en) Adjusting precision and topology parameters for neural network training based on a performance metric
Liu et al. Towards an efficient accelerator for DNN-based remote sensing image segmentation on FPGAs
Fan et al. Reconfigurable acceleration of 3D-CNNs for human action recognition with block floating-point representation
CN112199636B (en) Fast convolution method and device suitable for microprocessor
CN109726441B (en) Hybrid volume-surface GPU-parallel DGTD method for computational electromagnetics
CN110163333A (en) Parallel optimization method for convolutional neural networks
CN108802726A (en) Synthetic aperture radar imaging method based on a graphics processing unit (GPU)
CN109993293A (en) A deep learning accelerator suitable for stacked hourglass networks
Dziekonski et al. Communication and load balancing optimization for finite element electromagnetic simulations using multi-GPU workstation
CN110414672B (en) Convolution operation method, device and system
US20040078174A1 (en) Sparse and efficient block factorization for interaction data
CN115760874A (en) Multi-scale U-Net medical image segmentation method based on joint spatial domain
Duan et al. Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights
Bjerge et al. A scalable and efficient convolutional neural network accelerator using HLS for a system-on-chip design
CN106646664B (en) GPU-based human-body microwave echo simulation method and system
Niu et al. SPEC2: Spectral sparse CNN accelerator on FPGAs
CN114755652A (en) Method for acquiring the broadband RCS (radar cross section) of electrically large targets based on ACA and CAT
CN118069969B (en) GPU-based hierarchical media green function rapid calculation method and device
CN112329204B (en) Method for rapidly analyzing electromagnetic characteristic model of repetitive structure by considering carrier platform coupling
CN118069969A (en) GPU-based hierarchical media green function rapid calculation method and device
Barhen et al. High performance FFT on multicore processors
CN110736970A (en) Radar target rapid identification method based on ASIC machine learning processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant