CN104680235A

CN104680235A - Design method of resonance frequency of circular microstrip antenna

Info

Publication number: CN104680235A
Application number: CN201510095406.7A
Authority: CN
Inventors: 陈风; 田雨波; 刘�东
Original assignee: Jiangsu University of Science and Technology
Current assignee: Jiangsu University of Science and Technology
Priority date: 2015-03-03
Filing date: 2015-03-03
Publication date: 2015-06-03

Abstract

The invention discloses a method for designing the resonant frequency of a circular microstrip antenna. A particle swarm neural network establishes a mapping between three parameters of the circular microstrip antenna (patch radius, dielectric substrate thickness, and relative permittivity) and the measured resonant frequency. The training of the particle swarm neural network is accelerated in parallel on a GPU under the Compute Unified Device Architecture (CUDA) programming environment, and the trained network can then be used to predict the resonant frequency of other circular microstrip antennas. The invention overcomes the excessive computation time of particle swarm neural networks on the CPU and improves both the modeling speed and the modeling accuracy for the resonant frequency of circular microstrip antennas.

Description

Design Method of Resonant Frequency of Circular Microstrip Antenna

Technical Field

The invention relates to a method for designing the resonant frequency of a circular microstrip antenna, in particular to a design method based on a GPU-side parallel particle swarm neural network under the Compute Unified Device Architecture (CUDA), and belongs to the technical field of antennas.

Background Art

Microstrip antennas are widely used because of their many advantages. The resonant frequency is the most important parameter in microstrip antenna design and directly determines whether a design succeeds. Artificial neural network (ANN) models have been widely applied to modeling the resonant frequency of microstrip antennas because of their good learning and generalization ability. A trained neural network establishes a mapping between the relevant antenna parameters (patch size, dielectric substrate thickness, and relative permittivity) and the measured resonant frequency, and can then predict the resonant frequency of other microstrip antennas. Compared with traditional methods, which fall mainly into analytical and numerical methods, this approach has a clear advantage in accuracy. It also avoids two shortcomings: analytical methods are not applicable to many patch structures, and numerical methods must be rerun whenever the patch geometry changes. Particle swarm optimization (PSO), a global optimization algorithm that is easy to implement and converges quickly, is increasingly used to train the weights and thresholds of neural networks, giving the so-called particle swarm neural network (Particle Swarm Optimization-Artificial Neural Network, PSO-ANN). A PSO-ANN can achieve better convergence accuracy and stronger prediction ability than the commonly used error backpropagation neural network, and it has also been applied to modeling the resonant frequency of microstrip antennas.

A major problem with prior-art particle swarm neural networks for modeling the resonant frequency of microstrip antennas is the long training time, which results from the high computational complexity caused by a complex network structure and a large number of particles. An effective way to address this is to exploit the parallelism naturally present in the behavior of individuals in the particle swarm and to use the GPU to parallelize and accelerate the training of the particle swarm neural network. It is therefore of great significance to design a GPU-side parallel particle swarm neural network based on CUDA and to apply it to the rapid modeling of the resonant frequency of circular microstrip antennas.

Summary of the Invention

The object of the invention is to provide a method for designing the resonant frequency of a circular microstrip antenna that rapidly models the resonant frequency with a GPU-side parallel particle swarm neural network based on CUDA, thereby overcoming the excessive computation time of prior-art CPU-side particle swarm neural networks.

The object of the invention is achieved by the following technical solution:

A method for designing the resonant frequency of a circular microstrip antenna, comprising the following steps:

Step 1: Construct the neural network model of the resonant frequency of the circular microstrip antenna. Normalize the neural network training samples and test samples for the TM₁₁-mode resonant frequency of the circular microstrip antenna; each sample contains four values: patch radius, dielectric substrate thickness, relative permittivity, and measured resonant frequency. Determine the numbers of nodes in the input, hidden, and output layers of the neural network model, and determine the activation functions of the hidden and output layers;

Step 2: Construct the particle swarm neural network model of the resonant frequency of the circular microstrip antenna. Encode each particle as a D-dimensional vector $X_i$, where i = 1, 2, ..., N is the particle index and N is the number of particles in the swarm. The vector $X_i$ represents all the weights and thresholds of one neural network, and the sum of squared output errors of that network over the normalized training samples is the fitness value $F(X_i)$ of the corresponding particle. Set the following parameters of the particle swarm algorithm: the number of particles N, the inertia weight w, the learning factors $c_1$ and $c_2$, and the number of training iterations $T_{max}$;

Step 3: Initialize the particle swarm neural network on the CPU: randomly initialize the position $X_{id}$ and velocity $V_{id}$ of each particle, d = 1, 2, ..., D, and compute the fitness value $F(X_i)$ of each particle. The initial individual best fitness $F(P_{i,best})$ of each particle is set to $F(X_i)$, and its initial individual best position $P_{id,best}$ is set to $X_{id}$. The minimum of all $F(P_{i,best})$ and its corresponding position are taken as the global best fitness $F(G_{best})$ and the global best position $G_{d,best}$;

Step 4: Transfer data from the CPU to the GPU: the CPU calls cudaMemcpy() to copy $X_{id}$, $V_{id}$, $F(X_i)$, $P_{id,best}$, $F(P_{i,best})$, $G_{d,best}$, and $F(G_{best})$ to GPU global memory, and calls cudaMemcpyToSymbol() to copy the normalized training samples to GPU constant memory;

Step 5: Run the parallel particle swarm neural network on the GPU: exploiting the parallelism of individual behavior in the PSO swarm, one GPU thread is assigned to each particle, and $T_{max}$ iterations of the parallel particle swarm neural network algorithm are executed on the GPU. Each iteration consists of the following four sub-steps (1) to (4), executed in order:

(1) Simultaneously update the velocity $V_{id}(t)$ and position $X_{id}(t)$ of every particle according to the PSO velocity and position update formulas:

$V_{id}(t+1) = w V_{id}(t) + c_1 r_1 (P_{id,best}(t) - X_{id}(t)) + c_2 r_2 (G_{d,best}(t) - X_{id}(t))$

$X_{id}(t+1) = X_{id}(t) + V_{id}(t+1)$

where t = 1, 2, ..., $T_{max}$ is the current iteration number, and $r_1$ and $r_2$ are random numbers uniformly distributed on [0, 1] (on the GPU they can be generated with the curand_uniform() function of the CURAND library);

(2) Simultaneously compute the fitness value $F(X_i)$ of every particle;

(3) Simultaneously update the individual best fitness $F(P_{i,best})$ of every particle and the corresponding individual best position $P_{id,best}$: if $F(X_i) < F(P_{i,best})$, then set $F(P_{i,best}) = F(X_i)$ and $P_{id,best} = X_{id}$;

(4) Update the global best fitness $F(G_{best})$ and the corresponding global best position $G_{d,best}$ with a parallel reduction: if $\min_i F(P_{i,best}) < F(G_{best})$, then set $F(G_{best}) = \min_i F(P_{i,best})$ and $G_{d,best} = P_{Id,best}$, where I is the index i at which the minimum is attained;

Step 6: Transfer data from the GPU to the CPU: the CPU calls cudaMemcpy() to copy the optimal weights and thresholds $G_{d,best}$ of the neural network trained on the GPU back to the CPU;

Step 7: Feed the normalized training samples and test samples into the trained neural network and denormalize the network output to obtain the network output values of the resonant frequency of the circular microstrip antenna.

The object of the invention can be further achieved by the following technical measures:

In the foregoing method for designing the resonant frequency of a circular microstrip antenna, the numbers of nodes in the input, hidden, and output layers of the neural network in step 1 are determined as follows:

The neural network uses the common three-layer structure. The number of input-layer nodes i is 3, the number of output-layer nodes j is 1, and the range of the number of hidden-layer nodes p is given by:

$p = \sqrt{i + j} + \alpha, \quad 1 \leq \alpha \leq 10;$

The activation functions of the hidden and output layers of the neural network in step 1 are determined as follows:

The hidden-layer activation function is the bipolar sigmoid function:

$f(u) = \frac{2}{1 + \exp(-2 \times u)} - 1, \quad -\infty < u < +\infty;$

The output-layer activation function is the unipolar sigmoid function:

$f(u) = \frac{1}{1 + \exp(-u)}, \quad -\infty < u < +\infty.$

In the foregoing method, the number of hidden-layer nodes p in step 1 is set to 8.

In the foregoing method, the dimension D of the particle vector $X_i$ in step 2 is 41.

In the foregoing method, the number of particles N in step 2 is a multiple of 64; the inertia weight w decreases linearly from 1 to 0.4; the learning factors $c_1$ and $c_2$ are 2.8 and 1.3, respectively; and the number of training iterations $T_{max}$ is 1000.

Compared with the prior art, the beneficial effects of the invention are:

(1) Processing the computation of each particle in parallel on the GPU accelerates the training of the particle swarm neural network and significantly reduces the time needed to model the resonant frequency of a circular microstrip antenna; with the same optimization stability, a speedup of up to 347 times is obtained. (2) The more particles are used, the higher the speedup. (3) Greatly increasing the number of particles is a measure specifically suited to the GPU computing architecture: as the number of particles grows, the running time of the GPU-side parallel particle swarm neural network increases only slightly compared with the prior-art serial CPU-side particle swarm neural network. If the number of particles on the GPU is greatly increased, the method substantially reduces the modeling error at the cost of a very small increase in modeling time, and the resulting modeling error is lower than that of all prior-art results.

Brief Description of the Drawings

Fig. 1 is a model of the circular microstrip antenna;

Fig. 2 is a diagram of the CUDA programming model;

Fig. 3 is a flowchart of the method of the invention for designing the resonant frequency of a circular microstrip antenna;

Fig. 4 lists the neural network training samples and test samples for the TM₁₁-mode resonant frequency of the circular microstrip antenna;

Fig. 5 is the particle swarm vector encoding model of the 3-8-1 neural network;

Fig. 6 shows the configuration of the computing platform used in the invention;

Fig. 7 lists the speedup obtained by the GPU-side parallel particle swarm neural network when modeling the TM₁₁-mode resonant frequency of the circular microstrip antenna;

Fig. 8 lists the total average absolute error between the network outputs and the measured values of the TM₁₁-mode resonant frequency of the circular microstrip antenna for various CPU-side neural network models from the literature.

Detailed Description

To address the excessive computation time of particle swarm neural networks when modeling the resonant frequency of circular microstrip antennas, the invention builds on existing research on GPU-side parallel particle swarm algorithms to design and implement a CUDA-based GPU-side parallel particle swarm neural network solver, and uses it to rapidly model the resonant frequency of circular microstrip antennas, improving both modeling speed and modeling accuracy.

The invention is further described below with reference to the drawings and specific embodiments.

Fig. 1 shows a circular microstrip antenna model: the circular patch 1 has radius a, and the dielectric substrate 2 has thickness h and relative permittivity $\varepsilon_r$. The patch is analyzed with the cavity model. When h ≪ λ₀, the region between the patch 1 and the ground plane 3 can be regarded as a thin cylindrical resonant cavity with a magnetic wall around its perimeter and electric walls at the top and bottom; the field in the cavity is a transverse magnetic (TM) mode with only an $E_z$ electric field component. Expanding $E_z$ as a superposition of eigenfunctions gives:

$E_z = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} C_{mn} \Psi_{mn} \qquad (3)$

The eigenfunctions and boundary conditions should have the following form:

$J_m'(k_{mn} a) = 0 \qquad (5)$

where $J_m(k_{mn} \rho)$ is the Bessel function of order m, $k_{mn}$ is the separation parameter, and

$k_{mn} = \frac{\chi'_{mn}}{a} \qquad (6)$

where $\chi'_{mn}$ is the n-th root of $J_m'(\chi)$. Setting $k = k_{mn}$ gives the approximate resonant frequency of the circular microstrip antenna:

$f_{mn} = \frac{c \chi'_{mn}}{2 \pi a \sqrt{\varepsilon_r}} \qquad (7)$

where c is the propagation speed of electromagnetic waves in free space, and m and n are integers. For the dominant TM₁₁ mode of the circular patch, Eq. (7) becomes

$f_{11} = \frac{0.293 c}{a \sqrt{\varepsilon_r}} \qquad (8)$

If the edge effect is taken into account, the radius a in the formula can be replaced by the equivalent radius $a_e$ given by the following empirical formula:

$a_e = a \left[ 1 + \frac{2h}{\pi a \varepsilon_r} \left( \ln \frac{\pi a}{2h} + 1.7726 \right) \right]^{\frac{1}{2}} \qquad (9)$

Clearly, the resonant frequency of a circular microstrip antenna depends on a, h, $\varepsilon_r$, m, and n; in the TM₁₁ mode it depends only on a, h, and $\varepsilon_r$.
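
As a numerical illustration of Eqs. (8) and (9), the following Python sketch evaluates the cavity-model TM₁₁ frequency with and without the equivalent radius. The function names and the patch dimensions are assumptions chosen for illustration only; they are not taken from Fig. 4, and the root value 1.8412 is the standard first root of $J_1'$, which reproduces the coefficient 0.293 of Eq. (8).

```python
import math

C0 = 299_792_458.0  # propagation speed in free space, m/s


def effective_radius(a, h, eps_r):
    """Equivalent radius a_e of Eq. (9); a and h must use the same length unit."""
    fringe = (2.0 * h) / (math.pi * a * eps_r) * (
        math.log(math.pi * a / (2.0 * h)) + 1.7726
    )
    return a * math.sqrt(1.0 + fringe)


def f11(a, h, eps_r, use_fringing=True):
    """TM11 resonant frequency in Hz (a, h in metres), Eq. (7) with m = n = 1."""
    radius = effective_radius(a, h, eps_r) if use_fringing else a
    chi11 = 1.8412  # first root of J_1'(x); 1.8412 / (2*pi) is about 0.293
    return chi11 * C0 / (2.0 * math.pi * radius * math.sqrt(eps_r))


# Hypothetical patch (not from Fig. 4): a = 20 mm, h = 1.6 mm, eps_r = 2.5
a, h, eps_r = 0.020, 0.0016, 2.5
print(f11(a, h, eps_r, use_fringing=False) / 1e9)  # GHz, ideal cavity, Eq. (8)
print(f11(a, h, eps_r) / 1e9)                      # GHz, with a replaced by a_e
```

Because $a_e > a$, the corrected frequency is always lower than the ideal-cavity value.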

Fig. 2 shows the CUDA programming model. The CPU acts as the host and the GPU as a coprocessor, and the two cooperate, each doing what it does best: the CPU handles logic-heavy transaction processing and serial computation, while the GPU concentrates on highly threaded parallel tasks. CUDA uses the single-instruction, multiple-thread (SIMT) execution model. A kernel function carries out a parallel computing task on the GPU and corresponds to one step of the program that can be executed in parallel. CUDA organizes threads into three levels (grid, thread block, and thread) and uses a multi-level memory hierarchy: local memory visible only to a single thread, shared memory visible to the threads of one block, global memory visible to all threads, and so on. A kernel exhibits two levels of parallelism: coarse-grained parallelism between the thread blocks of a grid, which do not need to communicate, and fine-grained parallelism between the threads of a block, which may communicate. A CUDA computation usually consists of three steps: CPU-to-GPU data transfer, kernel execution, and GPU-to-CPU data transfer.
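
The grid/block/thread hierarchy can be illustrated without a GPU. The sketch below is plain Python, not CUDA: it only emulates how the conventional CUDA index expression `blockIdx.x * blockDim.x + threadIdx.x` gives every thread a unique global index, which is how a one-thread-per-particle kernel would typically select its particle. The block size of 64, the particle count, and the function names are assumptions made for illustration.

```python
def blocks_needed(n_particles, threads_per_block):
    """Number of thread blocks required to cover n_particles (ceiling division)."""
    return (n_particles + threads_per_block - 1) // threads_per_block


def particle_of_thread(block_idx, block_dim, thread_idx):
    """Emulates the CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


n_particles, block_dim = 256, 64  # assumed sizes; N is a multiple of 64
grid_dim = blocks_needed(n_particles, block_dim)

# Enumerate every (block, thread) pair and the particle index it would handle.
assignment = [
    particle_of_thread(b, block_dim, t)
    for b in range(grid_dim)
    for t in range(block_dim)
]
print(grid_dim, assignment[:3], assignment[-1])
```

When N is not a multiple of the block size, the last block contains surplus threads, and a real kernel would need a bounds check before touching particle data.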

Fig. 3 is the flowchart of the algorithm that uses the CUDA-based GPU-side parallel particle swarm neural network to rapidly model the TM₁₁-mode resonant frequency of a circular microstrip antenna. The specific steps are as follows:

Step 1: Construct the neural network model of the resonant frequency of the circular microstrip antenna. This comprises normalizing the neural network training samples and test samples for the TM₁₁-mode resonant frequency (each sample contains four values: patch radius, dielectric substrate thickness, relative permittivity, and measured resonant frequency), determining the numbers of nodes in the input, hidden, and output layers, and determining the activation functions of the hidden and output layers.

(1) Normalize the neural network training samples and test samples for the TM₁₁-mode resonant frequency of the circular microstrip antenna (each sample contains four values: patch radius, dielectric substrate thickness, relative permittivity, and measured resonant frequency).

Fig. 4 lists 16 training samples (without asterisks) and 4 test samples (marked with asterisks). The three parameters of the circular microstrip antenna (a, h, $\varepsilon_r$), namely patch radius, dielectric substrate thickness, and relative permittivity, are the network inputs (columns 2 to 4 of Fig. 4), and the corresponding measured TM₁₁-mode resonant frequency $f_{ME}$ is the network output (column 5 of Fig. 4). To simplify data handling, each column is normalized to the range [0, 1]. The normalized training samples are used to train the neural network.
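
The column-wise normalization can be sketched as a min-max scaling. The text only states that each column is limited to [0, 1]; taking the minimum and maximum over all listed samples is an assumption of this sketch, as are the function names. The sample values are placeholders, not the data of Fig. 4.

```python
def column_ranges(rows):
    """Return one (min, max) pair per column of a list of equal-length rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(rows, ranges):
    """Map every column linearly onto [0, 1] using the given (min, max) pairs."""
    return [
        [(value - lo) / (hi - lo) for value, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Placeholder rows (a, h, eps_r, f_ME); NOT the values of Fig. 4.
samples = [
    [1.0, 0.10, 2.2, 5000.0],
    [2.0, 0.16, 2.5, 2700.0],
    [4.0, 0.32, 4.5, 1000.0],
]
ranges = column_ranges(samples)
scaled = normalize(samples, ranges)
print(scaled[1])
```

A column whose values are all equal would cause a division by zero here and would need separate handling.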

(2) Determine the numbers of nodes in the input, hidden, and output layers of the neural network.

The neural network uses the common three-layer structure. The number of input-layer nodes i is 3 (corresponding to a, h, and $\varepsilon_r$), the number of output-layer nodes j is 1 (corresponding to the resonant frequency), and the range of the number of hidden-layer nodes p is given by:

$p = \sqrt{i + j} + \alpha, \quad 1 \leq \alpha \leq 10 \qquad (10)$

Preferably, the number of hidden-layer nodes p is set to 8, so that the neural network has a 3-8-1 structure.
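
A short check of Eq. (10) and of the resulting network size is sketched below. Treating α as an integer is an assumption of this sketch, and the function names are illustrative; the parameter count is the usual number of weights and thresholds of a fully connected three-layer network.

```python
import math


def hidden_node_candidates(n_in, n_out):
    """Values of p from Eq. (10) for integer alpha = 1..10 (integer alpha assumed)."""
    base = math.sqrt(n_in + n_out)
    return [round(base + alpha) for alpha in range(1, 11)]


def parameter_count(n_in, n_hidden, n_out):
    """Weights plus thresholds of a fully connected n_in-n_hidden-n_out network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


print(hidden_node_candidates(3, 1))  # candidate hidden-layer sizes for i = 3, j = 1
print(parameter_count(3, 8, 1))      # -> 41
```

The chosen p = 8 lies inside the candidate range, and the 3-8-1 network has 41 adjustable parameters, matching the particle dimension D stated in the patent.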

(3) Determine the activation functions of the hidden and output layers of the neural network.

According to the characteristics of the problem, the hidden-layer activation function is the bipolar sigmoid function of Eq. (11), and the output-layer activation function is the unipolar sigmoid function of Eq. (12).

$f(u) = \frac{2}{1 + \exp(-2 \times u)} - 1, \quad -\infty < u < +\infty \qquad (11)$

$f(u) = \frac{1}{1 + \exp(-u)}, \quad -\infty < u < +\infty \qquad (12)$
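
The two activation functions of Eqs. (11) and (12) translate directly into code. The sketch below uses illustrative function names; it also shows that the bipolar sigmoid is mathematically identical to the hyperbolic tangent, which is a standard identity rather than a statement from the patent.

```python
import math


def bipolar_sigmoid(u):
    """Eq. (11): output in (-1, 1); algebraically equal to tanh(u)."""
    return 2.0 / (1.0 + math.exp(-2.0 * u)) - 1.0


def unipolar_sigmoid(u):
    """Eq. (12): output in (0, 1), matching targets normalized to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-u))


for u in (-2.0, 0.0, 0.7):
    print(u, bipolar_sigmoid(u), math.tanh(u), unipolar_sigmoid(u))
```

For very negative u the call to `math.exp` overflows in this literal form; a practical implementation could use `math.tanh` for Eq. (11) to avoid that.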

Step 2: Construct the particle swarm neural network model of the resonant frequency of the circular microstrip antenna. Each particle is encoded as a D-dimensional vector $X_i$ (i = 1, 2, ..., N is the particle index and N is the number of particles in the swarm) that represents all the weights and thresholds of one neural network, and the sum of squared errors (SSE) of the outputs of that network over the normalized training samples is the fitness value $F(X_i)$ of the corresponding particle. The following parameters of the particle swarm algorithm are set: the number of particles N, the inertia weight w, the learning factors $c_1$ and $c_2$, and the number of training iterations $T_{max}$.

(1) Encode each particle as a D-dimensional vector $X_i$ (i = 1, 2, ..., N is the particle index and N is the number of particles in the swarm) that represents all the weights and thresholds of one neural network; the sum of squared errors (SSE) of the outputs of that network over the normalized training samples is the fitness value $F(X_i)$ of the corresponding particle.

Fig. 5 shows the particle swarm vector encoding of the 3-8-1 neural network. Each dimension of a particle corresponds one-to-one to a weight or threshold of the network, so each particle is encoded as a vector $X_i$ whose dimension D (the problem dimension, equal to the number of weights and thresholds) is 41, as shown in Eq. (13). The sum of squared errors (SSE) of the outputs of each network over the normalized training samples is the fitness value $F(X_i)$ of the corresponding particle.

$X_i = [X_{i1} \; X_{i2} \; X_{i3} \; \dots \; X_{i39} \; X_{i40} \; X_{i41}] \qquad (13)$
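
The encoding and the fitness evaluation can be sketched as follows. The order in which the 41 entries are laid out (hidden weights, hidden thresholds, output weights, output threshold) and the convention of adding the thresholds are assumptions of this sketch, since Fig. 5 is not reproduced here; the function names are illustrative.

```python
import math

N_IN, N_HID, N_OUT = 3, 8, 1
D = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # 41 weights and thresholds


def forward(x, inputs):
    """Evaluate the 3-8-1 network encoded by the flat vector x on one input triple."""
    # Assumed layout: [hidden weights | hidden thresholds | output weights | output threshold]
    w_h = x[0:N_IN * N_HID]
    b_h = x[N_IN * N_HID:N_IN * N_HID + N_HID]
    w_o = x[N_IN * N_HID + N_HID:N_IN * N_HID + 2 * N_HID]
    b_o = x[D - 1]
    hidden = []
    for k in range(N_HID):
        u = sum(w_h[k * N_IN + m] * inputs[m] for m in range(N_IN)) + b_h[k]
        hidden.append(math.tanh(u))  # equals the bipolar sigmoid of Eq. (11)
    u_out = sum(w_o[k] * hidden[k] for k in range(N_HID)) + b_o
    return 1.0 / (1.0 + math.exp(-u_out))  # unipolar sigmoid of Eq. (12)


def fitness(x, samples):
    """SSE over normalized samples, each given as (a, h, eps_r, f)."""
    return sum((forward(x, s[:3]) - s[3]) ** 2 for s in samples)


zero = [0.0] * D
print(D, forward(zero, [0.2, 0.5, 0.7]))  # all-zero parameters give an output of 0.5
```

In the GPU version described in the patent, one thread would evaluate this fitness for its own particle, reading the training samples from constant memory.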

(2) Set the following parameters of the particle swarm algorithm: the number of particles N, the inertia weight w, the learning factors $c_1$ and $c_2$, and the number of training iterations $T_{max}$.

In CUDA, a warp consists of 32 threads with adjacent indices, and a streaming multiprocessor (SM) schedules and executes threads warp by warp, so the number of particles N is chosen as a multiple of 32. Because the number of particles should generally exceed the problem dimension of 41, N is in fact chosen as a multiple of 64. The inertia weight w decreases linearly from 1 to 0.4; the learning factors $c_1$ and $c_2$ are 2.8 and 1.3, respectively; the number of training iterations $T_{max}$ is 1000.
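
The parameter choices above can be written down in a few lines. The patent states only that w decreases linearly from 1 to 0.4; mapping iteration 1 to w = 1 and iteration $T_{max}$ to w = 0.4 is an assumption of this sketch, as are the function names.

```python
def inertia_weight(t, t_max, w_start=1.0, w_end=0.4):
    """Linearly decreasing inertia weight for iteration t = 1..t_max."""
    return w_start - (w_start - w_end) * (t - 1) / (t_max - 1)


def round_up_to_multiple(n, base=64):
    """Smallest multiple of base that is >= n (e.g. to pick a particle count)."""
    return ((n + base - 1) // base) * base


T_MAX = 1000
C1, C2 = 2.8, 1.3                      # learning factors from the text
print(inertia_weight(1, T_MAX), inertia_weight(T_MAX, T_MAX))
print(round_up_to_multiple(41), round_up_to_multiple(100))
```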

Step 3: Initialize the particle swarm neural network on the CPU. Randomly initialize the position $X_{id}$ and velocity $V_{id}$ of each particle (d = 1, 2, ..., D) and compute the fitness value $F(X_i)$ of each particle. The initial individual best fitness $F(P_{i,best})$ of each particle is set to $F(X_i)$, and its initial individual best position $P_{id,best}$ is set to $X_{id}$. The minimum of all $F(P_{i,best})$ and its corresponding position are taken as the global best fitness $F(G_{best})$ and the global best position $G_{d,best}$.
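
A host-side sketch of this initialization is given below. The initialization ranges, the function name, and the stand-in fitness function are assumptions; the patent does not specify the random ranges, and in the actual method the fitness would be the SSE of the encoded network.

```python
import random


def init_swarm(n_particles, dim, fitness, rng,
               x_range=(-1.0, 1.0), v_range=(-1.0, 1.0)):
    """Step 3 on the host: random positions and velocities, initial best values."""
    X = [[rng.uniform(*x_range) for _ in range(dim)] for _ in range(n_particles)]
    V = [[rng.uniform(*v_range) for _ in range(dim)] for _ in range(n_particles)]
    F = [fitness(x) for x in X]            # F(X_i)
    P = [x[:] for x in X]                  # P_i,best starts at X_i
    FP = F[:]                              # F(P_i,best) starts at F(X_i)
    best = min(range(n_particles), key=lambda i: FP[i])
    G, FG = P[best][:], FP[best]           # global best position and fitness
    return X, V, F, P, FP, G, FG


rng = random.Random(1)
stand_in = lambda x: sum(v * v for v in x)  # placeholder for the SSE fitness
X, V, F, P, FP, G, FG = init_swarm(64, 41, stand_in, rng)
print(len(X), len(X[0]), FG == min(FP))
```

These seven arrays correspond to the quantities that step 4 copies to GPU global memory.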

Step 4: Transfer data from the CPU to the GPU. The CPU calls cudaMemcpy() to copy the particle data $X_{id}$, $V_{id}$, $F(X_i)$, $P_{id,best}$, $F(P_{i,best})$, $G_{d,best}$, and $F(G_{best})$ to GPU global memory, and calls cudaMemcpyToSymbol() to copy the normalized training samples to GPU constant memory.

步骤5：GPU端并行粒子群神经网络计算，包括利用PSO算法群体中个体行为的并行性，GPU端一个线程对应一个粒子，在GPU端反复执行T_max次并行粒子群神经网络算法迭代，每次并行粒子群神经网络算法包含依次顺序执行的如下(1)(2)(3)(4)四个步骤：Step 5: Parallel particle swarm neural network calculation on the GPU side, including using the parallelism of individual behaviors in the PSO algorithm group, one thread on the GPU side corresponds to one particle, and repeatedly execute T _max times of parallel particle swarm neural network algorithm iterations on the GPU side, each time The parallel particle swarm neural network algorithm includes the following four steps (1)(2)(3)(4) executed sequentially:

(1)按粒子群速度更新和位置更新公式同时更新每个粒子的速度V_id(t)和位置X_id(t)；(1) Simultaneously update the velocity V _id (t) and position X _id (t) of each particle according to the particle swarm velocity update and position update formulas;

V_id(t+1)＝wV_id(t)+c₁r₁(P_id，best(t)-X_id(t))+c₂r₂(G_d，best(t)-X_id(t)) (1)V _id (t+1)＝wV _id (t)+c ₁ r ₁ (P _id，best (t)-X _id (t))+c ₂ r ₂ (G _d，best (t)-X _id ( t)) (1)

X_id(t+1)＝X_id(t)+V_id(t+1) (2)X _id (t+1)＝X _id (t)+V _id (t+1) (2)

式中，当前迭代次数t＝1，2，...，T_max，r₁和r₂是介于[0，1]的均匀分布的随机数(在GPU端产生随机数可使用CURAND库中的curand_uniform()函数)；In the formula, the current number of iterations t=1, 2, ..., T _max , r ₁ and r ₂ are uniformly distributed random numbers between [0, 1] (random numbers generated on the GPU side can be used in the CURAND library curand_uniform() function);

(4) Update the global best fitness value F(G_{best}) and its corresponding global best position G_{d,best} with a parallel reduction algorithm: if min(F(P_{i,best})) < F(G_{best}), then F(G_{best}) = min(F(P_{i,best})) and G_{d,best} = P_{Id,best}, where I is the index i at which the minimum is attained.
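One iteration of Step 5 can be sketched sequentially in Python as below, with sub-steps (2) and (3) taken from claim 1. This is an illustrative CPU analogue under stated assumptions, not the GPU kernels of the invention: the loop over `i` stands in for one GPU thread per particle, `argmin_reduction` imitates a pairwise parallel reduction, drawing r₁ and r₂ separately for every dimension is an assumption, and `sphere`, `run_demo` and the values w = 0.7, c₁ = c₂ = 1.5 are hypothetical choices for the demonstration only.

```python
import random

def sphere(x):
    """Placeholder fitness (assumption); stands in for the network's squared error."""
    return sum(v * v for v in x)

def argmin_reduction(values):
    """Index of the minimum, found by pairwise halving as in a parallel reduction."""
    idx = list(range(len(values)))
    while len(idx) > 1:
        half = (len(idx) + 1) // 2
        nxt = idx[:half]
        for k in range(len(idx) - half):     # each k would be one GPU thread
            a, b = idx[k], idx[k + half]
            nxt[k] = a if values[a] <= values[b] else b
        idx = nxt
    return idx[0]

def pso_iteration(X, V, P, FP, G, FG, fitness, w, c1, c2, rng):
    """One iteration: sub-steps (1) to (4). Updates X, V, P, FP in place."""
    for i in range(len(X)):                  # on the GPU: one thread per particle
        for d in range(len(X[i])):
            r1, r2 = rng.random(), rng.random()
            # (1) velocity update, equation (1), then position update, equation (2)
            V[i][d] = (w * V[i][d] + c1 * r1 * (P[i][d] - X[i][d])
                       + c2 * r2 * (G[d] - X[i][d]))
            X[i][d] += V[i][d]
        f = fitness(X[i])                    # (2) fitness F(X_i)
        if f < FP[i]:                        # (3) individual best update
            FP[i] = f
            P[i] = X[i][:]
    I = argmin_reduction(FP)                 # (4) global best via reduction
    if FP[I] < FG:
        FG = FP[I]
        G = P[I][:]
    return G, FG

def run_demo(n=16, dim=4, t_max=50, seed=1):
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    P = [x[:] for x in X]
    FP = [sphere(x) for x in X]
    I = argmin_reduction(FP)
    G, FG = P[I][:], FP[I]
    history = [FG]
    for _ in range(t_max):
        G, FG = pso_iteration(X, V, P, FP, G, FG, sphere, 0.7, 1.5, 1.5, rng)
        history.append(FG)
    return G, FG, history
```

Because G_{d,best} is only replaced after every particle has been processed, the sequential loop gives the same update order as the synchronous one-thread-per-particle scheme described above.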

Step 6: Transfer data from the GPU side to the CPU side. The CPU side calls cudaMemcpy() to copy the optimal weights and thresholds G_{d,best} of the neural network trained on the GPU side back to the CPU side.

Step 7: Feed the normalized training samples and test samples into the trained neural network and de-normalize the network output to obtain the network output value of the resonant frequency of the circular microstrip antenna. Compute the sum of the absolute errors between the network output values and the measured values, and record the total running time of the program, so that the modeling speed and modeling error can be analyzed.
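The de-normalization and error measure of Step 7 can be illustrated with a short Python sketch. The text does not state the normalization scheme, so min-max scaling to [0, 1] is assumed here; the helper names and all numbers in the usage lines are hypothetical and are not measured antenna data.

```python
def normalize(y, y_min, y_max):
    """Min-max scaling to [0, 1] (assumed scheme)."""
    return (y - y_min) / (y_max - y_min)

def denormalize(y_norm, y_min, y_max):
    """Inverse of normalize(): map a network output back to physical units."""
    return y_norm * (y_max - y_min) + y_min

def total_absolute_error(predicted, measured):
    """Sum of absolute errors between network outputs and measured values."""
    return sum(abs(p - m) for p, m in zip(predicted, measured))

# Hypothetical values for illustration only.
y_min, y_max = 1.0, 3.0
network_outputs = [0.25, 0.5]                     # normalized network outputs
predicted = [denormalize(y, y_min, y_max) for y in network_outputs]
measured = [1.5, 2.5]
error = total_absolute_error(predicted, measured)
```

The same `y_min` and `y_max` used to normalize the training targets must be reused when de-normalizing the outputs.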

Numerical tests were carried out according to the above steps in the equipment environment shown in Figure 6, and the results are shown in Figure 7. The "computational speedup ratio" in Figure 7, a commonly used acceleration metric, is defined as the ratio of the CPU-side program running time to the GPU-side program running time of the particle swarm neural network for the same number of particles and the same number of iterations (1000 in the experiments). Figure 8 lists the average sum of absolute errors between the network output values and the measured values of the TM₁₁-mode resonant frequency of the circular microstrip antenna obtained by various CPU-side neural network models in the existing literature, for comparison with the results of the present invention.

Figures 7 and 8 are analyzed as follows:

(a) The more particles, the higher the speedup ratio; the GPU-side parallel particle swarm neural network achieved a maximum speedup of 347 times. As the number of particles doubles, the speedup ratio roughly doubles while the number of particles is at most 16384 (the maximum number of resident threads of the GPU used is 26624); once the number of particles reaches 32768 or more, the speedup ratio still increases, but more slowly.

(b) The GPU-side parallel particle swarm neural network has the same optimization stability as the CPU-side serial particle swarm neural network. As the number of particles increases, the errors of both the CPU program and the GPU program decrease; for the same number of particles, the errors of the two programs are approximately the same.

(c) Greatly increasing the number of particles is an approach specifically suited to the GPU computing architecture. As the number of particles increases, the running time on the GPU side grows only marginally compared with the CPU side. The modeling error of the GPU-side parallel particle swarm neural network is lower than the BP result in the literature when the number of particles is at least 256, lower than the DBD result when it is at least 1024, lower than the PTS result when it is at least 8192, lower than the EDBD result when it is at least 32768, and lower than all results in the existing literature, including BiPSO, when it is at least 65536.

The above analysis is summarized as follows:

(1) If the GPU side uses the same number of particles as the CPU side, the method greatly reduces the modeling time while maintaining consistent convergence.

(2) If the number of particles is greatly increased on the GPU side, the method greatly reduces the modeling error at the cost of only a marginal increase in modeling time.

The method thus greatly reduces the modeling error with only a marginal increase in modeling time, and its modeling error is lower than that of all the prior-art results compared in Figure 8.

In addition to the above embodiment, the present invention may have other embodiments. All technical solutions formed by equivalent replacement or equivalent transformation fall within the scope of protection claimed by the present invention.

Claims

1. A method for designing the resonant frequency of a circular microstrip antenna, characterized in that the method comprises the following steps:

Step 1: Construct the neural network model of the resonant frequency of the circular microstrip antenna: normalize the neural network training samples and test samples for the TM₁₁-mode resonant frequency of the circular microstrip antenna, each sample comprising four values, namely the patch radius, the dielectric substrate thickness, the relative permittivity and the measured resonant frequency; determine the numbers of nodes in the input layer, hidden layer and output layer of the neural network model, and determine the activation functions of the hidden layer and output layer of the neural network model;

Step 2: Construct the particle swarm neural network model of the resonant frequency of the circular microstrip antenna: encode each particle as a D-dimensional vector X_i, where i is the particle index, i = 1, 2, ..., N, and N is the number of particles in the swarm; the D-dimensional vector X_i represents all the weights and thresholds of one neural network, and the sum of squared errors of that neural network over the normalized training samples is the fitness value F(X_i) of the corresponding particle; determine the following parameter values of the particle swarm optimization algorithm: number of particles N, inertia weight w, learning factors c₁ and c₂, and number of training iterations T_{max};

Step 3: Initialize the particle swarm neural network on the CPU side: randomly initialize the position X_{id} and velocity V_{id} of each particle, d = 1, 2, ..., D, and calculate the fitness value F(X_i) of each particle; set the initial individual best fitness value F(P_{i,best}) of each particle to F(X_i) and the initial individual best position P_{id,best} of each particle to X_{id}; set the minimum of all F(P_{i,best}) and its corresponding position as the global best fitness value F(G_{best}) and the global best position G_{d,best}, respectively;

Step 4: Transfer data from the CPU side to the GPU side: the CPU side calls the cudaMemcpy() function to copy the particle data X_{id}, V_{id}, F(X_i), P_{id,best}, F(P_{i,best}), G_{d,best} and F(G_{best}) to GPU global memory, and calls the cudaMemcpyToSymbol() function to copy the normalized training sample data to GPU constant memory;

Step 5: Perform the parallel particle swarm neural network computation on the GPU side: exploiting the parallelism of individual behavior within the PSO population, one GPU thread corresponds to one particle, and T_{max} iterations of the parallel particle swarm neural network algorithm are executed on the GPU side, each iteration comprising the following four sub-steps (1), (2), (3) and (4), executed in order:

(1) Update the velocity V_{id}(t) and position X_{id}(t) of all particles simultaneously according to the particle swarm velocity and position update formulas:

V_{id}(t+1) = w V_{id}(t) + c_1 r_1 (P_{id,best}(t) - X_{id}(t)) + c_2 r_2 (G_{d,best}(t) - X_{id}(t))

X_{id}(t+1) = X_{id}(t) + V_{id}(t+1)

where t = 1, 2, ..., T_{max} is the current iteration number, and r₁ and r₂ are random numbers uniformly distributed in [0, 1] (on the GPU side they can be generated with the curand_uniform() function of the CURAND library);

(2) Calculate the fitness value F(X_i) of all particles simultaneously;

(3) Update the individual best fitness value F(P_{i,best}) of each particle and its corresponding individual best position P_{id,best} simultaneously: if F(X_i) < F(P_{i,best}), then F(P_{i,best}) = F(X_i) and P_{id,best} = X_{id};

(4) Update the global best fitness value F(G_{best}) and its corresponding global best position G_{d,best} with a parallel reduction algorithm: if min(F(P_{i,best})) < F(G_{best}), then F(G_{best}) = min(F(P_{i,best})) and G_{d,best} = P_{Id,best}, where I is the index i at which the minimum is attained;

Step 6: Transfer data from the GPU side to the CPU side: the CPU side calls the cudaMemcpy() function to copy the optimal weights and thresholds G_{d,best} of the neural network trained on the GPU side back to the CPU side;

Step 7: Feed the normalized training samples and test samples into the trained neural network and de-normalize the network output to obtain the network output value of the resonant frequency of the circular microstrip antenna.

2. The method for designing the resonant frequency of a circular microstrip antenna as claimed in claim 1, characterized in that the numbers of nodes of the input layer, hidden layer and output layer of the neural network described in step 1 are determined as follows:

The neural network adopts a commonly used 3-layer network structure. The number i of network input layer nodes is 3, the number j of output layer nodes is 1, and the value range of the hidden layer node number p is determined by the following formula:

\begin{matrix} p = \sqrt{i + j} + \alpha & 1 \leq \alpha \leq 10 \end{matrix};

The activation functions of the hidden layer and the output layer of the neural network described in step 1 are determined as follows:

The activation function of the hidden layer of the neural network is the bipolar sigmoid function given by the following formula:

\begin{matrix} f(u) = \frac{2}{1 + \exp(-2u)} - 1 & -\infty < u < +\infty \end{matrix};

The activation function of the output layer is the unipolar sigmoid function given by the following formula:

\begin{matrix} f(u) = \frac{1}{1 + \exp(-u)} & -\infty < u < +\infty \end{matrix}.
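The hidden-node formula and the two activation functions of claim 2 can be written out in Python as a brief illustration. The function names are assumptions introduced here; the formulas themselves follow the claim.

```python
import math

def bipolar_sigmoid(u):
    """Hidden-layer activation: 2 / (1 + exp(-2u)) - 1, with values in (-1, 1)."""
    # Note: math.exp raises OverflowError for very negative u (roughly u < -354).
    return 2.0 / (1.0 + math.exp(-2.0 * u)) - 1.0

def unipolar_sigmoid(u):
    """Output-layer activation: 1 / (1 + exp(-u)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def hidden_node_range(n_in, n_out, alpha_min=1, alpha_max=10):
    """Range of p = sqrt(i + j) + alpha for alpha_min <= alpha <= alpha_max."""
    base = math.sqrt(n_in + n_out)
    return base + alpha_min, base + alpha_max

p_low, p_high = hidden_node_range(3, 1)   # i = 3 inputs, j = 1 output
```

For i = 3 and j = 1 the formula gives 3 ≤ p ≤ 12. The bipolar sigmoid is algebraically identical to tanh(u), which is a numerically safer way to evaluate it.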

3. The method for designing the resonant frequency of a circular microstrip antenna as claimed in claim 2, characterized in that the number of hidden layer nodes p is set to 8.

4. The method for designing the resonant frequency of a circular microstrip antenna as claimed in claim 3, characterized in that the dimension D of the particle vector X_i described in step 2 is 41.
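The value D = 41 matches a count of all weights and thresholds of a fully connected 3-8-1 network: 3×8 input-to-hidden weights, 8 hidden thresholds, 8×1 hidden-to-output weights and 1 output threshold. The Python sketch below performs this count and shows one possible way to unpack a particle into network parameters. The function names and the ordering of weights and thresholds inside the vector are assumptions; the claims do not specify a layout.

```python
def particle_dimension(n_in, n_hidden, n_out):
    """Number of weights plus thresholds in a fully connected 3-layer network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def decode_particle(x, n_in=3, n_hidden=8, n_out=1):
    """Split a particle vector into (w1, b1, w2, b2); the ordering is assumed."""
    if len(x) != particle_dimension(n_in, n_hidden, n_out):
        raise ValueError("particle length does not match the network size")
    it = iter(x)
    w1 = [[next(it) for _ in range(n_in)] for _ in range(n_hidden)]   # input -> hidden
    b1 = [next(it) for _ in range(n_hidden)]                          # hidden thresholds
    w2 = [[next(it) for _ in range(n_hidden)] for _ in range(n_out)]  # hidden -> output
    b2 = [next(it) for _ in range(n_out)]                             # output thresholds
    return w1, b1, w2, b2

D = particle_dimension(3, 8, 1)
w1, b1, w2, b2 = decode_particle(list(range(D)))
```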

5. The method for designing the resonant frequency of a circular microstrip antenna as claimed in claim 4, characterized in that the number of particles N described in step 2 is a multiple of 64; the inertia weight w decreases linearly from 1 to 0.4; the learning factors c₁ and c₂ are set to 2.8 and 1.3, respectively; and the number of training iterations T_{max} is set to 1000.
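The linearly decreasing inertia weight of claim 5 can be sketched in Python as below. The claim gives only the start and end values, so mapping t = 1 to w = 1 and t = T_{max} to w = 0.4 is an assumption, as is the function name.

```python
def inertia_weight(t, t_max, w_start=1.0, w_end=0.4):
    """Inertia weight at iteration t (1-based), decreasing linearly over t_max iterations."""
    if t_max < 2:
        raise ValueError("t_max must be at least 2 for a linear schedule")
    return w_start - (w_start - w_end) * (t - 1) / (t_max - 1)

# Schedule for the parameter values recited in claim 5 (T_max = 1000).
schedule = [inertia_weight(t, 1000) for t in range(1, 1001)]
```

A larger w early on favors global exploration, and a smaller w in later iterations favors local refinement around the best positions found.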