CN106875002A - Complex-valued neural network training method based on gradient descent method and generalized inverse - Google Patents
- Publication number
- CN106875002A CN106875002A CN201710091587.5A CN201710091587A CN106875002A CN 106875002 A CN106875002 A CN 106875002A CN 201710091587 A CN201710091587 A CN 201710091587A CN 106875002 A CN106875002 A CN 106875002A
- Authority
- CN
- China
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
The invention relates to a complex-valued neural network training method based on the gradient descent method and the generalized inverse. Step 1: select a single-hidden-layer complex-valued neural network model. Step 2: compute the weight matrix and weight vector of the single-hidden-layer complex-valued neural network using the gradient descent method and the generalized inverse. Step 3: obtain the network parameters of the complex-valued neural network from the weight matrix and weight vector and compute the mean square error; add 1 to the iteration count and return to Step 2. In the invention, the hidden-layer input weights are generated iteratively by gradient descent, while the output weights are always solved through the generalized inverse. The method needs few iterations, correspondingly short training time, fast convergence, and high learning efficiency, and it requires few hidden-layer nodes. The invention therefore reflects the performance of the complex-valued neural network model more accurately than the BSCBP and CELM methods.
Description
Technical Field
The invention belongs to the technical fields of image processing, pattern recognition, and communication transmission, and in particular relates to a complex-valued neural network training method based on the gradient descent method and the generalized inverse.
Background Art
Neural network modeling is widely used for sample training and testing in image processing, pattern recognition, communication transmission, and related areas. In such models, the network signals (input signal, output signal, and weight parameters) may be real-valued or complex-valued, so neural networks are divided into real-valued and complex-valued networks. Most existing neural network modeling methods build real-valued models, but with the rapid development of electronic information science, complex-valued signals appear ever more frequently in engineering practice. Computations restricted to real values cannot solve such practical problems well, whereas complex-valued neural networks can solve some problems that real-valued networks cannot. A complex-valued neural network processes complex information through complex parameters and variables (that is, the input, output, and network weights are all complex). Consequently, a series of complex-valued neural network models have been proposed and studied in depth.
A BSCBP method for training complex-valued neural networks was proposed in "Batch Split-Complex Backpropagation Algorithm". Its activation function acts separately on the real and imaginary parts of the hidden-layer input, which avoids singular points. The BSCBP method first assigns random values to the input and output weight matrices, then updates them by gradient descent, and finally evaluates the accuracy on the test samples. However, the gradient-descent-based BSCBP model requires many training iterations, which is time-consuming and yields low learning efficiency.
A CELM method extending ELM from the real domain to the complex domain was proposed in "Fully Complex Extreme Learning Machine" and applied to nonlinear channel equalization. CELM only needs a suitable number of hidden-layer nodes; the input weights of the network are assigned randomly, and the optimal output-layer weights are obtained by least squares. The activation function can be a sigmoid, an (inverse) trigonometric function, or an (inverse) hyperbolic function; unlike in BSCBP, it acts directly on the hidden-layer input matrix. The whole process finishes in one pass without iteration, so parameter selection is easy and learning is extremely fast. However, to compensate for the arbitrary choice of hidden-node parameters, CELM typically needs many hidden nodes, and its training accuracy leaves room for improvement.
In summary, BSCBP trains slowly with low accuracy, while CELM, though fast, requires too many hidden-layer nodes and its accuracy also needs improvement. The prior art still lacks an effective solution that simultaneously addresses the problems of slow training, low accuracy, and an excessive number of hidden-layer nodes in complex-valued neural network training methods.
Summary of the Invention
To solve the above problems and overcome the inability of traditional complex-valued neural network training methods to simultaneously address slow training, low accuracy, and an excessive number of hidden-layer nodes, the present invention provides a complex-valued neural network training method based on the gradient descent method and the generalized inverse (Gradient-based Generalized Complex Neural Networks, GGCNN for short).
To achieve the above object, the present invention adopts the following technical solution:
A complex-valued neural network training method based on the gradient descent method and the generalized inverse, comprising the steps of:
(1) selecting a single-hidden-layer complex-valued neural network model to model the sample data set;
(2) according to the single-hidden-layer complex-valued neural network model selected in step (1), using the generalized inverse to compute the weight matrix of the network, setting the initial iteration count to 1, and using the gradient descent method to compute the weight vector of the network;
(3) according to the weight matrix and the weight vector computed in step (2), obtaining the network parameters of the complex-valued neural network and calculating the mean square error of the current sample data; judging whether the current iteration count equals the maximum iteration count; if so, ending training; if not, adding 1 to the current iteration count and returning to step (2).
Preferably, the sample data set in step (1) comprises a training sample data set or a test data set.
Preferably, the single-hidden-layer complex-valued neural network model in step (1) is as follows:
The numbers of neurons in the input layer, hidden layer, and output layer of the single-hidden-layer complex-valued neural network model are $L$, $M$, and 1, respectively.
Given $Q$ input samples, the sample matrix is $Z=(z_{ij})_{L\times Q}=Z_R+iZ_I$, where $Z_R$ is the real part of $Z$ and $Z_I$ the imaginary part.
The input of the $q$-th sample is $z_q=(z_{1q},z_{2q},\ldots,z_{Lq})^T\in\mathbb{C}^L$, with component index $i=1,2,\ldots,L$.
The ideal output matrix corresponding to the input samples is $D=(d_1,d_2,\ldots,d_Q)^T=D_R+iD_I$, where $D_R$ is the real part of $D$ and $D_I$ the imaginary part.
The ideal output of the $q$-th sample is $d_q\in\mathbb{C}$.
Preferably, the activation function of the hidden layer in the single-hidden-layer complex-valued neural network model is $g_c:\mathbb{C}\to\mathbb{C}$.
The weight matrix connecting the input layer and the hidden layer is $W=(w_{ij})_{M\times L}=W_R+iW_I$, where $W_R$ is the real part of $W$ and $W_I$ the imaginary part.
The weight vector connecting the input layer to the $i$-th hidden node is denoted $w_i=(w_{i1},w_{i2},\ldots,w_{iL})\in\mathbb{C}^L$, where $i=1,2,\ldots,M$.
The weight vector connecting the hidden layer and the output layer is $V=(v_1,v_2,\ldots,v_M)^T=V_R+iV_I$, where $V_R$ is the real part of $V$ and $V_I$ the imaginary part.
The connection weight between the $k$-th hidden node and the output layer is denoted $v_k\in\mathbb{C}$, where $k=1,2,\ldots,M$.
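For concreteness, the model dimensions and parameters defined above can be set up in a few lines of NumPy. This is an illustrative sketch rather than part of the patent: the sizes, the initialization interval [-1, 1], and the tanh choice for $g_c$ are all assumptions.

```python
import numpy as np

L, M, Q = 3, 20, 1000   # input nodes, hidden nodes, sample count (illustrative)
rng = np.random.default_rng(0)

# Complex sample matrix Z (L x Q) and ideal output matrix D (Q x 1);
# random placeholders here, supplied by the application in practice.
Z = rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q))
D = rng.standard_normal((Q, 1)) + 1j * rng.standard_normal((Q, 1))

# Initial input weight matrix W0 (M x L), randomly assigned in a given
# interval; the patent does not name the interval, [-1, 1] is assumed.
W = rng.uniform(-1.0, 1.0, (M, L)) + 1j * rng.uniform(-1.0, 1.0, (M, L))

def g(x):
    """Real activation applied separately to real and imaginary parts;
    tanh is an assumption, the patent only requires g_c: C -> C."""
    return np.tanh(x)
```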
Preferably, step (2) specifically comprises:
(2-1) initializing the input-to-hidden weight matrix to obtain an initial weight matrix $W_0$, the entries of $W_0$ being assigned randomly within a given interval;
(2-2) using the gradient descent method and the generalized inverse to compute the weight matrix and the weight vector of the single-hidden-layer complex-valued neural network.
Preferably, in step (2-2), computing the hidden-to-output weight matrix $V$ through the generalized inverse specifically comprises:
(2-2a-1) computing the hidden-layer input matrix $U=(u_{ij})_{M\times Q}=W_0Z$ from the initial weight matrix $W_0$ of step (2-1) and the sample matrix $Z$ of step (1);
(2-2a-2) activating the real and imaginary parts of the input matrix $U$ of step (2-2a-1) separately to obtain the hidden-layer output matrix $H=(h_{ij})_{M\times Q}$, $H=g_c(U_R)+ig_c(U_I)=H_R+iH_I$, where $H_R$ is the real part of $H$ and $H_I$ the imaginary part;
(2-2a-3) computing the hidden-to-output weight matrix $V$ through the generalized inverse, $V=(H^T)^{\dagger}D$,
where $H$ is the hidden-layer output matrix of step (2-2a-2), $D$ is the ideal output matrix of step (1), and $(\cdot)^{\dagger}$ denotes the Moore-Penrose generalized inverse.
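In NumPy this one-shot solve can be sketched as below, assuming the output relation $O=H^TV$ given in step (3-1); `np.linalg.pinv` supplies the Moore-Penrose generalized inverse, so `V` is the minimum-norm least-squares solution mentioned in the beneficial effects.

```python
# Forward pass to the hidden layer: U = W Z, then split activation.
U = W @ Z                           # (M x Q) hidden-layer input
H = g(U.real) + 1j * g(U.imag)      # (M x Q) hidden-layer output H_R + i H_I

# One-shot output weights: O = H^T V should reproduce D, so take the
# minimum-norm least-squares solution via the generalized inverse.
V = np.linalg.pinv(H.T) @ D         # (M x 1) complex output weight vector
```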
Preferably, in step (2-2), optimizing the initial weight matrix $W_0$ specifically comprises:
(2-2b-1) setting the initial iteration count $k=1$, the maximum iteration count being $K$;
(2-2b-2) computing the gradient of the mean square error $E$ with respect to the hidden-layer weights $W$;
(2-2b-3) updating the weights by $W_{n+1}=W_n+\Delta W_n$, where $n=1,2,\ldots$ and $\eta$ is the learning rate.
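The update increment itself is elided above; in the standard gradient-descent form assumed throughout this text it reads:

```latex
\Delta W_n \;=\; -\,\eta\left(\frac{\partial E}{\partial W_R}
      + i\,\frac{\partial E}{\partial W_I}\right)\Bigg|_{W=W_n}
```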
Preferably, in step (2-2b-2) the gradient of the mean square error $E$ with respect to the hidden-layer weights $W$ is computed in two parts: first the gradient of $E$ with respect to $W_R$, then the gradient of $E$ with respect to $W_I$, where $W_R$ is the real part of $W$ and $W_I$ the imaginary part;
In these gradient expressions, $z_q$ is the input vector of the $q$-th sample, $z_{q,R}$ its real part and $z_{q,I}$ its imaginary part; $g_c$ is the hidden-layer activation function; $u_{mq}$ is the input of the $q$-th sample at the $m$-th hidden node, with real part $u_{mq,R}$ and imaginary part $u_{mq,I}$.
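The gradient formulas themselves are lost from this text; the sketch below reconstructs them from the standard split-complex chain rule, assuming the error $E=\frac{1}{2}\sum_q|o_q-d_q|^2$ and the tanh activation from the earlier sketch, so the constant factor and the derivative $g'(x)=1-\tanh^2(x)$ are assumptions rather than the patent's verbatim formulas.

```python
def grad_W(W, V, Z, D):
    """Gradient of E = 0.5 * sum_q |o_q - d_q|^2 with respect to W,
    returned as dE/dW_R + 1j * dE/dW_I (a reconstruction via the
    split-complex chain rule, not the patent's verbatim formula)."""
    U = W @ Z
    H = g(U.real) + 1j * g(U.imag)
    err = (H.T @ V) - D                    # (Q x 1) output errors o_q - d_q
    VE = V.conj() @ err.T                  # (M x Q), entries conj(v_m) * e_q
    # Sensitivities of E w.r.t. the real and imaginary parts of U,
    # using g'(x) = 1 - tanh(x)^2 for the assumed tanh activation.
    S = (VE.real * (1.0 - g(U.real) ** 2)
         + 1j * VE.imag * (1.0 - g(U.imag) ** 2))
    # Combine: dE/dW_R + i dE/dW_I = S conj(Z)^T, an (M x L) matrix.
    return S @ Z.conj().T
```

A gradient step then realizes the update of step (2-2b-3): `W = W - eta * grad_W(W, V, Z, D)`.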
Preferably, step (3) specifically comprises:
(3-1) choosing a linear function as the output-layer activation function, so that the input of the output layer equals its output; the actual output of the network is $O=(o_1,o_2,\ldots,o_Q)^T$, the actual output of the $q$-th sample being $o_q\in\mathbb{C}$, $q=1,2,\ldots,Q$; the matrix $O$ is split into a real part $O_R$ and an imaginary part $O_I$, $O=H^TV=O_R+iO_I$, and the actual output of the $q$-th sample is
$o_q=h_q^Tv=(h_{q,R}^Tv_R-h_{q,I}^Tv_I)+i(h_{q,R}^Tv_I+h_{q,I}^Tv_R)$,
where $v_R$ is the real part of the hidden-to-output weight vector, $v_I$ its imaginary part, $h_{q,R}$ the real part of the hidden-layer output vector of the $q$-th sample, and $h_{q,I}$ its imaginary part;
(3-2) calculating the mean square error of the current sample data and judging whether the current iteration count $k$ equals the maximum iteration count $K$; if so, ending training; if not, adding 1 to the current iteration count and returning to step (2-2b-2).
Preferably, the mean square error of the current sample data in step (3) is computed from the deviations between the actual and ideal outputs, where $o_q$ is the actual output of the $q$-th sample and $d_q$ its ideal output.
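The error expression itself does not survive in this text; a conventional form, assumed here (the 1/2 normalization is an assumption, and a 1/Q mean would serve equally), is:

```latex
E \;=\; \frac{1}{2}\sum_{q=1}^{Q}\left|o_q-d_q\right|^{2}
  \;=\; \frac{1}{2}\sum_{q=1}^{Q}\left[(o_{q,R}-d_{q,R})^{2}+(o_{q,I}-d_{q,I})^{2}\right]
```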
Beneficial effects of the invention:
1. In the complex-valued neural network training method of the invention based on the gradient descent method and the generalized inverse, the hidden-layer input weights are generated iteratively by gradient descent while the output weights are always solved through the generalized inverse. Compared with BSCBP, the method therefore needs fewer iterations, shorter training time, faster convergence, and higher learning efficiency. Compared with CELM, it needs fewer hidden-layer nodes while learning efficiently. The invention is thus more accurate than the BSCBP and CELM methods and reflects the performance of the complex-valued neural network model more faithfully.
2. When solving the hidden-to-output weights, the method of the invention uses the generalized inverse to obtain the minimum-norm least-squares solution of the output weights in a single step, without iteration, so it trains faster than related methods based on gradient descent (such as the CBP method).
Description of the Drawings
Figure 1 is a schematic flowchart of the method of the present invention;
Figure 2 is a graph comparing the present invention with the BSCBP and CELM modeling methods.
Detailed Description
It should be pointed out that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well; furthermore, it should be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
In the case of no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other. The present invention will be further described below in conjunction with the accompanying drawings and embodiments.
Example 1:
This embodiment uses the three-dimensional equalizer model for the nonlinear distortion of a 4-QAM signal from the reference "Channel Equalization Using Adaptive Complex Radial Basis Function Networks"; the input of the equalizer is given by the nonlinear channel model of that reference.
The ideal outputs of the equalizer are $0.7+0.7i$, $0.7-0.7i$, $-0.7+0.7i$, and $-0.7-0.7i$.
In this embodiment, the training data set and the test data set take 70% and 30% of the overall sample data set, respectively.
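The channel model is not reproduced above, so the sketch below only prepares the 4-QAM targets and the 70/30 split; it reuses `L`, `Q`, and `rng` from the earlier sketch, and the random placeholder standing in for the elided channel output is purely hypothetical.

```python
# 4-QAM target alphabet from the embodiment.
symbols = np.array([0.7 + 0.7j, 0.7 - 0.7j, -0.7 + 0.7j, -0.7 - 0.7j])
d_all = rng.choice(symbols, size=(Q, 1))     # ideal equalizer outputs

# Placeholder for the equalizer input; the real values would come from
# the (elided) nonlinear channel model of the cited reference.
z_all = rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q))

# 70% training / 30% test split, as in the embodiment.
n_train = int(0.7 * Q)
Z_train, Z_test = z_all[:, :n_train], z_all[:, n_train:]
D_train, D_test = d_all[:n_train], d_all[n_train:]
```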
First, the data set is modeled by the complex-valued neural network training method of the invention based on the gradient descent method and the generalized inverse.
A complex-valued neural network training method based on the gradient descent method and the generalized inverse, the flowchart of which is shown in Figure 1, comprises the following steps:
(1) selecting a single-hidden-layer complex-valued neural network model to model the sample training data set or the sample test data set;
The single-hidden-layer complex-valued neural network model in step (1) is as follows:
The numbers of neurons in the input layer, hidden layer, and output layer of the single-hidden-layer complex-valued neural network model are $L$, $M$, and 1, respectively.
Given $Q$ input samples, the sample matrix is $Z=(z_{ij})_{L\times Q}=Z_R+iZ_I$, where $Z_R$ is the real part of $Z$ and $Z_I$ the imaginary part.
The input of the $q$-th sample is $z_q=(z_{1q},z_{2q},\ldots,z_{Lq})^T\in\mathbb{C}^L$, with component index $i=1,2,\ldots,L$.
The ideal output matrix corresponding to the input samples is $D=(d_1,d_2,\ldots,d_Q)^T=D_R+iD_I$, where $D_R$ is the real part of $D$ and $D_I$ the imaginary part.
The ideal output of the $q$-th sample is $d_q\in\mathbb{C}$.
The activation function of the hidden layer in the single-hidden-layer complex-valued neural network model is $g_c:\mathbb{C}\to\mathbb{C}$.
The weight matrix connecting the input layer and the hidden layer is $W=(w_{ij})_{M\times L}=W_R+iW_I$, where $W_R$ is the real part of $W$ and $W_I$ the imaginary part.
The weight vector connecting the input layer to the $i$-th hidden node is denoted $w_i=(w_{i1},w_{i2},\ldots,w_{iL})\in\mathbb{C}^L$, where $i=1,2,\ldots,M$.
The weight vector connecting the hidden layer and the output layer is $V=(v_1,v_2,\ldots,v_M)^T=V_R+iV_I$, where $V_R$ is the real part of $V$ and $V_I$ the imaginary part.
The connection weight between the $k$-th hidden node and the output layer is denoted $v_k\in\mathbb{C}$, where $k=1,2,\ldots,M$.
(2) according to the single-hidden-layer complex-valued neural network model selected in step (1), using the generalized inverse to compute the weight matrix of the network, setting the initial iteration count to 1, and using the gradient descent method to compute the weight vector of the network;
Step S21: initializing the input-to-hidden weight matrix to obtain the initial weight matrix $W_0$, assigned randomly within a given interval;
Step S22: the input matrix of the hidden layer is $U=(u_{ij})_{M\times Q}$, split into a real part $U_R$ and an imaginary part $U_I$, $U=WZ=U_R+iU_I$; the input of the $m$-th hidden node for the $q$-th sample is
$u_{mq}=w_mz_q=(w_{m,R}x_q-w_{m,I}y_q)+i(w_{m,R}y_q+w_{m,I}x_q)$,
where $x_q$ is the real part of the $q$-th input sample, $y_q$ its imaginary part, $w_{m,R}$ the real part of the input weight vector of the $m$-th hidden node, and $w_{m,I}$ its imaginary part;
Step S23: activating the real and imaginary parts of the matrix $U$ separately to obtain the hidden-layer output matrix $H=(h_{ij})_{M\times Q}$, split into a real part $H_R$ and an imaginary part $H_I$, $H=g_c(U_R)+ig_c(U_I)=H_R+iH_I$;
Step S24: computing the hidden-to-output weight matrix $V$ through the generalized inverse, $V=(H^T)^{\dagger}D$;
Step S25: optimizing the initial weight matrix $W_0$ through the following sub-steps:
Step S251: setting the initial iteration count $k=1$ (the maximum iteration count is $K$);
Step S252: computing the gradient of the mean square error $E$ with respect to the hidden-layer weights $W$ in two parts, first the gradient of $E$ with respect to $W_R$ and then the gradient of $E$ with respect to $W_I$;
Step S27: the weight update formula is $W_{n+1}=W_n+\Delta W_n$, $n=1,2,\ldots$, where $\Delta W_n$ is formed from the gradient of each ($m$-th) hidden node at the $n$-th iteration, and the learning rate $\eta$ is a constant;
(3) according to the weight matrix and the weight vector computed in step (2), obtaining the network parameters of the complex-valued neural network and calculating the mean square error of the current sample data; judging whether the current iteration count equals the maximum iteration count; if so, ending training; if not, adding 1 to the current iteration count and returning to step (2).
Step S31: choosing a linear function as the output-layer activation function, so that the input of the output layer equals its output; the actual output of the network is $O=(o_1,o_2,\ldots,o_Q)^T$, the actual output of the $q$-th sample being $o_q\in\mathbb{C}$, $q=1,2,\ldots,Q$; the matrix $O$ is split into a real part $O_R$ and an imaginary part $O_I$, $O=H^TV=O_R+iO_I$, and the actual output of the $q$-th sample is
$o_q=h_q^Tv=(h_{q,R}^Tv_R-h_{q,I}^Tv_I)+i(h_{q,R}^Tv_I+h_{q,I}^Tv_R)$,
where $v_R$ is the real part of the hidden-to-output weight vector, $v_I$ its imaginary part, $h_{q,R}$ the real part of the hidden-layer output vector of the $q$-th sample, and $h_{q,I}$ its imaginary part;
Step S32: computing the error function $E$ of the training samples;
Setting $k=k+1$ and returning to step S22 (the hidden-to-output weight matrix $V$ is always solved through the generalized inverse).
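Putting steps S21 through S32 together, one possible training loop is sketched below; it reuses the `g` and `grad_W` helpers from the earlier sketches, and the node count, learning rate, and fixed iteration budget are illustrative assumptions, not values taken from the patent.

```python
def train_ggcnn(Z, D, M, K=100, eta=0.01, seed=0):
    """GGCNN-style loop: input weights W by gradient descent (S25),
    output weights V by the generalized inverse at every pass (S24).
    A sketch under the stated assumptions, not the patent's own code."""
    rng = np.random.default_rng(seed)
    L_in = Z.shape[0]
    W = rng.uniform(-1, 1, (M, L_in)) + 1j * rng.uniform(-1, 1, (M, L_in))
    for k in range(K):
        U = W @ Z                          # S22: hidden-layer input
        H = g(U.real) + 1j * g(U.imag)     # S23: split activation
        V = np.linalg.pinv(H.T) @ D        # S24: one-shot output weights
        E = 0.5 * np.sum(np.abs(H.T @ V - D) ** 2)   # S32: error function
        W = W - eta * grad_W(W, V, Z, D)   # S252/S27: gradient step on W
    return W, V, E

# For example: W, V, E = train_ggcnn(Z_train, D_train, M=20)
```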
This embodiment includes two comparative modeling methods, the BSCBP method and the CELM method. The CELM method comes from the reference "Fully complex extreme learning machine"; it assigns random values to the input weight matrix and solves the output weight matrix through the generalized inverse. The same data set, the three-dimensional equalizer model for the nonlinear distortion of a 4-QAM signal from "Channel Equalization Using Adaptive Complex Radial Basis Function Networks" (equalizer input as above; ideal outputs $0.7+0.7i$, $0.7-0.7i$, $-0.7+0.7i$, $-0.7-0.7i$), is modeled with the BSCBP method and the CELM method, respectively. The experimental results are shown in Figure 2.
It can be seen from Figure 2 that, under the same network structure, the training error of the proposed method is lower than that of both the BSCBP method and the CELM method, showing that the method of the invention effectively optimizes the training weights and achieves high training accuracy.
In the method of the invention, the input weight matrix of the hidden layer is updated by the gradient descent method; therefore, compared with CELM under the same initial weights, the GGCNN model needs far fewer hidden nodes than CELM, and its mean square error is smaller than that produced by the CELM model.
In the modeling method of the invention based on the gradient descent method and the generalized inverse (Gradient-based Generalized Complex Neural Networks, the GGCNN method), the output weight matrix of the hidden layer is solved through the generalized inverse, obtaining the minimum-norm least-squares solution of the output weight matrix in one step without iteration. Therefore, with the same initial weights and the same number of hidden nodes, the method of the invention not only trains faster than BSCBP but also greatly reduces both the training error and the test error; detailed comparison results are shown in Table 1.
Table 1
The above is only a preferred embodiment of the present application and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within its protection scope.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710091587.5A CN106875002A (en) | 2017-02-20 | 2017-02-20 | Complex-valued neural network training method based on gradient descent method and generalized inverse |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710091587.5A CN106875002A (en) | 2017-02-20 | 2017-02-20 | Complex-valued neural network training method based on gradient descent method and generalized inverse |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106875002A true CN106875002A (en) | 2017-06-20 |
Family
ID=59166995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710091587.5A Pending CN106875002A (en) | Complex-valued neural network training method based on gradient descent method and generalized inverse | 2017-02-20 | 2017-02-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106875002A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334947A (en) * | 2018-01-17 | 2018-07-27 | 上海爱优威软件开发有限公司 | A kind of the SGD training methods and system of intelligent optimization |
US11120333B2 (en) | 2018-04-30 | 2021-09-14 | International Business Machines Corporation | Optimization of model generation in deep learning neural networks using smarter gradient descent calibration |
CN109255308A (en) * | 2018-11-02 | 2019-01-22 | 陕西理工大学 | There are the neural network angle-of- arrival estimation methods of array error |
CN109255308B (en) * | 2018-11-02 | 2023-07-21 | 陕西理工大学 | Neural Network Angle of Arrival Estimation Method with Array Error |
CN109274624B (en) * | 2018-11-07 | 2021-04-27 | 中国电子科技集团公司第三十六研究所 | Carrier frequency offset estimation method based on convolutional neural network |
CN109274624A (en) * | 2018-11-07 | 2019-01-25 | 中国电子科技集团公司第三十六研究所 | A kind of carrier frequency bias estimation based on convolutional neural networks |
CN110034827A (en) * | 2019-03-25 | 2019-07-19 | 华中科技大学 | A kind of depolarization multiplexing method and system based on reverse observation error |
CN110011733A (en) * | 2019-03-25 | 2019-07-12 | 华中科技大学 | A Momentum Factor-Based Depolarization Multiplexing Method and System |
CN110824922A (en) * | 2019-11-22 | 2020-02-21 | 电子科技大学 | Smith estimation compensation method based on six-order B-spline wavelet neural network |
CN112148730A (en) * | 2020-06-30 | 2020-12-29 | 网络通信与安全紫金山实验室 | A method for batch extraction of product data features using the generalized inverse of a matrix |
US11863221B1 (en) * | 2020-07-14 | 2024-01-02 | Hrl Laboratories, Llc | Low size, weight and power (swap) efficient hardware implementation of a wide instantaneous bandwidth neuromorphic adaptive core (NeurACore) |
US12057989B1 (en) | 2020-07-14 | 2024-08-06 | Hrl Laboratories, Llc | Ultra-wide instantaneous bandwidth complex neuromorphic adaptive core processor |
CN111950711A (en) * | 2020-08-14 | 2020-11-17 | 苏州大学 | A Second-Order Hybrid Construction Method and System for Complex-valued Feedforward Neural Networks |
CN112770013A (en) * | 2021-01-15 | 2021-05-07 | 电子科技大学 | Heterogeneous information network embedding method based on side sampling |
CN113158582A (en) * | 2021-05-24 | 2021-07-23 | 苏州大学 | Wind speed prediction method based on complex value forward neural network |
WO2022247049A1 (en) * | 2021-05-24 | 2022-12-01 | 苏州大学 | Method for predicting wind speed based on complex-valued forward neural network |
CN114091327A (en) * | 2021-11-10 | 2022-02-25 | 中国航发沈阳发动机研究所 | Method for determining radar scattering characteristics of engine cavity |
CN114091327B (en) * | 2021-11-10 | 2022-09-20 | 中国航发沈阳发动机研究所 | Method for determining radar scattering characteristics of engine cavity |
WO2023216383A1 (en) * | 2022-05-13 | 2023-11-16 | 苏州大学 | Complex-valued timing signal prediction method based on complex-valued neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106875002A (en) | Complex-valued neural network training method based on gradient descent method and generalized inverse | |
WO2023019601A1 (en) | Signal modulation recognition method for complex-valued neural network based on structure optimization algorithm | |
CN109948029B (en) | Neural network self-adaptive depth Hash image searching method | |
CN110969250B (en) | Neural network training method and device | |
CN103413174B (en) | Based on the short-term wind speed multistep forecasting method of degree of depth learning method | |
CN102982373B (en) | OIN (Optimal Input Normalization) neural network training method for mixed SVM (Support Vector Machine) regression algorithm | |
CN108734301A (en) | A kind of machine learning method and machine learning device | |
CN107203891A (en) | A kind of automatic many threshold values characteristic filter method and devices | |
US11625614B2 (en) | Small-world nets for fast neural network training and execution | |
CN107480774A (en) | Dynamic neural network model training method and device based on integrated study | |
CN110162739B (en) | RFFKLMS Algorithm Weight Update Optimization Method Based on Variable Forgetting Factor | |
CN108566257A (en) | Signal recovery method based on back propagation neural network | |
CN112099345B (en) | A fuzzy tracking control method, system and medium based on input hysteresis | |
CN114117945B (en) | A deep learning cloud service QoS prediction method based on user-service interaction graph | |
CN106021829B (en) | A kind of nonlinear system modeling method based on RBF-ARX model stability parameter Estimation | |
CN111950711A (en) | A Second-Order Hybrid Construction Method and System for Complex-valued Feedforward Neural Networks | |
CN109754122A (en) | A kind of Numerical Predicting Method of the BP neural network based on random forest feature extraction | |
CN118536407B (en) | A sea surface wind speed prediction algorithm and system based on symbolic dynamic analysis | |
CN114912489A (en) | Signal modulation identification method | |
CN110490324A (en) | A kind of gradient decline width learning system implementation method | |
CN113962163A (en) | An optimization method, device and equipment for realizing efficient design of passive microwave devices | |
CN115983105A (en) | Occam inversion Lagrange multiplier optimization method based on deep learning weighting decision | |
CN106407932A (en) | Handwritten number recognition method based on fractional calculus and generalized inverse neural network | |
CN107391442A (en) | A kind of augmentation linear model and its application process | |
CN114819107B (en) | Hybrid data assimilation method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170620 |