CN106875002A - Complex-valued neural network training method based on gradient descent method and generalized inverse - Google Patents
- Publication number
- CN106875002A CN106875002A CN201710091587.5A CN201710091587A CN106875002A CN 106875002 A CN106875002 A CN 106875002A CN 201710091587 A CN201710091587 A CN 201710091587A CN 106875002 A CN106875002 A CN 106875002A
- Authority
- CN
- China
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
The invention relates to a complex-valued neural network training method based on the gradient descent method and the generalized inverse. Step 1: select a single-hidden-layer complex-valued neural network model. Step 2: compute the weight matrix and weight vector of the single-hidden-layer complex-valued neural network using the gradient descent method and the generalized inverse. Step 3: obtain the network parameters of the complex-valued neural network from the weight matrix and weight vector and compute the mean square error; add 1 to the iteration count and return to Step 2. In the invention, the hidden-layer input weights are generated iteratively by gradient descent, while the output weights are always solved through the generalized inverse. The method needs few iterations, correspondingly short training time, fast convergence, and high learning efficiency, and it requires few hidden-layer nodes. The invention therefore reflects the performance of the complex-valued neural network model more accurately than the BSCBP and CELM methods.
Description
Technical Field
The invention belongs to the technical fields of image processing, pattern recognition, and communication transmission, and in particular relates to a complex-valued neural network training method based on the gradient descent method and the generalized inverse.
Background Art
Neural network modeling is widely used for sample training and testing in image processing, pattern recognition, communication transmission, and related areas. In such models, the network signals (input signal, output signal, and weight parameters) may be real-valued or complex-valued, so neural networks are divided into real-valued and complex-valued networks. Most existing neural network modeling methods build real-valued models, but with the rapid development of electronic information science, complex-valued signals appear ever more frequently in engineering practice. Computations restricted to real values cannot solve such practical problems well, whereas complex-valued neural networks can solve some problems that real-valued networks cannot. A complex-valued neural network processes complex information through complex parameters and variables (that is, the input, output, and network weights are all complex). Consequently, a series of complex-valued neural network models have been proposed and studied in depth.
A BSCBP method for training complex-valued neural networks was proposed in "Batch Split-Complex Backpropagation Algorithm". Its activation function acts separately on the real and imaginary parts of the hidden-layer input, which avoids singular points. The BSCBP method first assigns random values to the input and output weight matrices, then updates them by gradient descent, and finally evaluates the accuracy on the test samples. However, the gradient-descent-based BSCBP model requires many training iterations, which is time-consuming and yields low learning efficiency.
A CELM method extending ELM from the real domain to the complex domain was proposed in "Fully Complex Extreme Learning Machine" and applied to nonlinear channel equalization. CELM only needs a suitable number of hidden-layer nodes; the input weights of the network are assigned randomly, and the optimal output-layer weights are obtained by least squares. The activation function can be a sigmoid, an (inverse) trigonometric function, or an (inverse) hyperbolic function; unlike in BSCBP, it acts directly on the hidden-layer input matrix. The whole process finishes in one pass without iteration, so parameter selection is easy and learning is extremely fast. However, to compensate for the arbitrary choice of hidden-node parameters, CELM typically needs many hidden nodes, and its training accuracy leaves room for improvement.
In summary, BSCBP trains slowly with low accuracy, while CELM, though fast, requires too many hidden-layer nodes and its accuracy also needs improvement. The prior art still lacks an effective solution that simultaneously addresses the problems of slow training, low accuracy, and an excessive number of hidden-layer nodes in complex-valued neural network training methods.
Summary of the Invention
To solve the above problems and overcome the inability of traditional complex-valued neural network training methods to simultaneously address slow training, low accuracy, and an excessive number of hidden-layer nodes, the present invention provides a complex-valued neural network training method based on the gradient descent method and the generalized inverse (Gradient-based Generalized Complex Neural Networks, GGCNN for short).
To achieve the above object, the present invention adopts the following technical solution:
A complex-valued neural network training method based on the gradient descent method and the generalized inverse, comprising the steps of:
(1) selecting a single-hidden-layer complex-valued neural network model to model the sample data set;
(2) according to the single-hidden-layer complex-valued neural network model selected in step (1), using the generalized inverse to compute the weight matrix of the network, setting the initial iteration count to 1, and using the gradient descent method to compute the weight vector of the network;
(3) according to the weight matrix and the weight vector computed in step (2), obtaining the network parameters of the complex-valued neural network and calculating the mean square error of the current sample data; judging whether the current iteration count equals the maximum iteration count; if so, ending training; if not, adding 1 to the current iteration count and returning to step (2).
Preferably, the sample data set in step (1) comprises a training sample data set or a test data set.
Preferably, the single-hidden-layer complex-valued neural network model in step (1) is as follows:
The numbers of neurons in the input layer, hidden layer, and output layer of the single-hidden-layer complex-valued neural network model are $L$, $M$, and 1, respectively.
Given $Q$ input samples, the sample matrix is $Z=(z_{ij})_{L\times Q}=Z_R+iZ_I$, where $Z_R$ is the real part of $Z$ and $Z_I$ the imaginary part.
The input of the $q$-th sample is $z_q=(z_{1q},z_{2q},\ldots,z_{Lq})^T\in\mathbb{C}^L$, with component index $i=1,2,\ldots,L$.
The ideal output matrix corresponding to the input samples is $D=(d_1,d_2,\ldots,d_Q)^T=D_R+iD_I$, where $D_R$ is the real part of $D$ and $D_I$ the imaginary part.
The ideal output of the $q$-th sample is $d_q\in\mathbb{C}$.
Preferably, the activation function of the hidden layer in the single-hidden-layer complex-valued neural network model is $g_c:\mathbb{C}\to\mathbb{C}$.
The weight matrix connecting the input layer and the hidden layer is $W=(w_{ij})_{M\times L}=W_R+iW_I$, where $W_R$ is the real part of $W$ and $W_I$ the imaginary part.
The weight vector connecting the input layer to the $i$-th hidden node is denoted $w_i=(w_{i1},w_{i2},\ldots,w_{iL})\in\mathbb{C}^L$, where $i=1,2,\ldots,M$.
The weight vector connecting the hidden layer and the output layer is $V=(v_1,v_2,\ldots,v_M)^T=V_R+iV_I$, where $V_R$ is the real part of $V$ and $V_I$ the imaginary part.
The connection weight between the $k$-th hidden node and the output layer is denoted $v_k\in\mathbb{C}$, where $k=1,2,\ldots,M$.
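For concreteness, the model dimensions and parameters defined above can be set up in a few lines of NumPy. This is an illustrative sketch rather than part of the patent: the sizes, the initialization interval [-1, 1], and the tanh choice for $g_c$ are all assumptions.

```python
import numpy as np

L, M, Q = 3, 20, 1000   # input nodes, hidden nodes, sample count (illustrative)
rng = np.random.default_rng(0)

# Complex sample matrix Z (L x Q) and ideal output matrix D (Q x 1);
# random placeholders here, supplied by the application in practice.
Z = rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q))
D = rng.standard_normal((Q, 1)) + 1j * rng.standard_normal((Q, 1))

# Initial input weight matrix W0 (M x L), randomly assigned in a given
# interval; the patent does not name the interval, [-1, 1] is assumed.
W = rng.uniform(-1.0, 1.0, (M, L)) + 1j * rng.uniform(-1.0, 1.0, (M, L))

def g(x):
    """Real activation applied separately to real and imaginary parts;
    tanh is an assumption, the patent only requires g_c: C -> C."""
    return np.tanh(x)
```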
Preferably, step (2) specifically comprises:
(2-1) initializing the input-to-hidden weight matrix to obtain an initial weight matrix $W_0$, the entries of $W_0$ being assigned randomly within a given interval;
(2-2) using the gradient descent method and the generalized inverse to compute the weight matrix and the weight vector of the single-hidden-layer complex-valued neural network.
Preferably, in step (2-2), computing the hidden-to-output weight matrix $V$ through the generalized inverse specifically comprises:
(2-2a-1) computing the hidden-layer input matrix $U=(u_{ij})_{M\times Q}=W_0Z$ from the initial weight matrix $W_0$ of step (2-1) and the sample matrix $Z$ of step (1);
(2-2a-2) activating the real and imaginary parts of the input matrix $U$ of step (2-2a-1) separately to obtain the hidden-layer output matrix $H=(h_{ij})_{M\times Q}$, $H=g_c(U_R)+ig_c(U_I)=H_R+iH_I$, where $H_R$ is the real part of $H$ and $H_I$ the imaginary part;
(2-2a-3) computing the hidden-to-output weight matrix $V$ through the generalized inverse, $V=(H^T)^{\dagger}D$,
where $H$ is the hidden-layer output matrix of step (2-2a-2), $D$ is the ideal output matrix of step (1), and $(\cdot)^{\dagger}$ denotes the Moore-Penrose generalized inverse.
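In NumPy this one-shot solve can be sketched as below, assuming the output relation $O=H^TV$ given in step (3-1); `np.linalg.pinv` supplies the Moore-Penrose generalized inverse, so `V` is the minimum-norm least-squares solution mentioned in the beneficial effects.

```python
# Forward pass to the hidden layer: U = W Z, then split activation.
U = W @ Z                           # (M x Q) hidden-layer input
H = g(U.real) + 1j * g(U.imag)      # (M x Q) hidden-layer output H_R + i H_I

# One-shot output weights: O = H^T V should reproduce D, so take the
# minimum-norm least-squares solution via the generalized inverse.
V = np.linalg.pinv(H.T) @ D         # (M x 1) complex output weight vector
```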
Preferably, in step (2-2), optimizing the initial weight matrix $W_0$ specifically comprises:
(2-2b-1) setting the initial iteration count $k=1$, the maximum iteration count being $K$;
(2-2b-2) computing the gradient of the mean square error $E$ with respect to the hidden-layer weights $W$;
(2-2b-3) updating the weights by $W_{n+1}=W_n+\Delta W_n$, where $n=1,2,\ldots$ and $\eta$ is the learning rate.
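The update increment itself is elided above; in the standard gradient-descent form assumed throughout this text it reads:

```latex
\Delta W_n \;=\; -\,\eta\left(\frac{\partial E}{\partial W_R}
      + i\,\frac{\partial E}{\partial W_I}\right)\Bigg|_{W=W_n}
```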
Preferably, in step (2-2b-2) the gradient of the mean square error $E$ with respect to the hidden-layer weights $W$ is computed in two parts: first the gradient of $E$ with respect to $W_R$, then the gradient of $E$ with respect to $W_I$, where $W_R$ is the real part of $W$ and $W_I$ the imaginary part;
In these gradient expressions, $z_q$ is the input vector of the $q$-th sample, $z_{q,R}$ its real part and $z_{q,I}$ its imaginary part; $g_c$ is the hidden-layer activation function; $u_{mq}$ is the input of the $q$-th sample at the $m$-th hidden node, with real part $u_{mq,R}$ and imaginary part $u_{mq,I}$.
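The gradient formulas themselves are lost from this text; the sketch below reconstructs them from the standard split-complex chain rule, assuming the error $E=\frac{1}{2}\sum_q|o_q-d_q|^2$ and the tanh activation from the earlier sketch, so the constant factor and the derivative $g'(x)=1-\tanh^2(x)$ are assumptions rather than the patent's verbatim formulas.

```python
def grad_W(W, V, Z, D):
    """Gradient of E = 0.5 * sum_q |o_q - d_q|^2 with respect to W,
    returned as dE/dW_R + 1j * dE/dW_I (a reconstruction via the
    split-complex chain rule, not the patent's verbatim formula)."""
    U = W @ Z
    H = g(U.real) + 1j * g(U.imag)
    err = (H.T @ V) - D                    # (Q x 1) output errors o_q - d_q
    VE = V.conj() @ err.T                  # (M x Q), entries conj(v_m) * e_q
    # Sensitivities of E w.r.t. the real and imaginary parts of U,
    # using g'(x) = 1 - tanh(x)^2 for the assumed tanh activation.
    S = (VE.real * (1.0 - g(U.real) ** 2)
         + 1j * VE.imag * (1.0 - g(U.imag) ** 2))
    # Combine: dE/dW_R + i dE/dW_I = S conj(Z)^T, an (M x L) matrix.
    return S @ Z.conj().T
```

A gradient step then realizes the update of step (2-2b-3): `W = W - eta * grad_W(W, V, Z, D)`.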
Preferably, step (3) specifically comprises:
(3-1) choosing a linear function as the output-layer activation function, so that the input of the output layer equals its output; the actual output of the network is $O=(o_1,o_2,\ldots,o_Q)^T$, the actual output of the $q$-th sample being $o_q\in\mathbb{C}$, $q=1,2,\ldots,Q$; the matrix $O$ is split into a real part $O_R$ and an imaginary part $O_I$, $O=H^TV=O_R+iO_I$, and the actual output of the $q$-th sample is
$o_q=h_q^Tv=(h_{q,R}^Tv_R-h_{q,I}^Tv_I)+i(h_{q,R}^Tv_I+h_{q,I}^Tv_R)$,
where $v_R$ is the real part of the hidden-to-output weight vector, $v_I$ its imaginary part, $h_{q,R}$ the real part of the hidden-layer output vector of the $q$-th sample, and $h_{q,I}$ its imaginary part;
(3-2) calculating the mean square error of the current sample data and judging whether the current iteration count $k$ equals the maximum iteration count $K$; if so, ending training; if not, adding 1 to the current iteration count and returning to step (2-2b-2).
Preferably, the mean square error of the current sample data in step (3) is computed from the deviations between the actual and ideal outputs, where $o_q$ is the actual output of the $q$-th sample and $d_q$ its ideal output.
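The error expression itself does not survive in this text; a conventional form, assumed here (the 1/2 normalization is an assumption, and a 1/Q mean would serve equally), is:

```latex
E \;=\; \frac{1}{2}\sum_{q=1}^{Q}\left|o_q-d_q\right|^{2}
  \;=\; \frac{1}{2}\sum_{q=1}^{Q}\left[(o_{q,R}-d_{q,R})^{2}+(o_{q,I}-d_{q,I})^{2}\right]
```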
Beneficial effects of the invention:
1. In the complex-valued neural network training method of the invention based on the gradient descent method and the generalized inverse, the hidden-layer input weights are generated iteratively by gradient descent while the output weights are always solved through the generalized inverse. Compared with BSCBP, the method therefore needs fewer iterations, shorter training time, faster convergence, and higher learning efficiency. Compared with CELM, it needs fewer hidden-layer nodes while learning efficiently. The invention is thus more accurate than the BSCBP and CELM methods and reflects the performance of the complex-valued neural network model more faithfully.
2. When solving the hidden-to-output weights, the method of the invention uses the generalized inverse to obtain the minimum-norm least-squares solution of the output weights in a single step, without iteration, so it trains faster than related methods based on gradient descent (such as the CBP method).
Description of the Drawings
Figure 1 is a schematic flowchart of the method of the present invention;
Figure 2 is a graph comparing the present invention with the BSCBP and CELM modeling methods.
Detailed Description
It should be pointed out that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well; furthermore, it should be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
In the case of no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other. The present invention will be further described below in conjunction with the accompanying drawings and embodiments.
Example 1:
This embodiment uses the three-dimensional equalizer model for the nonlinear distortion of a 4-QAM signal from the reference "Channel Equalization Using Adaptive Complex Radial Basis Function Networks"; the input of the equalizer is given by the nonlinear channel model of that reference.
The ideal outputs of the equalizer are $0.7+0.7i$, $0.7-0.7i$, $-0.7+0.7i$, and $-0.7-0.7i$.
In this embodiment, the training data set and the test data set take 70% and 30% of the overall sample data set, respectively.
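The channel model is not reproduced above, so the sketch below only prepares the 4-QAM targets and the 70/30 split; it reuses `L`, `Q`, and `rng` from the earlier sketch, and the random placeholder standing in for the elided channel output is purely hypothetical.

```python
# 4-QAM target alphabet from the embodiment.
symbols = np.array([0.7 + 0.7j, 0.7 - 0.7j, -0.7 + 0.7j, -0.7 - 0.7j])
d_all = rng.choice(symbols, size=(Q, 1))     # ideal equalizer outputs

# Placeholder for the equalizer input; the real values would come from
# the (elided) nonlinear channel model of the cited reference.
z_all = rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q))

# 70% training / 30% test split, as in the embodiment.
n_train = int(0.7 * Q)
Z_train, Z_test = z_all[:, :n_train], z_all[:, n_train:]
D_train, D_test = d_all[:n_train], d_all[n_train:]
```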
First, the data set is modeled by the complex-valued neural network training method of the invention based on the gradient descent method and the generalized inverse.
A complex-valued neural network training method based on the gradient descent method and the generalized inverse, the flowchart of which is shown in Figure 1, comprises the following steps:
(1) selecting a single-hidden-layer complex-valued neural network model to model the sample training data set or the sample test data set;
The single-hidden-layer complex-valued neural network model in step (1) is as follows:
The numbers of neurons in the input layer, hidden layer, and output layer of the single-hidden-layer complex-valued neural network model are $L$, $M$, and 1, respectively.
Given $Q$ input samples, the sample matrix is $Z=(z_{ij})_{L\times Q}=Z_R+iZ_I$, where $Z_R$ is the real part of $Z$ and $Z_I$ the imaginary part.
The input of the $q$-th sample is $z_q=(z_{1q},z_{2q},\ldots,z_{Lq})^T\in\mathbb{C}^L$, with component index $i=1,2,\ldots,L$.
The ideal output matrix corresponding to the input samples is $D=(d_1,d_2,\ldots,d_Q)^T=D_R+iD_I$, where $D_R$ is the real part of $D$ and $D_I$ the imaginary part.
The ideal output of the $q$-th sample is $d_q\in\mathbb{C}$.
The activation function of the hidden layer in the single-hidden-layer complex-valued neural network model is $g_c:\mathbb{C}\to\mathbb{C}$.
The weight matrix connecting the input layer and the hidden layer is $W=(w_{ij})_{M\times L}=W_R+iW_I$, where $W_R$ is the real part of $W$ and $W_I$ the imaginary part.
The weight vector connecting the input layer to the $i$-th hidden node is denoted $w_i=(w_{i1},w_{i2},\ldots,w_{iL})\in\mathbb{C}^L$, where $i=1,2,\ldots,M$.
The weight vector connecting the hidden layer and the output layer is $V=(v_1,v_2,\ldots,v_M)^T=V_R+iV_I$, where $V_R$ is the real part of $V$ and $V_I$ the imaginary part.
The connection weight between the $k$-th hidden node and the output layer is denoted $v_k\in\mathbb{C}$, where $k=1,2,\ldots,M$.
(2) according to the single-hidden-layer complex-valued neural network model selected in step (1), using the generalized inverse to compute the weight matrix of the network, setting the initial iteration count to 1, and using the gradient descent method to compute the weight vector of the network;
Step S21: initializing the input-to-hidden weight matrix to obtain the initial weight matrix $W_0$, assigned randomly within a given interval;
Step S22: the input matrix of the hidden layer is $U=(u_{ij})_{M\times Q}$, split into a real part $U_R$ and an imaginary part $U_I$, $U=WZ=U_R+iU_I$; the input of the $m$-th hidden node for the $q$-th sample is
$u_{mq}=w_mz_q=(w_{m,R}x_q-w_{m,I}y_q)+i(w_{m,R}y_q+w_{m,I}x_q)$,
where $x_q$ is the real part of the $q$-th input sample, $y_q$ its imaginary part, $w_{m,R}$ the real part of the input weight vector of the $m$-th hidden node, and $w_{m,I}$ its imaginary part;
Step S23: activating the real and imaginary parts of the matrix $U$ separately to obtain the hidden-layer output matrix $H=(h_{ij})_{M\times Q}$, split into a real part $H_R$ and an imaginary part $H_I$, $H=g_c(U_R)+ig_c(U_I)=H_R+iH_I$;
Step S24: computing the hidden-to-output weight matrix $V$ through the generalized inverse, $V=(H^T)^{\dagger}D$;
Step S25: optimizing the initial weight matrix $W_0$ through the following sub-steps:
Step S251: setting the initial iteration count $k=1$ (the maximum iteration count is $K$);
Step S252: computing the gradient of the mean square error $E$ with respect to the hidden-layer weights $W$ in two parts, first the gradient of $E$ with respect to $W_R$ and then the gradient of $E$ with respect to $W_I$;
Step S27: the weight update formula is $W_{n+1}=W_n+\Delta W_n$, $n=1,2,\ldots$, where $\Delta W_n$ is formed from the gradient of each ($m$-th) hidden node at the $n$-th iteration, and the learning rate $\eta$ is a constant;
(3) according to the weight matrix and the weight vector computed in step (2), obtaining the network parameters of the complex-valued neural network and calculating the mean square error of the current sample data; judging whether the current iteration count equals the maximum iteration count; if so, ending training; if not, adding 1 to the current iteration count and returning to step (2).
Step S31: choosing a linear function as the output-layer activation function, so that the input of the output layer equals its output; the actual output of the network is $O=(o_1,o_2,\ldots,o_Q)^T$, the actual output of the $q$-th sample being $o_q\in\mathbb{C}$, $q=1,2,\ldots,Q$; the matrix $O$ is split into a real part $O_R$ and an imaginary part $O_I$, $O=H^TV=O_R+iO_I$, and the actual output of the $q$-th sample is
$o_q=h_q^Tv=(h_{q,R}^Tv_R-h_{q,I}^Tv_I)+i(h_{q,R}^Tv_I+h_{q,I}^Tv_R)$,
where $v_R$ is the real part of the hidden-to-output weight vector, $v_I$ its imaginary part, $h_{q,R}$ the real part of the hidden-layer output vector of the $q$-th sample, and $h_{q,I}$ its imaginary part;
Step S32: computing the error function $E$ of the training samples;
Setting $k=k+1$ and returning to step S22 (the hidden-to-output weight matrix $V$ is always solved through the generalized inverse).
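Putting steps S21 through S32 together, one possible training loop is sketched below; it reuses the `g` and `grad_W` helpers from the earlier sketches, and the node count, learning rate, and fixed iteration budget are illustrative assumptions, not values taken from the patent.

```python
def train_ggcnn(Z, D, M, K=100, eta=0.01, seed=0):
    """GGCNN-style loop: input weights W by gradient descent (S25),
    output weights V by the generalized inverse at every pass (S24).
    A sketch under the stated assumptions, not the patent's own code."""
    rng = np.random.default_rng(seed)
    L_in = Z.shape[0]
    W = rng.uniform(-1, 1, (M, L_in)) + 1j * rng.uniform(-1, 1, (M, L_in))
    for k in range(K):
        U = W @ Z                          # S22: hidden-layer input
        H = g(U.real) + 1j * g(U.imag)     # S23: split activation
        V = np.linalg.pinv(H.T) @ D        # S24: one-shot output weights
        E = 0.5 * np.sum(np.abs(H.T @ V - D) ** 2)   # S32: error function
        W = W - eta * grad_W(W, V, Z, D)   # S252/S27: gradient step on W
    return W, V, E

# For example: W, V, E = train_ggcnn(Z_train, D_train, M=20)
```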
This embodiment includes two comparative modeling methods, the BSCBP method and the CELM method. The CELM method comes from the reference "Fully complex extreme learning machine"; it assigns random values to the input weight matrix and solves the output weight matrix through the generalized inverse. The same data set, the three-dimensional equalizer model for the nonlinear distortion of a 4-QAM signal from "Channel Equalization Using Adaptive Complex Radial Basis Function Networks" (equalizer input as above; ideal outputs $0.7+0.7i$, $0.7-0.7i$, $-0.7+0.7i$, $-0.7-0.7i$), is modeled with the BSCBP method and the CELM method, respectively. The experimental results are shown in Figure 2.
It can be seen from Figure 2 that, under the same network structure, the training error of the proposed method is lower than that of both the BSCBP method and the CELM method, showing that the method of the invention effectively optimizes the training weights and achieves high training accuracy.
In the method of the invention, the input weight matrix of the hidden layer is updated by the gradient descent method; therefore, compared with CELM under the same initial weights, the GGCNN model needs far fewer hidden nodes than CELM, and its mean square error is smaller than that produced by the CELM model.
In the modeling method of the invention based on the gradient descent method and the generalized inverse (Gradient-based Generalized Complex Neural Networks, the GGCNN method), the output weight matrix of the hidden layer is solved through the generalized inverse, obtaining the minimum-norm least-squares solution of the output weight matrix in one step without iteration. Therefore, with the same initial weights and the same number of hidden nodes, the method of the invention not only trains faster than BSCBP but also greatly reduces both the training error and the test error; detailed comparison results are shown in Table 1.
Table 1
The above is only a preferred embodiment of the present application and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within its protection scope.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710091587.5A CN106875002A (en) | 2017-02-20 | 2017-02-20 | Complex-valued neural network training method based on gradient descent method and generalized inverse |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710091587.5A CN106875002A (en) | 2017-02-20 | 2017-02-20 | Complex-valued neural network training method based on gradient descent method and generalized inverse |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106875002A true CN106875002A (en) | 2017-06-20 |
Family
ID=59166995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710091587.5A Pending CN106875002A (en) | Complex-valued neural network training method based on gradient descent method and generalized inverse | 2017-02-20 | 2017-02-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106875002A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334947A (en) * | 2018-01-17 | 2018-07-27 | 上海爱优威软件开发有限公司 | A kind of the SGD training methods and system of intelligent optimization |
US11120333B2 (en) | 2018-04-30 | 2021-09-14 | International Business Machines Corporation | Optimization of model generation in deep learning neural networks using smarter gradient descent calibration |
CN109255308A (en) * | 2018-11-02 | 2019-01-22 | 陕西理工大学 | There are the neural network angle-of- arrival estimation methods of array error |
CN109255308B (en) * | 2018-11-02 | 2023-07-21 | 陕西理工大学 | Neural Network Angle of Arrival Estimation Method with Array Error |
CN109274624B (en) * | 2018-11-07 | 2021-04-27 | 中国电子科技集团公司第三十六研究所 | Carrier frequency offset estimation method based on convolutional neural network |
CN109274624A (en) * | 2018-11-07 | 2019-01-25 | 中国电子科技集团公司第三十六研究所 | A kind of carrier frequency bias estimation based on convolutional neural networks |
CN110034827A (en) * | 2019-03-25 | 2019-07-19 | 华中科技大学 | A kind of depolarization multiplexing method and system based on reverse observation error |
CN110011733A (en) * | 2019-03-25 | 2019-07-12 | 华中科技大学 | A Momentum Factor-Based Depolarization Multiplexing Method and System |
CN110824922A (en) * | 2019-11-22 | 2020-02-21 | 电子科技大学 | Smith estimation compensation method based on six-order B-spline wavelet neural network |
CN112148730A (en) * | 2020-06-30 | 2020-12-29 | 网络通信与安全紫金山实验室 | A method for batch extraction of product data features using the generalized inverse of a matrix |
US11863221B1 (en) * | 2020-07-14 | 2024-01-02 | Hrl Laboratories, Llc | Low size, weight and power (swap) efficient hardware implementation of a wide instantaneous bandwidth neuromorphic adaptive core (NeurACore) |
US12057989B1 (en) | 2020-07-14 | 2024-08-06 | Hrl Laboratories, Llc | Ultra-wide instantaneous bandwidth complex neuromorphic adaptive core processor |
CN111950711A (en) * | 2020-08-14 | 2020-11-17 | 苏州大学 | A Second-Order Hybrid Construction Method and System for Complex-valued Feedforward Neural Networks |
CN112770013A (en) * | 2021-01-15 | 2021-05-07 | 电子科技大学 | Heterogeneous information network embedding method based on side sampling |
CN113158582A (en) * | 2021-05-24 | 2021-07-23 | 苏州大学 | Wind speed prediction method based on complex value forward neural network |
WO2022247049A1 (en) * | 2021-05-24 | 2022-12-01 | 苏州大学 | Method for predicting wind speed based on complex-valued forward neural network |
CN114091327A (en) * | 2021-11-10 | 2022-02-25 | 中国航发沈阳发动机研究所 | Method for determining radar scattering characteristics of engine cavity |
CN114091327B (en) * | 2021-11-10 | 2022-09-20 | 中国航发沈阳发动机研究所 | Method for determining radar scattering characteristics of engine cavity |
WO2023216383A1 (en) * | 2022-05-13 | 2023-11-16 | 苏州大学 | Complex-valued timing signal prediction method based on complex-valued neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106875002A (en) | Complex-valued neural network training method based on gradient descent method and generalized inverse | |
WO2023019601A1 (en) | Signal modulation recognition method for complex-valued neural network based on structure optimization algorithm | |
CN109948029B (en) | Neural network self-adaptive depth Hash image searching method | |
CN110969250B (en) | Neural network training method and device | |
CN103413174B (en) | Based on the short-term wind speed multistep forecasting method of degree of depth learning method | |
CN102982373B (en) | OIN (Optimal Input Normalization) neural network training method for mixed SVM (Support Vector Machine) regression algorithm | |
CN108734301A (en) | A kind of machine learning method and machine learning device | |
CN107203891A (en) | A kind of automatic many threshold values characteristic filter method and devices | |
US11625614B2 (en) | Small-world nets for fast neural network training and execution | |
CN107480774A (en) | Dynamic neural network model training method and device based on integrated study | |
CN110162739B (en) | RFFKLMS Algorithm Weight Update Optimization Method Based on Variable Forgetting Factor | |
CN108566257A (en) | Signal recovery method based on back propagation neural network | |
CN112099345B (en) | A fuzzy tracking control method, system and medium based on input hysteresis | |
CN114117945B (en) | A deep learning cloud service QoS prediction method based on user-service interaction graph | |
CN106021829B (en) | A kind of nonlinear system modeling method based on RBF-ARX model stability parameter Estimation | |
CN111950711A (en) | A Second-Order Hybrid Construction Method and System for Complex-valued Feedforward Neural Networks | |
CN109754122A (en) | A kind of Numerical Predicting Method of the BP neural network based on random forest feature extraction | |
CN118536407B (en) | A sea surface wind speed prediction algorithm and system based on symbolic dynamic analysis | |
CN114912489A (en) | Signal modulation identification method | |
CN110490324A (en) | A kind of gradient decline width learning system implementation method | |
CN113962163A (en) | An optimization method, device and equipment for realizing efficient design of passive microwave devices | |
CN115983105A (en) | Occam inversion Lagrange multiplier optimization method based on deep learning weighting decision | |
CN106407932A (en) | Handwritten number recognition method based on fractional calculus and generalized inverse neural network | |
CN107391442A (en) | A kind of augmentation linear model and its application process | |
CN114819107B (en) | Hybrid data assimilation method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170620 |