CN109816107A - A BFGS Quasi-Newton Neural Network Training Algorithm Based on a Heterogeneous Computing Platform
- Publication number: CN109816107A (application CN201711158651.3A)
- Authority: CN (China)
- Prior art keywords: neural network, calculating, newton, task, work
- Prior art date: 2017-11-20
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a BFGS quasi-Newton neural network training algorithm based on a heterogeneous computing platform, comprising the following steps: (1) divide the tasks; (2) divide the degree of parallelism, i.e. determine the total number of work-items required to complete the computing tasks and the number of work-groups into which the work-items are organized, and compute the neural network training error; (3) compute the search direction dir; (4) compute the step size λ based on the golden-section method and update the weights w; (5) compute the gradient g of the neural network training error evaluation function; (6) compute the Hessian matrix H; (7) perform parallel reduction. The invention adopts a heterogeneous CPU-GPU computing platform as the neural network training device and uses the GPU to accelerate the BFGS quasi-Newton algorithm; compared with a traditional CPU-based implementation the running speed is greatly improved, and compared with other optimization algorithms the method offers higher convergence efficiency and better global search capability.
Description
Technical Field
The present invention relates to the fields of high-performance computing and machine learning, and in particular to a BFGS quasi-Newton neural network training algorithm based on a heterogeneous computing platform.
Background Art
An artificial neural network is an information-processing system that can learn any input-output relationship from a set of data and build an accurate model. At present, one of the main challenges facing artificial neural networks is training. Before training, a neural network carries no information; after training, the weights of the network are determined, so that an accurate model is built from the training data. Determining the weights of a neural network is in fact an optimization process: through optimization algorithms such as gradient descent, quasi-Newton methods (QN), particle swarm optimization (PSO), or the conjugate gradient method (CG), the network obtains sufficiently accurate fitted weight values after repeated iterative computation. Neural network training therefore involves many iterations over a large amount of training data and is a very time-consuming process.
With the rapid development of GPU technology, modern GPUs already offer strong parallel computing and data-processing capability, but their logic-processing capability still falls far short of that of CPUs. An algorithm is therefore urgently needed that jointly considers speed improvement, method scalability, and design cost.
Summary of the Invention
The object of the present invention is to overcome the deficiencies of the prior art and solve the problem of low efficiency in the training of traditional artificial neural networks by providing a BFGS quasi-Newton neural network training algorithm based on a heterogeneous computing platform. A heterogeneous CPU-GPU computing platform is adopted as the neural network training device and the GPU is used to accelerate the BFGS quasi-Newton algorithm, yielding a method that can complete neural network training quickly and build a high-precision model.
The object of the present invention is achieved through the following technical solutions:
A BFGS quasi-Newton neural network training algorithm based on a heterogeneous computing platform, comprising the following steps:
(1) Task division: the BFGS quasi-Newton neural network training algorithm comprises computing tasks and control tasks; the control tasks are performed by the CPU, the computing tasks are performed by the GPU, and the computing tasks are divided into five kernel functions (kernels);
(2) Parallelism division: determine the total number of work-items required to complete the computing tasks and the number of work-groups into which the work-items are organized;
(3) Compute the neural network training error;
(4) Compute the search direction dir;
(5) Compute the step size λ based on the golden-section method and update the weights w;
(6) Compute the gradient g of the neural network training error evaluation function;
(7) Compute the Hessian matrix H;
(8) Parallel reduction.
Further, the five kernel functions in step (1) are: kernel1, the computation of the neural network training error; kernel2, the computation of the search direction; kernel3, the computation of the search step size and the update of the neural network connection weights; kernel4, the computation of the gradient of the neural network training error evaluation function; and kernel5, the computation of the Hessian matrix.
Further, the control tasks in step (1) include controlling the transfer of the initial data from the host to the computing device, transferring the computation results from the computing device back to the host, judging whether the maximum number of iterations has been reached, judging whether the training error meets the requirement, and controlling whether the computing tasks terminate.
Compared with the prior art, the technical solution of the present invention provides the following beneficial effects:
1. The present invention uses a heterogeneous CPU-GPU computing platform as the computing device; compared with a traditional CPU-based implementation, the running speed is greatly improved.
2. The present invention is implemented in OpenCL, which offers higher portability than a CUDA implementation and can run on GPUs and FPGAs from different vendors.
3. The present invention adopts the BFGS quasi-Newton algorithm as the optimization algorithm for neural network training, which offers higher convergence efficiency and better global search capability than other optimization algorithms.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the neural network structure.
Figure 2 is a schematic diagram of the heterogeneous CPU-GPU computing platform.
Figure 3 is a schematic diagram of parallel reduction.
Figure 4 is a design flow chart of the parallel quasi-Newton neural network training algorithm.
Figure 5 is a schematic diagram of data transfer between the modules.
Figure 6 is a flow chart of the specific operation.
Figure 7 is a schematic diagram of the results.
Detailed Description of the Embodiments
The present invention is further described below with reference to the accompanying drawings.
As shown in Figure 1, the neural network structure comprises an input layer, a hidden layer, and an output layer. Neurons in adjacent layers are fully connected, and each connection corresponds to a weight. The number of input-layer neurons corresponds to the number of inputs of a group of training data, the number of output-layer neurons corresponds to the number of outputs of a group of training data, and the number of hidden-layer neurons is set as required and is generally larger than the number of input-layer neurons.
As shown in Figure 2, the heterogeneous CPU-GPU computing platform is organized as follows. The CPU and the GPU communicate over the PCIe bus. The CPU side is the host, which is in charge of control; the GPU side is the device, which is in charge of computation. The storage space on the host side is host memory, which can exchange data with the global memory on the GPU side. During operation, the data only needs to be placed in the corresponding directory and the software reads it automatically.
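As a minimal sketch of the host-to-device transfer described above (assumed host-side C using the standard OpenCL API; the function name, handles, and buffer sizes are illustrative and not taken from the patent):

```c
#include <CL/cl.h>

/* Copy training data from host memory into the GPU's global memory.
 * ctx and queue are an already-created OpenCL context and command queue;
 * training_data/num_floats stand for the CSV data loaded on the host. */
cl_mem upload_training_data(cl_context ctx, cl_command_queue queue,
                            const float *training_data, size_t num_floats)
{
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY,
                                num_floats * sizeof(float), NULL, &err);
    if (err != CL_SUCCESS) return NULL;

    /* Blocking write: returns once the data resides in device global memory. */
    err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0,
                               num_floats * sizeof(float), training_data,
                               0, NULL, NULL);
    return (err == CL_SUCCESS) ? buf : NULL;
}
```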
As shown in Figure 4, the design flow chart of the algorithm of the present invention divides the overall task according to the characteristics of the CPU and the GPU. The judgment part and the data initialization part are performed by the CPU, while the neural network training error function and the BFGS quasi-Newton algorithm are implemented on the GPU. The parallelism of each module is chosen with regard to both the characteristics of the algorithm itself and the structural characteristics of the GPU.
The details are as follows:
1) Task division: the algorithm comprises two types of tasks, computing tasks and control tasks. The control tasks are performed by the CPU and include controlling the transfer of the initial data from the host to the computing device, transferring the computation results from the computing device back to the host, judging whether the maximum number of iterations has been reached, judging whether the training error meets the requirement, and controlling whether the computing tasks terminate. The computing tasks are performed by the GPU; to facilitate task scheduling and optimization, they are implemented as five kernel functions, as shown in Figure 5: kernel1, the computation of the neural network training error; kernel2, the computation of the search direction; kernel3, the computation of the search step size and the update of the neural network connection weights; kernel4, the computation of the gradient of the neural network training error evaluation function; and kernel5, the computation of the Hessian matrix.
2) Parallelism division: to implement the five kernel functions of 1) in parallel on the GPU, the degree of parallelism must first be determined, i.e. the total number of work-items required to complete these computing tasks and the number of work-groups into which the work-items are organized. Among the five kernels, kernel1 and kernel4 are parallelized over the training data: for example, if the training data consists of 1024 groups, these two kernels need 1024 work-items, each performing the computations for one group of training data. kernel2, kernel3, and kernel5 are parallelized over the number of neural network weights: the network in Figure 1 has 32 connection weights, so under this network structure these three kernels need 32 work-items.
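An illustrative sketch of this parallelism division follows (assumed host-side C using the standard OpenCL API; the handle names are placeholders, the sizes 1024 and 32 come from the examples above, and using the weight count as the local size is an assumption consistent with the work-group setting described in the embodiment below):

```c
#include <CL/cl.h>

#define NUM_SAMPLES 1024  /* training-data groups -> work-items for kernel1 and kernel4 */
#define NUM_WEIGHTS 32    /* connection weights   -> work-items for kernel2, kernel3, kernel5 */

/* Launch kernel1 with one work-item per training sample, organized into
 * work-groups of NUM_WEIGHTS work-items (NUM_SAMPLES is assumed to be a
 * multiple of NUM_WEIGHTS). */
cl_int launch_error_kernel(cl_command_queue queue, cl_kernel kernel1)
{
    size_t global = NUM_SAMPLES;  /* total number of work-items */
    size_t local  = NUM_WEIGHTS;  /* work-items per work-group  */
    return clEnqueueNDRangeKernel(queue, kernel1, 1, NULL,
                                  &global, &local, 0, NULL, NULL);
}
```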
3) Neural network training error evaluation function: this function corresponds to kernel1 in Figure 5. The kernel reads the training data and the connection weights w from memory, performs the computation, and finally obtains the training error E_T(w) corresponding to w. Within this function, the computation performed by the s-th work-item is given by formulas (1), (2), and (3). The values of the input neurons are the training data, denoted x_i^s, i.e. the value of the s-th group of training data for the i-th input neuron. The values of the hidden neurons are computed by formulas (1) and (2), where h_j is the weighted sum of the input neurons feeding the j-th hidden neuron, f(h_j) is the value of the j-th hidden neuron, w_ij is the weight from the i-th input neuron to the j-th hidden neuron, and n is the number of input neurons. The value of the output neuron y_m is computed by formula (3), where m indexes the output neurons and N is the number of hidden neurons. Finally, the reduction technique is used to obtain the neural network training error E_T according to formula (4), where S_T is the size of the training data set, N_y is the number of output neurons, y_m is the computed value of the m-th output neuron, d_km is the ideal output of the m-th output neuron for the k-th group of training data, d_max,m is the largest ideal output value in the given training data, and d_min,m is the smallest ideal output value in the given training data.
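A sketch of the per-work-item part of kernel1 is given below (assumed OpenCL C; the kernel name, buffer layout, and the sigmoid activation are assumptions, the normalization by d_max,m and d_min,m from formula (4) is omitted, and the final reduction over samples is performed separately, see the reduction kernel sketched in step 8):

```c
/* One work-item evaluates one training sample through the single-hidden-layer
 * network and writes its squared output error to err[s]. */
__kernel void kernel1_sample_error(__global const float *x,    /* [S_T][n]  inputs        */
                                   __global const float *d,    /* [S_T][Ny] target values */
                                   __global const float *w_ih, /* [n][N]  input->hidden   */
                                   __global const float *w_ho, /* [N][Ny] hidden->output  */
                                   __global float *err,        /* [S_T] per-sample error  */
                                   int n, int N, int Ny)
{
    int s = get_global_id(0);           /* sample index */
    float f_h[64];                      /* hidden activations, N <= 64 assumed */

    for (int j = 0; j < N; ++j) {
        float hj = 0.0f;
        for (int i = 0; i < n; ++i)
            hj += w_ih[i * N + j] * x[s * n + i];   /* weighted sum, cf. formula (1) */
        f_h[j] = 1.0f / (1.0f + exp(-hj));          /* activation, cf. formula (2); sigmoid assumed */
    }

    float e = 0.0f;
    for (int m = 0; m < Ny; ++m) {
        float ym = 0.0f;
        for (int j = 0; j < N; ++j)
            ym += w_ho[j * Ny + m] * f_h[j];        /* output neuron, cf. formula (3) */
        float diff = ym - d[s * Ny + m];
        e += diff * diff;
    }
    err[s] = e;   /* reduced over all samples later to give E_T(w), cf. formula (4) */
}
```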
4) Computing the search direction dir: this function corresponds to kernel2 in Figure 5. Its computation requires the Hessian matrix H and the gradient g; each work-item reads one row of the matrix H and performs a multiply-accumulate with the gradient g to produce one element of the search direction dir.
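A minimal sketch of kernel2 (assumed OpenCL C; the minus sign follows the usual quasi-Newton rule dir = -H·g, whereas the patent text only states the multiply-accumulate, so the sign is an assumption):

```c
/* Work-item i multiply-accumulates row i of H with the gradient g and
 * writes one element of the search direction. */
__kernel void kernel2_direction(__global const float *H,   /* [W][W] (inverse-)Hessian approximation */
                                __global const float *g,   /* [W] gradient         */
                                __global float *dir,       /* [W] search direction */
                                int W)
{
    int i = get_global_id(0);
    float acc = 0.0f;
    for (int j = 0; j < W; ++j)
        acc += H[i * W + j] * g[j];
    dir[i] = -acc;
}
```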
5) Computing the step size λ based on the golden-section method and updating the weights w: this function corresponds to kernel3 in Figure 5. The procedure first initializes a step-size interval and computes its two golden-section points, updates w with each of these two step sizes, then calls the E_T function to evaluate the two resulting weight vectors and compares the values, keeping the step size with the smaller error and discarding the part of the interval outside the larger one. This is repeated until the remaining step-size interval is smaller than a fixed value, at which point the iteration stops and the step size is determined. Since the main purpose of this kernel is to update the weights w, the parallelism inside the function is divided over the dimensions of w, i.e. each work-item computes one element of w. The function must repeatedly call the neural network training error evaluation function, i.e. one kernel must be called from inside another kernel; this is possible with OpenCL versions above 1.1. However, because the two kernels have different degrees of parallelism, the active work-items differ even though the total number of work-items is the same. To avoid the resulting work-item conflicts, all work-items are forced to synchronize before each call to the training error function.
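The golden-section line search itself can be sketched as follows (assumed plain C, written serially for clarity; E_T_at(lambda) is a placeholder for evaluating the training error at w + lambda·dir and is not a real API of the patent's implementation):

```c
#include <math.h>

/* Golden-section search over the step size: keep two interior points,
 * discard the sub-interval beyond the worse point, and stop when the
 * interval is shorter than tol. */
float golden_section_step(float (*E_T_at)(float), float lo, float hi, float tol)
{
    const float r = 0.6180339887f;        /* golden-ratio fraction */
    float a = hi - r * (hi - lo);         /* lower interior point  */
    float b = lo + r * (hi - lo);         /* upper interior point  */
    float fa = E_T_at(a), fb = E_T_at(b);

    while (hi - lo > tol) {
        if (fa < fb) {                    /* minimum lies in [lo, b] */
            hi = b; b = a; fb = fa;
            a = hi - r * (hi - lo); fa = E_T_at(a);
        } else {                          /* minimum lies in [a, hi] */
            lo = a; a = b; fa = fb;
            b = lo + r * (hi - lo); fb = E_T_at(b);
        }
    }
    return 0.5f * (lo + hi);              /* step size lambda */
}
```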
6) Computing the gradient g of the neural network training error evaluation function: this function corresponds to kernel4 in Figure 5. The procedure computes, layer by layer, the partial derivative with respect to each element of the weight vector w, and then uses parallel reduction to sum the partial derivatives of each element, giving the gradient of the neural network training error function. In this function, each work-item computes the partial derivatives with respect to all elements of w based on one group of training data, and finally all work-items are summed by reduction.
7) Computing the Hessian matrix H: in all formulas the subscript k denotes the k-th iteration. This function corresponds to kernel5 in Figure 5. From the weights w and the gradient g obtained in steps 5) and 6), s and z are computed by formulas (5) and (6), and from these two vectors x is computed according to formula (7), where s is the change in the weights, z is the change in the gradient, and x is the correction term. Finally, the Hessian matrix is computed according to formula (8) (a standard form of this update is sketched after formulas (5) and (6) below). In this part, each work-item computes the elements of one column of the Hessian matrix H. H_k is the Hessian matrix H at the k-th iteration.
s_k = w_{k+1} - w_k    (5)
z_k = g_{k+1} - g_k    (6)
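Formulas (7) and (8) are not reproduced in the text above. For orientation only, a standard BFGS update of the (inverse-)Hessian approximation consistent with the quantities s_k and z_k defined in (5) and (6) is the following; this is an assumption about the intended formulas, not a quotation from the patent:

```latex
\rho_k = \frac{1}{z_k^{\top} s_k}, \qquad
H_{k+1} = \left(I - \rho_k\, s_k z_k^{\top}\right) H_k \left(I - \rho_k\, z_k s_k^{\top}\right)
        + \rho_k\, s_k s_k^{\top}
```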
8) Parallel reduction: in kernel1 and kernel4 of the parallel BFGS quasi-Newton neural network training algorithm, the parallel reduction technique is needed to process certain vectors and obtain the sum of their elements. The parallel reduction technique is shown in Figure 3: the first work-item computes the sum of the first and the last element, the second work-item computes the sum of the second and the second-to-last element, and so on; the final result is computed by the first work-item.
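A work-group-level reduction can be sketched as follows (assumed OpenCL C; the kernel and buffer names are illustrative). It pairs item i with item i + half and halves the active range each step, which follows the same halving idea as the first-with-last pairing in Figure 3; the exact indexing is an assumption:

```c
/* Sum the per-sample values in 'in'; each work-group writes one partial sum,
 * which is summed again on a second pass or on the host. */
__kernel void reduce_sum(__global const float *in,      /* e.g. per-sample errors  */
                         __global float *partial,       /* one sum per work-group  */
                         __local  float *scratch,
                         int count)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    scratch[lid] = (gid < count) ? in[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Halve the number of active work-items each step. */
    for (int half = (int)get_local_size(0) / 2; half > 0; half /= 2) {
        if (lid < half)
            scratch[lid] += scratch[lid + half];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}
```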
Specifically, Figure 5 is a schematic diagram of the data transfer between the modules; each module corresponds to one kernel. First, kernel2 uses the initial Hessian matrix H and gradient g to compute the search direction d. Then kernel3 determines the step size according to the golden-section algorithm, updates the weights w from the search direction d and the step size λ, and passes w into kernel1 to compute and evaluate the training error, until the conditions are satisfied, the final step size is determined, and w is fixed. kernel4 computes the partial derivatives of E_T(w) from intermediate variables produced in kernel1 and determines the gradient of the function. Finally, in kernel5, the weights w and gradient g produced by the previous kernels are used to compute the Hessian matrix H. The external CPU then checks the termination conditions: if they are met, the results are output; otherwise the five kernels continue to run in a loop.
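The outer control loop run by the CPU around the five kernels might look like the following sketch (assumed host-side C; the kernel handles, buffer names, work sizes, and the helper's signature are illustrative, not the patent's actual code):

```c
#include <CL/cl.h>

/* k[1]..k[5] hold the five kernels; err_buf holds the reduced training error;
 * gw_data/gw_w are the global work sizes for data-parallel and weight-parallel
 * kernels, lw is the common work-group size. */
int train_loop(cl_command_queue q, cl_kernel k[6], cl_mem err_buf,
               int max_iters, float err_bound,
               size_t gw_data, size_t gw_w, size_t lw)
{
    float error = 1e30f;
    for (int it = 0; it < max_iters && error > err_bound; ++it) {
        clEnqueueNDRangeKernel(q, k[2], 1, NULL, &gw_w,    &lw, 0, NULL, NULL); /* direction  */
        clEnqueueNDRangeKernel(q, k[3], 1, NULL, &gw_w,    &lw, 0, NULL, NULL); /* step + w   */
        clEnqueueNDRangeKernel(q, k[1], 1, NULL, &gw_data, &lw, 0, NULL, NULL); /* error      */
        clEnqueueNDRangeKernel(q, k[4], 1, NULL, &gw_data, &lw, 0, NULL, NULL); /* gradient   */
        clEnqueueNDRangeKernel(q, k[5], 1, NULL, &gw_w,    &lw, 0, NULL, NULL); /* Hessian    */
        /* Blocking read of the reduced error for the termination check on the CPU. */
        clEnqueueReadBuffer(q, err_buf, CL_TRUE, 0, sizeof(float), &error, 0, NULL, NULL);
    }
    return 0;
}
```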
The specific operation is shown in Figure 6:
(1) Neural network parameter setting
As shown in Figure 1, the structure of the neural network is as follows. This embodiment uses a neural network with a single hidden layer; the number of input-layer neurons is set according to the number of input variables of the model to be fitted, the number of output-layer neurons according to the number of output variables of the model to be fitted, and the number of hidden-layer neurons as required, generally larger than the number of input-layer neurons. The number of neural network weights is then computed automatically from the numbers of neurons set above according to the formula w = (input + output) * hidden.
(2) GPU-side parameter setting
The GPU-side parameters mainly comprise the number of work-items and the work-group size. A work-item is the smallest unit of execution on the GPU, and its number can be set according to the size of the training data: for example, if the training data consists of 4096 groups, the number of work-items can be set to 4096. A number of work-items can be organized into a work-group, which facilitates data transfer and management among work-items of the same group. The number of work-items in a work-group can be set according to the number of neural network weights: the network shown in Figure 1 has (3 + 1) * 8 = 32 weights, so the number of work-items per work-group is set to 32.
(3) Training data import
The software can only read files in CSV format, so the training data must first be stored in a CSV file, placed in the software directory, and the file renamed to training-data.
(4) Termination condition setting
After the above steps are completed, the termination conditions of the software must be set, generally including the maximum number of iterations and the training error bound. Once set, the software terminates the program and outputs the results when the maximum number of iterations is reached or the training error falls below the training error bound.
(5) Result recording
The output is shown in Figure 7. The result contains four pieces of information: the first line is the training error, which indicates the accuracy of the fitted neural network model (37.5297 in the figure); the second line is the number of iterations from the start of the program to its termination (20 in the figure); from the third line onwards are the obtained neural network weights; and the last line is the running time in seconds.
The present invention is not limited to the embodiments described above. The above description of the specific embodiments is intended to describe and illustrate the technical solution of the present invention; the specific embodiments are merely illustrative and not restrictive. Without departing from the spirit of the present invention and the scope protected by the claims, those of ordinary skill in the art may, under the inspiration of the present invention, make many specific variations, all of which fall within the protection scope of the present invention.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711158651.3A CN109816107A (en) | 2017-11-20 | 2017-11-20 | A BFGS Quasi-Newton Neural Network Training Algorithm Based on Heterogeneous Computing Platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109816107A true CN109816107A (en) | 2019-05-28 |
Family
ID=66598678
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109816107A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102999756A (en) * | 2012-11-09 | 2013-03-27 | 重庆邮电大学 | Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit) |
CN105303252A (en) * | 2015-10-12 | 2016-02-03 | 国家计算机网络与信息安全管理中心 | Multi-stage nerve network model training method based on genetic algorithm |
CN106503803A (en) * | 2016-10-31 | 2017-03-15 | 天津大学 | A kind of limited Boltzmann machine iteration map training method based on pseudo-Newtonian algorithm |
CN106775905A (en) * | 2016-11-19 | 2017-05-31 | 天津大学 | Higher synthesis based on FPGA realizes the method that Quasi-Newton algorithm accelerates |
CN106528357A (en) * | 2016-11-24 | 2017-03-22 | 天津大学 | FPGA system and implementation method based on on-line training neural network of quasi-newton method |
Non-Patent Citations (1)
Title |
---|
Jiajun Li et al., "Neural Network Training Acceleration with PSO Algorithm on a GPU Using OpenCL", Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476346A (en) * | 2020-02-28 | 2020-07-31 | 之江实验室 | Deep learning network architecture based on Newton conjugate gradient method |
CN111476346B (en) * | 2020-02-28 | 2022-11-29 | 之江实验室 | Deep learning network architecture based on Newton conjugate gradient method |
WO2021208808A1 (en) * | 2020-04-14 | 2021-10-21 | International Business Machines Corporation | Cooperative neural networks with spatial containment constraints |
US11222201B2 (en) | 2020-04-14 | 2022-01-11 | International Business Machines Corporation | Vision-based cell structure recognition using hierarchical neural networks |
GB2610098A (en) * | 2020-04-14 | 2023-02-22 | Ibm | Cooperative neural networks with spatial containment constraints |
US11734939B2 (en) | 2020-04-14 | 2023-08-22 | International Business Machines Corporation | Vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering |
US11734576B2 (en) | 2020-04-14 | 2023-08-22 | International Business Machines Corporation | Cooperative neural networks with spatial containment constraints |
CN113515822A (en) * | 2021-01-28 | 2021-10-19 | 长春工业大学 | Return-to-zero neural network-based stretching integral structure form finding method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190528 |