WO2020051751A1 - Acceleration method, apparatus, device and storage medium for convolutional neural network computation - Google Patents


Info

Publication number: WO2020051751A1
Authority: WO, WIPO (PCT)
Prior art keywords: data, image, address generator, neural network, output
Application number: PCT/CN2018/104901
Other languages: English (en), French (fr)
Inventors: 李善辽, 王峥
Original Assignee: 中国科学院深圳先进技术研究院
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2018/104901
Publication of WO2020051751A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • The invention belongs to the technical field of data processing, and particularly relates to a method, an apparatus, a computing device, and a storage medium for accelerating the computation of a convolutional neural network.
  • The purpose of the present invention is to provide a method, apparatus, device, and storage medium for accelerating the calculation of a convolutional neural network, aiming to solve the problem that the prior art cannot provide an effective way to accelerate convolutional neural network calculation, which leads to low accuracy of the network's calculation results.
  • In one aspect, the present invention provides a method for accelerating the computation of a convolutional neural network.
  • The method includes the following steps:
  • when a request to perform convolution calculation on the image data corresponding to a target image through a convolutional neural network is received, controlling an address generator to read a pre-stored address generator instruction from a configuration register;
  • controlling, according to the address generator instruction, the address generator to output the data address at which the image data is stored in a data memory;
  • controlling the data memory to read the image data from the data address output by the address generator, and inputting the read image data into an input shift register;
  • controlling the input shift register to input the received image data into the convolutional neural network for neuron calculation to obtain corresponding feature map data, and inputting the feature map data into an output shift register;
  • controlling the output shift register to input the received feature map data into the data memory for storage according to a preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
  • Preferably, before the step of controlling the address generator to read the pre-stored address generator instruction from the configuration register, the method further includes: setting an address generator instruction related to the address generator and convolutional neural network configuration parameters related to the convolutional neural network, and storing the address generator instruction and the configuration parameters in the configuration register.
  • Preferably, the step of controlling the address generator to output the data address of the image data stored in the data memory includes: obtaining, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation; and controlling the address generator to obtain the pixel range of the target image corresponding to those feature-map pixels, and to continuously output the corresponding data addresses according to that range.
  • Preferably, before the step of controlling the address generator to read the pre-stored address generator instruction from the configuration register, the method further includes: obtaining the width, height, and number of image channels of the target image; calculating the number of image pixels of the target image from the width and height; obtaining, according to the number of image channels, each image channel value corresponding to each image pixel in the target image; and, according to the number of image pixels, sequentially storing the image channel values corresponding to each image pixel at consecutive data addresses in the data memory.
  • In another aspect, the present invention provides an apparatus for accelerating the computation of a convolutional neural network, the apparatus including:
  • an instruction reading unit, configured to control an address generator to read a pre-stored address generator instruction from a configuration register when a request to perform convolution calculation on the image data corresponding to a target image through a convolutional neural network is received;
  • a data address output unit, configured to control the address generator, according to the address generator instruction, to output the data address at which the image data is stored in a data memory;
  • an image data reading unit, configured to control the data memory to read the image data from the data address output by the address generator, and to input the read image data into an input shift register;
  • a neuron calculation unit, configured to control the input shift register to input the received image data into the convolutional neural network for neuron calculation to obtain corresponding feature map data, and to input the feature map data into an output shift register; and
  • a feature map storage unit, configured to control the output shift register to input the received feature map data into the data memory for storage according to a preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
  • Preferably, the data address output unit includes:
  • a pixel obtaining unit, configured to obtain, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation; and
  • an address output subunit, configured to control the address generator to obtain, according to the feature-map pixels, the pixel range of the target image corresponding to the feature-map pixels, and to continuously output the corresponding data addresses according to the pixel range.
  • Preferably, the apparatus further includes:
  • a parameter setting storage unit, configured to set an address generator instruction related to the address generator and convolutional neural network configuration parameters related to the convolutional neural network, and to store the address generator instruction and the configuration parameters in the configuration register;
  • a channel value acquiring unit, configured to acquire the width, height, and number of image channels of the target image, calculate the number of image pixels of the target image from the width and height, and acquire, according to the number of image channels, each image channel value corresponding to each image pixel in the target image; and
  • a data storage unit, configured to sequentially store, according to the number of image pixels, the image channel values corresponding to each image pixel at consecutive data addresses in the data memory.
  • In another aspect, the present invention further provides a computing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the steps of the above acceleration method for convolutional neural network calculation are implemented.
  • In another aspect, the present invention further provides a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, the steps of the above acceleration method for convolutional neural network calculation are implemented.
  • In the present invention, the address generator is controlled to convert an address generator instruction read from a configuration register into the data address at which image data is stored in a data memory; the data memory is controlled to read the image data from that data address and input it into the input shift register; and the input shift register is controlled to input the image data into the convolutional neural network for neuron calculation, obtaining the corresponding feature map data.
  • The feature map data is then input into the output shift register, and the output shift register is controlled to input the received feature map data into the data memory for storage according to a preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation. This improves the reusability of the data and reduces the number of memory reads, thereby increasing the speed of the convolutional neural network calculation.
  • FIG. 1 is a flowchart of an implementation of a method for accelerating convolutional neural network calculation according to a first embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of an apparatus for accelerating convolutional neural network calculation according to a second embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of an apparatus for accelerating convolutional neural network calculation according to a third embodiment of the present invention; and
  • FIG. 4 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention.
  • FIG. 1 shows the implementation flow of the method for accelerating convolutional neural network calculation according to the first embodiment of the present invention. For convenience of explanation, only the parts related to the embodiment of the present invention are shown, detailed as follows:
  • In step S101, when a request to perform convolution calculation on the image data corresponding to a target image through a convolutional neural network is received, the address generator is controlled to read a pre-stored address generator instruction from a configuration register.
  • The embodiments of the present invention are applicable to data processing platforms, devices, or systems, such as a personal computer or a server.
  • When a request to perform convolution calculation through a convolutional neural network on the image data corresponding to the target image input by the user is received, the address generator is controlled to read a pre-stored address generator instruction from the configuration register, and the address generator performs the corresponding action according to that instruction.
  • Preferably, the address generator instructions and the convolutional neural network configuration parameters are stored in the configuration register, where the configuration parameters include the size of the convolution kernel, the stride, and the number of feature-map pixels output by the convolution calculation, which improves convenience when reading data.
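The patent does not spell out how the number of output feature-map pixels follows from the kernel size and stride; as an illustrative sketch (the function name and the padding parameter are our own, not from the patent), the standard relation can be written as:

```python
def conv_output_size(in_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    # Standard formula for the number of output pixels along one axis of a
    # convolution; the configuration register described above would hold the
    # kernel size, the stride, and the resulting output pixel count.
    return (in_size + 2 * padding - kernel) // stride + 1

# e.g. a 28x28 input with a 3x3 kernel and stride 1 yields a 26x26 feature map
assert conv_output_size(28, kernel=3, stride=1) == 26
```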
  • Preferably, the width, height, and number of image channels of the target image are obtained; the number of image pixels of the target image is calculated from the width and height; each image channel value corresponding to each image pixel in the target image is obtained according to the number of image channels; and, according to the number of image pixels, the channel values corresponding to each image pixel are stored at consecutive data addresses in the data memory. This improves the convenience of data storage and reduces the algorithmic complexity of data reading.
  • For example, an RGB image has 3 image channels: the R channel, the G channel, and the B channel.
  • Each pixel of the RGB image consists of the channel values of these three channels.
  • All pixels of the RGB image are stored in sequence at consecutive memory addresses, with 3 consecutive addresses allocated to the 3 channel values of each pixel: the three channel values of one pixel are stored at three consecutive memory addresses, then the three channel values of the next pixel are stored at the following addresses, and so on, until all pixels are stored.
  • For instance, if the three channel values of the first pixel are stored at addresses 0x01, 0x02, and 0x03, the next pixel is stored at 0x04, 0x05, and 0x06, and so on.
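To make the layout concrete, here is a small sketch (our own illustration, not code from the patent) of the address arithmetic implied by this channel-interleaved storage scheme:

```python
def channel_address(base: int, pixel_index: int, channel: int, num_channels: int = 3) -> int:
    # Each pixel occupies num_channels consecutive addresses, so the address
    # of one channel value is a linear function of the pixel index.
    return base + pixel_index * num_channels + channel

# Matching the example above: the first pixel's R, G, B values sit at
# 0x01-0x03 and the next pixel's at 0x04-0x06.
assert [channel_address(0x01, 0, c) for c in range(3)] == [0x01, 0x02, 0x03]
assert [channel_address(0x01, 1, c) for c in range(3)] == [0x04, 0x05, 0x06]
```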
  • In step S102, according to the address generator instruction, the address generator is controlled to output the data address of the image data stored in the data memory.
  • The address generator mainly converts the data or address generator instruction read from the configuration register into the corresponding data address in the data memory, and then sends the generated data address to the data memory so that the data memory reads the corresponding image data.
  • Preferably, the feature-map pixels corresponding to the image data to be output by the current convolution calculation are obtained according to the convolutional neural network configuration parameters stored in the configuration register.
  • The address generator is controlled to obtain, according to the feature-map pixels, the pixel range of the target image corresponding to those feature-map pixels, and to continuously output the corresponding data addresses according to the pixel range, so that consecutive feature-map pixel data are mapped to consecutive data-memory addresses of the target image pixels. This improves data reusability and reduces the number of memory reads.
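The mapping from a feature-map pixel back to its input pixel range can be sketched as follows. This is our own illustration under the assumption of no padding; the function name is hypothetical, not from the patent:

```python
def input_pixel_range(out_x: int, out_y: int, kernel: int, stride: int):
    # Inclusive window (x0, y0, x1, y1) of target-image pixels that a single
    # feature-map pixel depends on, assuming no padding.
    x0, y0 = out_x * stride, out_y * stride
    return (x0, y0, x0 + kernel - 1, y0 + kernel - 1)

# With a 3x3 kernel and stride 1, adjacent output pixels read overlapping
# input windows; emitting consecutive addresses over these ranges is what
# lets the same image data be reused across convolutions.
assert input_pixel_range(0, 0, kernel=3, stride=1) == (0, 0, 2, 2)
assert input_pixel_range(1, 0, kernel=3, stride=1) == (1, 0, 3, 2)
```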
  • Preferably, the address generator not only generates the data address, but also generates parameters corresponding to the convolutional neural network, such as the neuron left start point row (Neuron Left Start Point Row) and the corresponding starting data-memory address, which are passed to the process element (PE) through another data line. This enables the PE to feed the input shift register synchronously, thereby completing the convolution calculation and improving its speed.
  • In step S103, the data memory is controlled to read image data from the data address output by the address generator, and the read image data is input into the input shift register.
  • The data memory reads the corresponding image data according to the data address output by the address generator, and then inputs the read image data into the input shift register in a parallel or serial manner.
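As a toy software model of the serial-in/parallel-out behavior described here (our own illustration, not the hardware design; the class name is hypothetical):

```python
from collections import deque

class InputShiftRegister:
    """Toy serial-in/parallel-out model of the input shift register."""

    def __init__(self, width: int):
        # maxlen makes each append behave like one shift pulse:
        # the oldest value falls off the far end of the register.
        self.cells = deque([0] * width, maxlen=width)

    def shift_in(self, value: int) -> None:
        self.cells.append(value)

    def parallel_out(self) -> list:
        return list(self.cells)

sr = InputShiftRegister(3)
for v in (7, 8, 9):
    sr.shift_in(v)
assert sr.parallel_out() == [7, 8, 9]
sr.shift_in(10)  # one more shift pulse moves every value along by one cell
assert sr.parallel_out() == [8, 9, 10]
```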
  • In step S104, the input shift register is controlled to input the received image data into the convolutional neural network for neuron calculation, the corresponding feature map data is obtained, and the feature map data is input into the output shift register.
  • The image data input into the input shift register can be shifted right or left bit by bit under the effect of a shift pulse, and the input shift register can be controlled to pass the image data on in parallel-in/parallel-out, serial-in/serial-out, parallel-in/serial-out, or serial-in/parallel-out mode.
  • The data output to the convolutional neural network passes through the calculation of each neuron in the network to obtain the corresponding feature map data, and the convolutional neural network is then controlled to input the calculated feature map data into the output shift register. The convolutional neural network consists of many independent neurons (for example, model neurons, data-selector neurons, activation neurons, convolution-pooling neurons, etc.), and performs different neuron calculations depending on the image data received.
  • In step S105, the output shift register is controlled to input the received feature map data into the data memory for storage according to a preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
  • The feature map data in the output shift register can likewise be shifted right or left bit by bit under the effect of the shift pulse, and can be passed on in parallel-in/parallel-out, serial-in/serial-out, parallel-in/serial-out, or serial-in/parallel-out mode.
  • The output shift register stores the received feature map data into the data memory in the preset image-data memory storage mode to accelerate completion of the current convolution calculation, and in the next layer of the neural network calculation the feature map data stored in the data memory is read into the processor in the manner described in steps S101 to S104. In this way, the forward inference of a multi-layer neural network can be completed quickly and efficiently.
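The layer-by-layer loop described above can be sketched in software as follows. This is a hedged model of the data flow only: the lambda "layers" are toy stand-ins for the network's neuron calculations, and the list stands in for the on-chip data memory.

```python
def forward(layers, data_memory):
    # Each layer reads its input from the data memory, computes, and writes
    # the resulting feature map back, so the next layer can repeat the same
    # read-compute-write pattern (steps S101 to S105) on the stored result.
    for layer in layers:
        data_memory = layer(data_memory)
    return data_memory

# Toy stand-ins for two convolution layers (simple arithmetic, not real
# neuron calculations).
layers = [lambda v: [x * 2 for x in v], lambda v: [x + 1 for x in v]]
assert forward(layers, [1, 1, 1]) == [3, 3, 3]
```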
  • When controlling the output shift register to input the received feature map data into the data memory for storage according to the preset image-data memory storage mode, preferably, according to the number of image channels of the target image and the number of pixels of the feature map data, the output shift register is controlled to sequentially store each image channel value corresponding to each pixel of the feature map data at consecutive data addresses in the data memory, thereby improving the convenience of data storage and reducing the algorithmic complexity of data reading.
  • In the embodiment of the present invention, the address generator is controlled to convert the address generator instruction read from the configuration register into the data address at which the image data is stored in the data memory; the data memory is controlled to read the image data from that data address and input it into the input shift register; the input shift register is controlled to input the image data into the convolutional neural network for neuron calculation, obtaining the corresponding feature map data, which is input into the output shift register; and the output shift register is controlled to input the received feature map data into the data memory for storage according to the preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
  • This improves the reusability of the data and reduces the number of memory reads, thereby increasing the speed of the convolutional neural network calculation.
  • FIG. 2 shows the structure of an apparatus for accelerating convolutional neural network calculation provided in Embodiment 2 of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, including:
  • The instruction reading unit 21 is configured to control the address generator to read a pre-stored address generator instruction from the configuration register when a request to perform convolution calculation on the image data corresponding to a target image through a convolutional neural network is received.
  • The embodiments of the present invention are applicable to data processing platforms, devices, or systems, such as a personal computer or a server.
  • When a request to perform convolution calculation through a convolutional neural network on the image data corresponding to the target image input by the user is received, the address generator is controlled to read a pre-stored address generator instruction from the configuration register, and the address generator performs the corresponding action according to that instruction.
  • The data address output unit 22 is configured to control the address generator to output the data address of the image data stored in the data memory according to the address generator instruction.
  • The address generator mainly converts the data or address generator instruction read from the configuration register into the corresponding data address in the data memory, and then sends the generated data address to the data memory so that the data memory reads the corresponding image data.
  • Preferably, the address generator not only generates the data address, but also generates parameters corresponding to the convolutional neural network, such as the neuron left start point row (Neuron Left Start Point Row) and the corresponding starting data-memory address, which are passed to the process element (PE) through another data line. This enables the PE to feed the input shift register synchronously, thereby completing the convolution calculation and improving its speed.
  • The image data reading unit 23 is configured to control the data memory to read image data from the data address output by the address generator, and to input the read image data into the input shift register.
  • The data memory reads the corresponding image data according to the data address output by the address generator, and then inputs the read image data into the input shift register in a parallel or serial manner.
  • The neuron calculation unit 24 is configured to control the input shift register to input the received image data into the convolutional neural network for neuron calculation, obtain the corresponding feature map data, and input the feature map data into the output shift register.
  • The image data input into the input shift register can be shifted right or left bit by bit under the effect of a shift pulse, and the input shift register can be controlled to pass the image data on in parallel-in/parallel-out, serial-in/serial-out, parallel-in/serial-out, or serial-in/parallel-out mode.
  • The data output to the convolutional neural network passes through the calculation of each neuron in the network to obtain the corresponding feature map data, and the convolutional neural network is then controlled to input the calculated feature map data into the output shift register. The convolutional neural network consists of many independent neurons (for example, model neurons, data-selector neurons, activation neurons, convolution-pooling neurons, etc.), and performs different neuron calculations depending on the image data received.
  • the feature map storage unit 25 is configured to control the output shift register to input the received feature map data into the data memory for storage according to a preset image data memory storage mode, so as to accelerate the completion of the current convolution calculation.
  • The feature map data in the output shift register can likewise be shifted right or left bit by bit under the effect of the shift pulse, and can be passed on in parallel-in/parallel-out, serial-in/serial-out, parallel-in/serial-out, or serial-in/parallel-out mode.
  • The output shift register stores the received feature map data into the data memory in the preset image-data memory storage mode to accelerate completion of the current convolution calculation, and in the next layer of the neural network calculation the feature map data stored in the data memory is read into the processor in the manner described for the instruction reading unit 21 through the neuron calculation unit 24, so that the forward inference of a multi-layer neural network can be completed quickly and efficiently.
  • When controlling the output shift register to input the received feature map data into the data memory for storage according to the preset image-data memory storage mode, preferably, according to the number of image channels of the target image and the number of pixels of the feature map data, the output shift register is controlled to sequentially store each image channel value corresponding to each pixel of the feature map data at consecutive data addresses in the data memory, thereby improving the convenience of data storage and reducing the algorithmic complexity of data reading.
  • Each unit of the apparatus for accelerating convolutional neural network calculation may be implemented by corresponding hardware or software units; each unit may be an independent software or hardware unit, or the units may be integrated into one software or hardware unit.
  • FIG. 3 shows the structure of an apparatus for accelerating convolutional neural network calculation provided in Embodiment 3 of the present invention. For convenience of explanation, only the parts related to the embodiment of the present invention are shown, including:
  • the parameter setting storage unit 31, configured to set an address generator instruction related to the address generator and convolutional neural network configuration parameters related to the convolutional neural network, and to store the address generator instruction and the configuration parameters in the configuration register;
  • the channel value acquisition unit 32, configured to acquire the width, height, and number of image channels of the target image, calculate the number of image pixels of the target image from the width and height, and acquire, according to the number of image channels, each image channel value corresponding to each image pixel in the target image;
  • the data storage unit 33, configured to sequentially store, according to the number of image pixels, the image channel values corresponding to each image pixel at consecutive data addresses in the data memory;
  • the instruction reading unit 34, configured to control the address generator to read a pre-stored address generator instruction from the configuration register when a request to perform convolution calculation on the image data corresponding to a target image through a convolutional neural network is received;
  • the data address output unit 35, configured to control the address generator to output the data address of the image data stored in the data memory according to the address generator instruction;
  • the image data reading unit 36, configured to control the data memory to read image data from the data address output by the address generator, and to input the read image data into the input shift register;
  • the neuron calculation unit 37, configured to control the input shift register to input the received image data into the convolutional neural network for neuron calculation, obtain the corresponding feature map data, and input the feature map data into the output shift register; and
  • the feature map storage unit 38, configured to control the output shift register to input the received feature map data into the data memory for storage according to a preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
  • Preferably, the data address output unit 35 includes:
  • the pixel obtaining unit 351, configured to obtain, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation; and
  • the address output subunit 352, configured to control the address generator to obtain, according to the feature-map pixels, the pixel range of the target image corresponding to the feature-map pixels, and to continuously output the corresponding data addresses according to the pixel range.
  • Each unit of the apparatus for accelerating convolutional neural network calculation may be implemented by corresponding hardware or software units; each unit may be an independent software or hardware unit, or the units may be integrated into one software or hardware unit.
  • Embodiment 4:
  • FIG. 4 shows the structure of a computing device provided in Embodiment 4 of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown.
  • The computing device 4 includes a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40.
  • When the processor 40 executes the computer program 42, the steps in the embodiment of the acceleration method for convolutional neural network calculation described above are implemented, for example, steps S101 to S105 shown in FIG. 1.
  • Alternatively, when the processor 40 executes the computer program 42, the functions of the units in the foregoing apparatus embodiments are realized, for example, the functions of units 21 to 25 shown in FIG. 2.
  • In the embodiment of the present invention, the address generator is controlled to convert the address generator instruction read from the configuration register into the data address at which the image data is stored in the data memory; the data memory is controlled to read the image data from that data address and input it into the input shift register; the input shift register is controlled to input the image data into the convolutional neural network for neuron calculation, obtaining the corresponding feature map data, which is input into the output shift register; and the output shift register is controlled to input the received feature map data into the data memory for storage according to the preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
  • This improves the reusability of the data and reduces the number of memory reads, thereby increasing the speed of the convolutional neural network calculation.
  • The computing device in the embodiment of the present invention may be a personal computer or a server.
  • For the process by which the processor 40 in the computing device 4 executes the computer program 42 to implement the acceleration method for convolutional neural network calculation, reference may be made to the description of the foregoing method embodiments, and details are not described here again.
  • Embodiment 5:
  • In Embodiment 5, a computer-readable storage medium stores a computer program.
  • When the computer program is executed by a processor, the steps of the acceleration method for convolutional neural network calculation in the embodiments described above are implemented, for example, steps S101 to S105 shown in FIG. 1.
  • Alternatively, when the computer program is executed by a processor, the functions of the units in the foregoing apparatus embodiments are implemented, for example, the functions of units 21 to 25 shown in FIG. 2.
  • In the embodiment of the present invention, the address generator is controlled to convert the address generator instruction read from the configuration register into the data address at which the image data is stored in the data memory; the data memory is controlled to read the image data from that data address and input it into the input shift register; the input shift register is controlled to input the image data into the convolutional neural network for neuron calculation, obtaining the corresponding feature map data, which is input into the output shift register; and the output shift register is controlled to input the received feature map data into the data memory for storage according to the preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
  • This improves the reusability of the data and reduces the number of memory reads, thereby increasing the speed of the convolutional neural network calculation.
  • The computer-readable storage medium of the embodiment of the present invention may include any entity or device capable of carrying computer program code, or a recording medium, for example, a memory such as a ROM/RAM, a magnetic disk, an optical disk, or a flash memory.

Abstract

An acceleration method, apparatus, device, and storage medium for convolutional neural network computation. The method includes: when a request to perform convolution calculation on the image data corresponding to a target image through a convolutional neural network is received, controlling an address generator to read a pre-stored address generator instruction from a configuration register (S101); controlling, according to the address generator instruction, the address generator to output the data address at which the image data is stored in a data memory (S102); controlling the data memory to read the image data from the data address output by the address generator, and inputting the read image data into an input shift register (S103); controlling the input shift register to input the received image data into the convolutional neural network for neuron calculation to obtain the corresponding feature map data, and inputting the feature map data into an output shift register (S104); and controlling the output shift register to input the received feature map data into the data memory for storage according to a preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation (S105). The method improves the reusability of the data and reduces the number of memory reads, thereby increasing the speed of convolutional neural network computation.

Description

Acceleration method, apparatus, device, and storage medium for convolutional neural network computation

Technical Field
The present invention belongs to the technical field of data processing, and in particular relates to an acceleration method, apparatus, device, and storage medium for convolutional neural network computation.
Background Art
In recent years, owing to the spread of big-data applications and advances in computer hardware, deep learning techniques have been widely used in fields such as computer vision, natural language processing, and intelligent decision-making systems to perform feature extraction, classification, and recursive operations on data. The convolution operation is a very important deep-learning feature-extraction method, and today's mainstream deep neural networks (for example, LeNet, the neural-network-based automatic handwriting recognition system, as well as AlexNet and VGG-16) are built by stacking convolutional layers. As the number of network layers increases, classification accuracy improves, but the convolution operation also consumes a great deal of computing power.
There are currently two main methods for accelerating convolution calculation. One is to prune the nodes of the neural network, removing some unimportant computation nodes to reduce the amount of computation. The obvious drawback of this method is that the manual pruning of computation nodes reduces the final accuracy of the neural network; a further drawback is that, since deep learning is still iterating rapidly, pruning the network without knowing exactly which computation nodes are important is too aggressive. The other method is to quantize the parameters of the convolution calculation, for example converting data originally of type float64 into lower-precision float16 or float8 data. Although the reduced parameter precision does decrease the amount of computation, the loss of network accuracy still cannot be avoided.
Although the above two methods relieve the computing-power consumption of the convolution operation to some extent, the computing capability and processing speed of general-purpose computer hardware platforms cannot keep up. It is therefore necessary to design a dedicated convolution processing chip that reads image data from memory. For such a chip, roughly 80% of the energy is consumed in data transfer, so how to optimize the memory storage of image data is a problem that urgently needs to be solved.
Technical Problem
The purpose of the present invention is to provide an acceleration method, apparatus, device, and storage medium for convolutional neural network computation, aiming to solve the problem that the prior art cannot provide an effective acceleration method for convolutional neural network computation, which leads to low accuracy of the network's calculation results.
Technical Solution
In one aspect, the present invention provides an acceleration method for convolutional neural network computation, the method including the following steps:
when a request to perform convolution calculation on the image data corresponding to a target image through a convolutional neural network is received, controlling an address generator to read a pre-stored address generator instruction from a configuration register;
controlling, according to the address generator instruction, the address generator to output the data address at which the image data is stored in a data memory;
controlling the data memory to read the image data from the data address output by the address generator, and inputting the read image data into an input shift register;
controlling the input shift register to input the received image data into the convolutional neural network for neuron calculation to obtain corresponding feature map data, and inputting the feature map data into an output shift register;
controlling the output shift register to input the received feature map data into the data memory for storage according to a preset image-data memory storage mode, so as to accelerate completion of the current convolution calculation.
Preferably, before the step of controlling the address generator to read the pre-stored address generator instruction from the configuration register, the method further includes:
setting an address generator instruction related to the address generator and convolutional neural network configuration parameters related to the convolutional neural network, and storing the address generator instruction and the configuration parameters in the configuration register.
Preferably, the step of controlling the address generator to output the data address at which the image data is stored in the data memory includes:
obtaining, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation;
controlling the address generator to obtain, according to the feature-map pixels, the pixel range of the target image corresponding to the feature-map pixels, and continuously outputting the corresponding data addresses according to the pixel range.
Preferably, before the step of controlling the address generator to read the pre-stored address generator instruction from the configuration register, the method further includes:
obtaining the width, height, and number of image channels of the target image, calculating the number of image pixels of the target image from the width and height, and obtaining, according to the number of image channels, each image channel value corresponding to each image pixel in the target image;
sequentially storing, according to the number of image pixels, the image channel values corresponding to each image pixel at consecutive data addresses in the data memory.
In another aspect, the present invention provides an apparatus for accelerating convolutional neural network computation, the apparatus comprising:
an instruction reading unit, configured to control an address generator to read a pre-stored address generator instruction from a configuration register when a request is received to perform, through a convolutional neural network, a convolution calculation on image data corresponding to a target image;
a data address output unit, configured to control the address generator, according to the address generator instruction, to output the data addresses at which the image data are stored in a data memory;
an image data reading unit, configured to control the data memory to read the image data from the data addresses output by the address generator, and to input the read image data into an input shift register;
a neuron computation unit, configured to control the input shift register to input the received image data into the convolutional neural network for neuron computation to obtain corresponding feature map data, and to input the feature map data into an output shift register; and
a feature map storage unit, configured to control the output shift register to store the received feature map data into the data memory according to a preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation.
Preferably, the data address output unit comprises:
a pixel obtaining unit, configured to obtain, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation; and
an address output subunit, configured to control the address generator to obtain, from the feature-map pixels, the range of pixels of the target image corresponding to the feature-map pixels, and to output the corresponding data addresses consecutively according to that pixel range.
Preferably, the apparatus further comprises:
a parameter setting and storage unit, configured to set an address generator instruction associated with the address generator and convolutional neural network configuration parameters associated with the convolutional neural network, and to store the address generator instruction and the convolutional neural network configuration parameters in the configuration register;
a channel value obtaining unit, configured to obtain the width, the height, and the number of image channels of the target image, calculate the number of image pixels of the target image from the width and the height, and obtain, according to the number of image channels, the channel values of each image pixel of the target image; and
a data storage unit, configured to store, according to the number of image pixels, the channel values of each image pixel at consecutive data addresses in the data memory, pixel by pixel.
In another aspect, the present invention further provides a computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above method for accelerating convolutional neural network computation.
In another aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for accelerating convolutional neural network computation.
Advantageous Effects
According to a request to perform a convolution calculation on the image data corresponding to a target image, the present invention controls the address generator to convert the address generator instruction read from the configuration register into the data addresses at which the image data are stored in the data memory; controls the data memory to read the image data from those data addresses and input them into the input shift register; controls the input shift register to input the image data into the convolutional neural network for neuron computation, obtaining corresponding feature map data, and to input the feature map data into the output shift register; and controls the output shift register to store the received feature map data into the data memory according to the preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation. This improves data reusability and reduces the number of memory reads, thereby increasing the speed of convolutional neural network computation.
Brief Description of the Drawings
FIG. 1 is a flowchart of the implementation of the method for accelerating convolutional neural network computation provided in Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of the apparatus for accelerating convolutional neural network computation provided in Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of the apparatus for accelerating convolutional neural network computation provided in Embodiment 3 of the present invention; and
FIG. 4 is a schematic structural diagram of the computing device provided in Embodiment 4 of the present invention.
Embodiments of the Invention
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below with reference to specific embodiments:
Embodiment 1:
FIG. 1 shows the implementation flow of the method for accelerating convolutional neural network computation provided in Embodiment 1 of the present invention. For ease of description, only the parts relevant to this embodiment are shown, detailed as follows:
In step S101, when a request is received to perform, through a convolutional neural network, a convolution calculation on image data corresponding to a target image, an address generator is controlled to read a pre-stored address generator instruction from a configuration register.
The embodiments of the present invention are applicable to data processing platforms, devices, or systems, for example, personal computers and servers. When a request is received to perform, through a convolutional neural network, a convolution calculation on the image data corresponding to a target image input by a user, the address generator is controlled to read the pre-stored address generator instruction from the configuration register, and the address generator performs the corresponding action according to that instruction.
Before the address generator is controlled to read the pre-stored address generator instruction from the configuration register, preferably, an address generator instruction associated with the address generator and convolutional neural network configuration parameters associated with the convolutional neural network are set, and the address generator instruction and the convolutional neural network configuration parameters are stored in the configuration register, where the convolutional neural network configuration parameters include the convolution kernel size, the stride, and the number of feature-map pixels output by the convolution calculation, thereby making data reads more convenient.
As a further preference, before the address generator is controlled to read the pre-stored address generator instruction from the configuration register, the width, the height, and the number of image channels of the target image are obtained; the number of image pixels of the target image is calculated from the width and the height; the channel values of each image pixel of the target image are obtained according to the number of image channels; and, according to the number of image pixels, the channel values of each image pixel are stored at consecutive data addresses in the data memory, pixel by pixel. This makes data storage more convenient and in turn reduces the algorithmic complexity of data reads.
As an example, an RGB image has three image channels, namely the R, G, and B channels, and each pixel of an RGB image consists of the three channel values corresponding to these channels. When storage is performed according to this embodiment of the present invention, all pixels of the RGB image are stored in sequence at consecutive memory addresses: three consecutive addresses are allocated to the three channel values of each pixel. That is, the three channel values of one pixel are first stored at three consecutive memory addresses, the three channel values of the next pixel are stored immediately after them, and so on, until all pixels have been stored. For example, if the three channel values of one pixel are stored at memory addresses 0x01, 0x02, and 0x03, then the next pixel is stored at addresses 0x04, 0x05, and 0x06, and so on.
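The channel-interleaved layout in the example above reduces to a simple address computation. The following sketch is an illustration only (the base address is taken from the example; the embodiment does not prescribe a formula):

```python
def pixel_channel_address(base: int, pixel_index: int,
                          channel: int, num_channels: int = 3) -> int:
    """Address of one channel value under the contiguous,
    channel-interleaved layout described above: pixels are stored
    back to back, each occupying num_channels consecutive addresses."""
    return base + pixel_index * num_channels + channel

BASE = 0x01  # start address taken from the example above

# The first pixel's R, G, B values occupy 0x01, 0x02, 0x03 ...
assert [pixel_channel_address(BASE, 0, c) for c in range(3)] == [0x01, 0x02, 0x03]
# ... and the next pixel follows immediately at 0x04, 0x05, 0x06.
assert [pixel_channel_address(BASE, 1, c) for c in range(3)] == [0x04, 0x05, 0x06]
```

Because the formula is affine in the pixel index, a run of pixels always maps to a run of consecutive addresses, which is what the address generator exploits.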
In step S102, according to the address generator instruction, the address generator is controlled to output the data addresses at which the image data are stored in the data memory.
In this embodiment of the present invention, the address generator mainly converts the data or address generator instructions read from the configuration register into the corresponding data addresses in the data memory, and then sends the generated data addresses to the data memory so that the data memory can read the corresponding image data.
When the address generator is controlled to output the data addresses at which the image data are stored in the data memory, preferably, the feature-map pixels corresponding to the image data to be output by the current convolution calculation are obtained according to the convolutional neural network configuration parameters stored in the configuration register, and the address generator is controlled to obtain, from the feature-map pixels, the range of pixels of the target image corresponding to the feature-map pixels and to output the corresponding data addresses consecutively according to that pixel range. The consecutive feature-map pixel data to be output are thus mapped to consecutive data addresses of the target-image pixels in the data memory, which improves data reusability and reduces the number of memory reads.
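The mapping from an output feature-map pixel back to its source-pixel range can be sketched as follows (standard no-padding convolution arithmetic is assumed; the embodiment itself leaves the mapping abstract):

```python
def input_pixel_range(out_x: int, out_y: int,
                      kernel_size: int, stride: int):
    """Rectangle of target-image pixels covered by the convolution
    window that produces output feature-map pixel (out_x, out_y).
    Returns (x_min, y_min, x_max, y_max), inclusive."""
    x0, y0 = out_x * stride, out_y * stride
    return (x0, y0, x0 + kernel_size - 1, y0 + kernel_size - 1)

# 3x3 kernel, stride 1: output pixel (0, 0) reads source rows/columns 0..2
assert input_pixel_range(0, 0, kernel_size=3, stride=1) == (0, 0, 2, 2)
# output pixel (2, 1) reads source columns 2..4 and rows 1..3
assert input_pixel_range(2, 1, kernel_size=3, stride=1) == (2, 1, 4, 3)
```

With the interleaved layout described earlier, each row of this rectangle corresponds to a run of consecutive data addresses, which is what lets the address generator emit the addresses back to back.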
In this embodiment of the present invention, preferably, the address generator not only generates data addresses but also produces parameters such as the starting data-memory address corresponding to the Neuron Left Start Point Row of the convolutional neural network, which are passed over a separate data line to the Process Element (PE), so that the PE stays synchronized with the input shift register, completing the convolution operation and increasing the speed of the convolution calculation.
In step S103, the data memory is controlled to read the image data from the data addresses output by the address generator, and the read image data are input into the input shift register.
In this embodiment of the present invention, the data memory reads the corresponding image data according to the data addresses output by the address generator, and then inputs the read image data into the input shift register in parallel or serially.
In step S104, the input shift register is controlled to input the received image data into the convolutional neural network for neuron computation, corresponding feature map data are obtained, and the feature map data are input into the output shift register.
In this embodiment of the present invention, the image data input into the input shift register can be shifted right or left bit by bit under shift pulses, and the input shift register is controlled to output the image data to the convolutional neural network in parallel-in/parallel-out, serial-in/serial-out, parallel-in/serial-out, or serial-in/parallel-out mode. The corresponding feature map data are obtained through the computation of the individual neurons of the convolutional neural network, and the convolutional neural network is then controlled to input the computed feature map data into the output shift register. The convolutional neural network is composed of many independent neurons (for example, mode neurons, data-selector neurons, activation neurons, convolution-pooling neurons, and the like), and performs different neuron computations depending on the image data it receives.
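As a behavioral illustration of the shift registers described above (a software model only, not the hardware design), serial-in/parallel-out operation can be sketched as:

```python
from collections import deque

class ShiftRegister:
    """Tiny behavioral model of a shift register: values enter
    serially, one per shift pulse, or are loaded in parallel, and
    the contents can be read out in parallel."""

    def __init__(self, width: int):
        self.cells = deque([0] * width, maxlen=width)

    def shift_in(self, value):
        # Serial input: one value per shift pulse; the oldest value
        # is shifted out of the far end (deque maxlen drops it).
        self.cells.append(value)

    def load(self, values):
        # Parallel input: load all cells in a single step.
        self.cells = deque(values, maxlen=len(values))

    def read(self):
        # Parallel output: read all cells at once.
        return list(self.cells)

reg = ShiftRegister(3)
for v in (10, 20, 30):       # three shift pulses fill the register
    reg.shift_in(v)
assert reg.read() == [10, 20, 30]
reg.shift_in(40)             # a fourth pulse shifts the oldest value out
assert reg.read() == [20, 30, 40]
```

The same model covers the remaining modes the text lists: `load` plus `read` is parallel-in/parallel-out, and iterating over `read()` stands in for serial output.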
In step S105, the output shift register is controlled to store the received feature map data into the data memory according to the preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation.
In this embodiment of the present invention, the feature map data in the output shift register can be shifted right or left bit by bit under shift pulses, and can be input and output in parallel, input and output serially, input in parallel and output serially, or input serially and output in parallel. The output shift register stores the received feature map data into the data memory in the preset image-data memory storage scheme to accelerate completion of the current convolution calculation. The feature map data stored in the data memory are then read back into the processor for the computation of the next network layer in the manner described in steps S101 to S104; repeating this cycle completes the forward inference of a multi-layer neural network quickly and efficiently.
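The store-then-reload cycle across layers described above can be summarized in a short sketch. The layer callables and the dictionary standing in for the data memory are hypothetical placeholders for the PE-array computation and the physical memory:

```python
def forward(image_data, layers, memory):
    """Layer-by-layer loop described above: each layer's feature map
    is written back to data memory in the preset layout and read
    again as the input of the next layer."""
    memory['layer_0'] = image_data
    for i, layer in enumerate(layers):
        data = memory[f'layer_{i}']             # read via generated addresses
        feature_map = layer(data)               # neuron computation
        memory[f'layer_{i + 1}'] = feature_map  # store in the preset layout
    return memory[f'layer_{len(layers)}']

mem = {}
result = forward(
    [1, 2, 3],
    [lambda xs: [x * 2 for x in xs],  # toy stand-ins for conv layers
     lambda xs: [x + 1 for x in xs]],
    mem,
)
assert result == [3, 5, 7]
assert mem['layer_1'] == [2, 4, 6]  # intermediate feature map was stored
```

Because every intermediate feature map lands in the same interleaved layout as the original image, the address-generation logic of steps S101 to S104 applies unchanged to every layer.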
When the output shift register is controlled to store the received feature map data into the data memory according to the preset image-data memory storage scheme, preferably, according to the number of image channels of the target image and the number of pixels of the feature map data, the output shift register is controlled to store the channel values of each pixel of the feature map data at consecutive data addresses in the data memory, pixel by pixel, thereby making data storage more convenient and in turn reducing the algorithmic complexity of data reads.
In this embodiment of the present invention, according to a request to perform a convolution calculation on the image data corresponding to a target image, the address generator is controlled to convert the address generator instruction read from the configuration register into the data addresses at which the image data are stored in the data memory; the data memory is controlled to read the image data from those data addresses and input them into the input shift register; the input shift register is controlled to input the image data into the convolutional neural network for neuron computation, obtaining corresponding feature map data, and to input the feature map data into the output shift register; and the output shift register is controlled to store the received feature map data into the data memory according to the preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation. This improves data reusability and reduces the number of memory reads, thereby increasing the speed of convolutional neural network computation.
Embodiment 2:
FIG. 2 shows the structure of the apparatus for accelerating convolutional neural network computation provided in Embodiment 2 of the present invention. For ease of description, only the parts relevant to this embodiment are shown, including:
an instruction reading unit 21, configured to control the address generator to read the pre-stored address generator instruction from the configuration register when a request is received to perform, through a convolutional neural network, a convolution calculation on the image data corresponding to a target image.
The embodiments of the present invention are applicable to data processing platforms, devices, or systems, for example, personal computers and servers. When a request is received to perform, through a convolutional neural network, a convolution calculation on the image data corresponding to a target image input by a user, the address generator is controlled to read the pre-stored address generator instruction from the configuration register, and the address generator performs the corresponding action according to that instruction.
a data address output unit 22, configured to control the address generator, according to the address generator instruction, to output the data addresses at which the image data are stored in the data memory.
In this embodiment of the present invention, the address generator mainly converts the data or address generator instructions read from the configuration register into the corresponding data addresses in the data memory, and then sends the generated data addresses to the data memory so that the data memory can read the corresponding image data.
In this embodiment of the present invention, preferably, the address generator not only generates data addresses but also produces parameters such as the starting data-memory address corresponding to the Neuron Left Start Point Row of the convolutional neural network, which are passed over a separate data line to the Process Element (PE), so that the PE stays synchronized with the input shift register, completing the convolution operation and increasing the speed of the convolution calculation.
an image data reading unit 23, configured to control the data memory to read the image data from the data addresses output by the address generator, and to input the read image data into the input shift register.
In this embodiment of the present invention, the data memory reads the corresponding image data according to the data addresses output by the address generator, and then inputs the read image data into the input shift register in parallel or serially.
a neuron computation unit 24, configured to control the input shift register to input the received image data into the convolutional neural network for neuron computation to obtain corresponding feature map data, and to input the feature map data into the output shift register.
In this embodiment of the present invention, the image data input into the input shift register can be shifted right or left bit by bit under shift pulses, and the input shift register is controlled to output the image data to the convolutional neural network in parallel-in/parallel-out, serial-in/serial-out, parallel-in/serial-out, or serial-in/parallel-out mode. The corresponding feature map data are obtained through the computation of the individual neurons of the convolutional neural network, and the convolutional neural network is then controlled to input the computed feature map data into the output shift register. The convolutional neural network is composed of many independent neurons (for example, mode neurons, data-selector neurons, activation neurons, convolution-pooling neurons, and the like), and performs different neuron computations depending on the image data it receives.
a feature map storage unit 25, configured to control the output shift register to store the received feature map data into the data memory according to the preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation.
In this embodiment of the present invention, the feature map data in the output shift register can be shifted right or left bit by bit under shift pulses, and can be input and output in parallel, input and output serially, input in parallel and output serially, or input serially and output in parallel. The output shift register stores the received feature map data into the data memory in the preset image-data memory storage scheme to accelerate completion of the current convolution calculation. The feature map data stored in the data memory are then read back into the processor for the computation of the next network layer in the manner described for the instruction reading unit 21 through the neuron computation unit 24; repeating this cycle completes the forward inference of a multi-layer neural network quickly and efficiently.
When the output shift register is controlled to store the received feature map data into the data memory according to the preset image-data memory storage scheme, preferably, according to the number of image channels of the target image and the number of pixels of the feature map data, the output shift register is controlled to store the channel values of each pixel of the feature map data at consecutive data addresses in the data memory, pixel by pixel, thereby making data storage more convenient and in turn reducing the algorithmic complexity of data reads.
In this embodiment of the present invention, each unit of the apparatus for accelerating convolutional neural network computation may be implemented by corresponding hardware or software units; the units may be independent software or hardware units or may be integrated into a single software or hardware unit, which is not intended to limit the present invention.
Embodiment 3:
FIG. 3 shows the structure of the apparatus for accelerating convolutional neural network computation provided in Embodiment 3 of the present invention. For ease of description, only the parts relevant to this embodiment are shown, including:
a parameter setting and storage unit 31, configured to set an address generator instruction associated with the address generator and convolutional neural network configuration parameters associated with the convolutional neural network, and to store the address generator instruction and the convolutional neural network configuration parameters in the configuration register;
a channel value obtaining unit 32, configured to obtain the width, the height, and the number of image channels of the target image, calculate the number of image pixels of the target image from the width and the height, and obtain, according to the number of image channels, the channel values of each image pixel of the target image;
a data storage unit 33, configured to store, according to the number of image pixels, the channel values of each image pixel at consecutive data addresses in the data memory, pixel by pixel;
an instruction reading unit 34, configured to control the address generator to read the pre-stored address generator instruction from the configuration register when a request is received to perform, through a convolutional neural network, a convolution calculation on the image data corresponding to a target image;
a data address output unit 35, configured to control the address generator, according to the address generator instruction, to output the data addresses at which the image data are stored in the data memory;
an image data reading unit 36, configured to control the data memory to read the image data from the data addresses output by the address generator, and to input the read image data into the input shift register;
a neuron computation unit 37, configured to control the input shift register to input the received image data into the convolutional neural network for neuron computation to obtain corresponding feature map data, and to input the feature map data into the output shift register; and
a feature map storage unit 38, configured to control the output shift register to store the received feature map data into the data memory according to the preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation.
Preferably, the data address output unit 35 comprises:
a pixel obtaining unit 351, configured to obtain, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation; and
an address output subunit 352, configured to control the address generator to obtain, from the feature-map pixels, the range of pixels of the target image corresponding to the feature-map pixels, and to output the corresponding data addresses consecutively according to that pixel range.
In this embodiment of the present invention, each unit of the apparatus for accelerating convolutional neural network computation may be implemented by corresponding hardware or software units; the units may be independent software or hardware units or may be integrated into a single software or hardware unit, which is not intended to limit the present invention. Specifically, for the implementation of each unit, reference may be made to the description of Embodiment 1 above, which will not be repeated here.
Embodiment 4:
FIG. 4 shows the structure of the computing device provided in Embodiment 4 of the present invention. For ease of description, only the parts relevant to this embodiment are shown.
The computing device 4 of this embodiment of the present invention includes a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps of the above embodiments of the method for accelerating convolutional neural network computation, for example steps S101 to S105 shown in FIG. 1. Alternatively, when executing the computer program 42, the processor 40 implements the functions of the units in the above apparatus embodiments, for example the functions of units 21 to 25 shown in FIG. 2.
In this embodiment of the present invention, according to a request to perform a convolution calculation on the image data corresponding to a target image, the address generator is controlled to convert the address generator instruction read from the configuration register into the data addresses at which the image data are stored in the data memory; the data memory is controlled to read the image data from those data addresses and input them into the input shift register; the input shift register is controlled to input the image data into the convolutional neural network for neuron computation, obtaining corresponding feature map data, and to input the feature map data into the output shift register; and the output shift register is controlled to store the received feature map data into the data memory according to the preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation. This improves data reusability and reduces the number of memory reads, thereby increasing the speed of convolutional neural network computation.
The computing device of this embodiment of the present invention may be a personal computer or a server. For the steps implemented when the processor 40 of the computing device 4 executes the computer program 42 to realize the method for accelerating convolutional neural network computation, reference may be made to the description of the foregoing method embodiments, which will not be repeated here.
Embodiment 5:
In an embodiment of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above embodiments of the method for accelerating convolutional neural network computation, for example steps S101 to S105 shown in FIG. 1. Alternatively, when executed by a processor, the computer program implements the functions of the units in the above apparatus embodiments, for example the functions of units 21 to 25 shown in FIG. 2.
In this embodiment of the present invention, according to a request to perform a convolution calculation on the image data corresponding to a target image, the address generator is controlled to convert the address generator instruction read from the configuration register into the data addresses at which the image data are stored in the data memory; the data memory is controlled to read the image data from those data addresses and input them into the input shift register; the input shift register is controlled to input the image data into the convolutional neural network for neuron computation, obtaining corresponding feature map data, and to input the feature map data into the output shift register; and the output shift register is controlled to store the received feature map data into the data memory according to the preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation. This improves data reusability and reduces the number of memory reads, thereby increasing the speed of convolutional neural network computation.
The computer-readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, or a recording medium, for example, a memory such as a ROM/RAM, a magnetic disk, an optical disk, or a flash memory.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

  1. A method for accelerating convolutional neural network computation, characterized in that the method comprises the following steps:
    when a request is received to perform, through a convolutional neural network, a convolution calculation on image data corresponding to a target image, controlling an address generator to read a pre-stored address generator instruction from a configuration register;
    controlling, according to the address generator instruction, the address generator to output the data addresses at which the image data are stored in a data memory;
    controlling the data memory to read the image data from the data addresses output by the address generator, and inputting the read image data into an input shift register;
    controlling the input shift register to input the received image data into the convolutional neural network for neuron computation to obtain corresponding feature map data, and inputting the feature map data into an output shift register; and
    controlling the output shift register to store the received feature map data into the data memory according to a preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation.
  2. The method according to claim 1, characterized in that, before the step of controlling the address generator to read the pre-stored address generator instruction from the configuration register, the method further comprises:
    setting an address generator instruction associated with the address generator and convolutional neural network configuration parameters associated with the convolutional neural network, and storing the address generator instruction and the convolutional neural network configuration parameters in the configuration register.
  3. The method according to claim 1 or 2, characterized in that the step of controlling the address generator to output the data addresses at which the image data are stored in the data memory comprises:
    obtaining, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation; and
    controlling the address generator to obtain, from the feature-map pixels, the range of pixels of the target image corresponding to the feature-map pixels, and to output the corresponding data addresses consecutively according to that pixel range.
  4. The method according to claim 1, characterized in that, before the step of controlling the address generator to read the pre-stored address generator instruction from the configuration register, the method further comprises:
    obtaining the width, the height, and the number of image channels of the target image, calculating the number of image pixels of the target image from the width and the height, and obtaining, according to the number of image channels, the channel values of each image pixel of the target image; and
    storing, according to the number of image pixels, the channel values of each image pixel at consecutive data addresses in the data memory, pixel by pixel.
  5. An apparatus for accelerating convolutional neural network computation, characterized in that the apparatus comprises:
    an instruction reading unit, configured to control an address generator to read a pre-stored address generator instruction from a configuration register when a request is received to perform, through a convolutional neural network, a convolution calculation on image data corresponding to a target image;
    a data address output unit, configured to control the address generator, according to the address generator instruction, to output the data addresses at which the image data are stored in a data memory;
    an image data reading unit, configured to control the data memory to read the image data from the data addresses output by the address generator, and to input the read image data into an input shift register;
    a neuron computation unit, configured to control the input shift register to input the received image data into the convolutional neural network for neuron computation to obtain corresponding feature map data, and to input the feature map data into an output shift register; and
    a feature map storage unit, configured to control the output shift register to store the received feature map data into the data memory according to a preset image-data memory storage scheme, so as to accelerate completion of the current convolution calculation.
  6. The apparatus according to claim 5, characterized in that the apparatus further comprises:
    a parameter setting and storage unit, configured to set an address generator instruction associated with the address generator and convolutional neural network configuration parameters associated with the convolutional neural network, and to store the address generator instruction and the convolutional neural network configuration parameters in the configuration register.
  7. The apparatus according to claim 5 or 6, characterized in that the data address output unit comprises:
    a pixel obtaining unit, configured to obtain, according to the convolutional neural network configuration parameters stored in the configuration register, the feature-map pixels corresponding to the image data to be output by the current convolution calculation; and
    an address output subunit, configured to control the address generator to obtain, from the feature-map pixels, the range of pixels of the target image corresponding to the feature-map pixels, and to output the corresponding data addresses consecutively according to that pixel range.
  8. The apparatus according to claim 5, characterized in that the apparatus further comprises:
    a channel value obtaining unit, configured to obtain the width, the height, and the number of image channels of the target image, calculate the number of image pixels of the target image from the width and the height, and obtain, according to the number of image channels, the channel values of each image pixel of the target image; and
    a data storage unit, configured to store, according to the number of image pixels, the channel values of each image pixel at consecutive data addresses in the data memory, pixel by pixel.
  9. A computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 4.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
PCT/CN2018/104901 2018-09-10 2018-09-10 Method, apparatus, device, and storage medium for accelerating convolutional neural network computation WO2020051751A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/104901 WO2020051751A1 (zh) 2018-09-10 2018-09-10 Method, apparatus, device, and storage medium for accelerating convolutional neural network computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/104901 WO2020051751A1 (zh) 2018-09-10 2018-09-10 Method, apparatus, device, and storage medium for accelerating convolutional neural network computation

Publications (1)

Publication Number Publication Date
WO2020051751A1 true WO2020051751A1 (zh) 2020-03-19

Family

ID=69776973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104901 WO2020051751A1 (zh) 2018-09-10 2018-09-10 Method, apparatus, device, and storage medium for accelerating convolutional neural network computation

Country Status (1)

Country Link
WO (1) WO2020051751A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686901A (zh) * 2021-03-11 2021-04-20 北京小白世纪网络科技有限公司 基于深度神经网络的us-ct图像分割方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010064728A1 (en) * 2008-12-04 2010-06-10 Canon Kabushiki Kaisha Convolution operation circuit and object recognition apparatus
CN106250103A (zh) * 2016-08-04 2016-12-21 东南大学 一种卷积神经网络循环卷积计算数据重用的系统
CN106779060A (zh) * 2017-02-09 2017-05-31 武汉魅瞳科技有限公司 一种适于硬件设计实现的深度卷积神经网络的计算方法
CN107657581A (zh) * 2017-09-28 2018-02-02 中国人民解放军国防科技大学 一种卷积神经网络cnn硬件加速器及加速方法
JP2018073103A (ja) * 2016-10-28 2018-05-10 キヤノン株式会社 演算回路、その制御方法及びプログラム

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010064728A1 (en) * 2008-12-04 2010-06-10 Canon Kabushiki Kaisha Convolution operation circuit and object recognition apparatus
CN106250103A (zh) * 2016-08-04 2016-12-21 东南大学 一种卷积神经网络循环卷积计算数据重用的系统
JP2018073103A (ja) * 2016-10-28 2018-05-10 キヤノン株式会社 演算回路、その制御方法及びプログラム
CN106779060A (zh) * 2017-02-09 2017-05-31 武汉魅瞳科技有限公司 一种适于硬件设计实现的深度卷积神经网络的计算方法
CN107657581A (zh) * 2017-09-28 2018-02-02 中国人民解放军国防科技大学 一种卷积神经网络cnn硬件加速器及加速方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686901A (zh) * 2021-03-11 2021-04-20 北京小白世纪网络科技有限公司 基于深度神经网络的us-ct图像分割方法及装置
CN112686901B (zh) * 2021-03-11 2021-08-24 北京小白世纪网络科技有限公司 基于深度神经网络的us-ct图像分割方法及装置

Similar Documents

Publication Publication Date Title
KR102048390B1 (ko) 심층 신경망 기반의 인식 장치, 트레이닝 장치, 및 이들의 방법
CN109460813B (zh) 卷积神经网络计算的加速方法、装置、设备及存储介质
US20160283842A1 (en) Neural network and method of neural network training
WO2019201042A1 (zh) 图像对象识别方法和装置、存储介质及电子装置
CN109670574B (zh) 用于同时执行激活和卷积运算的方法和装置及其学习方法和学习装置
US20220083857A1 (en) Convolutional neural network operation method and device
CN109376852A (zh) 运算装置及运算方法
CN114863539A (zh) 一种基于特征融合的人像关键点检测方法及系统
CN111709516A (zh) 神经网络模型的压缩方法及压缩装置、存储介质、设备
CN107784360A (zh) 步进式卷积神经网络剪枝压缩方法
KR101916675B1 (ko) 사용자 인터랙션을 위한 제스처 인식 방법 및 시스템
WO2020038462A1 (zh) 基于深度学习的舌体分割装置、方法及存储介质
WO2020051751A1 (zh) 卷积神经网络计算的加速方法、装置、设备及存储介质
CN110502975B (zh) 一种行人重识别的批量处理系统
CN114091648A (zh) 基于卷积神经网络的图像分类方法、装置及卷积神经网络
US20210397953A1 (en) Deep neural network operation method and apparatus
CN116188785A (zh) 运用弱标签的PolarMask老人轮廓分割方法
CN116452599A (zh) 基于轮廓的图像实例分割方法及系统
CN111508024A (zh) 一种基于深度学习估计机器人位姿的方法
CN112446461A (zh) 一种神经网络模型训练方法及装置
WO2022111231A1 (zh) Cnn训练方法、电子设备和计算机可读存储介质
CN113205102B (zh) 一种基于忆阻神经网络的车辆标志识别方法
KR102537207B1 (ko) 머신 러닝에 기반한 이미지 처리 방법 및 장치
Mo et al. A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment
TW202117609A (zh) 具有快速逐點迴旋的高效推斷

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933656

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18933656

Country of ref document: EP

Kind code of ref document: A1