WO2020014893A1 - Deconvolution implementation method and related products - Google Patents

Deconvolution implementation method and related products

Info

Publication number
WO2020014893A1
WO2020014893A1 (PCT/CN2018/096137, CN2018096137W)
Authority
WO
WIPO (PCT)
Prior art keywords
deconvolution
data
kernel
buffer
output
Prior art date
Application number
PCT/CN2018/096137
Other languages
English (en)
French (fr)
Inventor
刘双龙
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司
Priority to PCT/CN2018/096137 priority Critical patent/WO2020014893A1/zh
Priority to CN201880004281.4A priority patent/CN110088777B/zh
Publication of WO2020014893A1 publication Critical patent/WO2020014893A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • This application relates to the field of computer and artificial intelligence technologies, and in particular, to a deconvolution implementation method and related products.
  • With the continuous development and maturation of generative neural networks in machine learning, deconvolution layers are increasingly applied in the algorithm development and application of deep convolutional networks.
  • The convolution operation acts like the encoder in a neural network, extracting low-dimensional features from high-dimensional data.
  • Deconvolution is usually used to map low-dimensional features back to high-dimensional inputs, acting as a decoder that reconstructs high-dimensional vectors from low-dimensional ones.
  • Deconvolution is mainly used in generative adversarial networks, and it plays an important role in image segmentation, image generation, and edge detection.
  • The existing deconvolution operation is computed by padding the input data with zeros, so it is computationally expensive and energy-intensive.
  • The embodiments of the present application provide a deconvolution implementation method and related products.
  • The deconvolution operation is implemented without a zero-padding operation, thereby reducing the amount of computation and the power consumption.
  • An embodiment of the present application provides a method for implementing deconvolution.
  • The method includes the following steps:
  • If an element position of the initial output data has multiple products from the multiplication operations, a summing operation is performed on the multiple products at that element position to obtain the final value of that element position;
  • i, k, and s are all integers greater than or equal to 1, and p is an integer greater than or equal to 0.
  • A hardware architecture for performing the deconvolution of the method described in the first aspect includes: an input data buffer, a deconvolution kernel buffer, a deconvolution operation core, a partial result buffer, a selector, an accumulator, and an initial output result buffer;
  • the deconvolution operation core includes: k adders A, K multipliers M, and k*(k-s) buffers;
  • the K multipliers M are interconnected, the k adders A are interconnected, the k adders A are respectively connected to the K multipliers M, and the k*(k-s) buffers comprise k groups of buffers, each group including k-s buffers, with the k groups of buffers respectively connected to the K multipliers M;
  • the input data buffer and the deconvolution kernel buffer are connected to the K multipliers M and feed in the input data and the deconvolution kernel data; the k adders A are respectively connected to the input of the partial result buffer and the input of the selector;
  • the output of the partial result buffer is fed back to the k adders A and is also connected to the input of the selector; the output of the selector is connected to the input of the accumulator, and the output of the accumulator is connected to the initial output data buffer.
  • a computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method as provided in the first aspect.
  • a computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided by the first aspect.
  • Figure 1 is a schematic diagram of a deconvolution operation implemented in software.
  • FIG. 2 is a schematic flowchart of a deconvolution implementation method provided by the present application.
  • FIG. 3 is an example implementation diagram of a deconvolution operation with 2 * 2 input data and a 3 * 3 deconvolution kernel provided in the present application.
  • FIG. 4 is a structural diagram of a deconvolution hardware architecture provided by the present application.
  • Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application.
  • the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are they independent or alternative embodiments that are mutually exclusive with other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.
  • The electronic device in this application may include: a server, a smart camera device, a smartphone (such as an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a palmtop computer, a laptop computer, a mobile Internet device (MID), a wearable device, and the like.
  • The above electronic devices are merely examples, not an exhaustive list; the application includes but is not limited to the above electronic devices.
  • The above electronic devices are referred to as user equipment (UE), a terminal, or an electronic device.
  • The above-mentioned user equipment is not limited to the above forms and may also include, for example, a smart in-vehicle terminal, a computer device, and the like.
  • FIG. 1 is a data diagram of a deconvolution operation.
  • The input data here uses 3 * 3 data as an example.
  • The size of the deconvolution kernel is 3 * 3.
  • s-1 zeros are inserted between adjacent input data elements, and k-p-1 zeros are added at the boundary.
  • The resulting 7 * 7 data block is subjected to a convolution operation with the deconvolution kernel at a stride of 1 to obtain a 7 * 7 output image; finally, the boundary data (p rows, p columns) is cropped from the 7 * 7 image to obtain the 6 * 6 output result.
  • The required zero-filling operation is not suitable for hardware implementation, and its execution efficiency on an FPGA is relatively low.
  • The memory required to store the input image is also increased.
  • The large number of padded zeros causes most of the multiply-accumulate operations to be invalid, which greatly reduces computing efficiency and thus the utilization of hardware computing resources.
  • Generality is low: when the deconvolution operation is converted into a convolution operation, the irregularity of the input image changes the data read pattern when operating deconvolution layers with different parameters (such as different deconvolution kernels or sliding strides), so different computation modules are required in hardware to implement different layers, and the complexity of the hardware design increases.
  • FIG. 2 provides a method for implementing deconvolution, which is executed by a terminal.
  • The method shown in FIG. 2 includes the following steps:
  • Step S201: Obtain input data i * i, a deconvolution kernel k * k, a sliding stride s, and a zero-padding number p;
  • Step S202: Multiply each element value of the input data i * i individually with the deconvolution kernel k * k; each such multiplication yields one of i * i groups of data, and the i * i groups of data are shifted by the sliding stride s to form the initial output data;
  • Step S203: If an element position of the initial output data has multiple products from the multiplication operations, perform a summing operation on the multiple products at that element position to obtain the final value of that element position;
  • Step S204: Crop the initial output data to obtain a final output result that meets the output size requirement.
  • i, k, and s are all integers greater than or equal to 1, and p is an integer greater than or equal to 0.
  • Step S204 may specifically be:
  • cropping the boundary data of the initial output data by p to obtain the final output result.
  • Step S204 may alternatively include:
  • cropping the boundary data of the initial output data according to the size of the final output data to obtain the final output result.
  • The technical solution provided in this application does not need to perform a zero-padding operation when performing a deconvolution operation, so in practical applications the amount of computation is small; the operation of padding the input with zeros is avoided, and computational efficiency is improved. For the sizes of current mainstream deconvolution kernels (2, 4, 5, 8), the amount of computation is reduced to 1/4 to 1/80 of that of the previous traditional convolution-based implementation, which greatly reduces the amount of computation.
  • The scheme is more suitable for hardware implementation, with higher utilization of computing resources; the hardware structure is more general and easier to extend to the configurations of different layers; and the overlapping region of the deconvolution result (that is, the element positions with multiple products) is handled effectively through control logic, obtaining the correct output result with very few hardware resources (registers) and without extra time consumption.
  • This application takes input data of 2 * 2 and a deconvolution kernel of 3 * 3 as an example.
  • The input data of Figure 3 is 2 * 2 data.
  • The deconvolution kernel is 3 * 3.
  • The sliding stride s = 2 and p = 1.
  • The position of each element is named in Figure 3; see Figure 3.
  • The K12 position has two product values, i11 * K23 and i12 * K31, so it has products from multiple multiplication operations; the two products therefore need to be summed at the K12 position.
  • The K22 position has four product values, and the K21, K23, and K32 positions each have two product values; the region of these elements may also be called the overlapping region.
  • The initial output data is obtained through this calculation, and the final output data can be obtained by cropping the initial output data.
  • FIG. 4 provides a hardware architecture for deconvolution.
  • The hardware architecture executes the steps, including the refinement steps, of the method shown in FIG. 2.
  • The deconvolution hardware architecture includes: an input data buffer, a deconvolution kernel buffer, a deconvolution operation core, a partial result buffer, a selector, an accumulator, and an initial output result buffer;
  • the deconvolution operation core includes: k adders A, K multipliers M, and k*(k-s) buffers;
  • the K multipliers M are interconnected, the k adders A are interconnected, the k adders A are respectively connected to the K multipliers M, and the k*(k-s) buffers comprise k groups of buffers, each group including k-s buffers, with the k groups of buffers respectively connected to the K multipliers M;
  • the input data buffer and the deconvolution kernel buffer are connected to the K multipliers M and feed in the input data and the deconvolution kernel data; the k adders A are respectively connected to the input of the partial result buffer and the input of the selector;
  • the output of the partial result buffer is fed back to the k adders A and is also connected to the input of the selector; the output of the selector is connected to the input of the accumulator, and the output of the accumulator is connected to the initial output data buffer.
  • The above deconvolution hardware architecture may further include a cropping component configured to perform a cropping operation on the initial output data.
  • The above deconvolution hardware architecture may further include a deconvolution kernel factor buffer, disposed between the deconvolution kernel buffer and the K multipliers M.
  • Each of the k*(k-s) buffers corresponds to one element position having multiple products; that is, each element of the overlapping region is allocated a separate buffer, which avoids data confusion in the overlapping region.
  • An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any one of the deconvolution implementation methods described in the foregoing method embodiments.
  • An embodiment of the present application further provides a computer program product; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any of the deconvolution implementation methods described in the foregoing method embodiments.
  • The processors and chips in the various embodiments of the present application may be integrated in one processing unit, may exist physically separately, or two or more pieces of hardware may be integrated in one unit.
  • The computer-readable storage medium or computer-readable program may be stored in a computer-readable memory.
  • The technical solution of the present application, in essence or in the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a memory.
  • Several instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing memory includes media that can store program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, or optical discs.
  • The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a deconvolution implementation method and related products. The method includes the following steps: obtaining input data i*i, a deconvolution kernel k*k, a sliding stride s, and a zero-padding number p; multiplying each element value of the input data i*i individually with the deconvolution kernel k*k, each such multiplication yielding one of i*i groups of data, which are shifted by the sliding stride s to form the initial output data; and, if an element position of the initial output data has multiple products from the multiplication operations, summing the multiple products at that element position to obtain the final value of that element position. The technical solution provided by the present application has the advantages of saving computation and reducing power consumption.

Description

Deconvolution implementation method and related products
Technical Field
This application relates to the field of computer and artificial intelligence technologies, and in particular to a deconvolution implementation method and related products.
Background Art
With the continuous development and maturation of generative neural networks in the field of machine learning, deconvolution layers are increasingly applied in the algorithm development and application of deep convolutional networks. The convolution operation acts like the encoder in a neural network, extracting low-dimensional features from high-dimensional data. Deconvolution is usually used to map low-dimensional features back to high-dimensional inputs, acting as a decoder that reconstructs high-dimensional vectors from low-dimensional ones. The deconvolution operation is mainly applied in generative adversarial networks and plays an important role in fields such as image segmentation, image generation, and edge detection.
Existing deconvolution operations are computed by padding the input data with zeros, so their computational cost and energy consumption are large.
Content of the Application
Embodiments of the present application provide a deconvolution implementation method and related products that realize the deconvolution operation without a zero-padding operation, thereby reducing the amount of computation and lowering power consumption.
In a first aspect, an embodiment of the present application provides a deconvolution implementation method, the method including the following steps:
obtaining input data i*i, a deconvolution kernel k*k, a sliding stride s, and a zero-padding number p;
multiplying each element value of the input data i*i individually with the deconvolution kernel k*k; each such multiplication yields one of i*i groups of data, and the i*i groups of data are shifted by the sliding stride s to form initial output data;
if an element position of the initial output data has multiple products from the multiplication operations, performing a summing operation on the multiple products at that element position to obtain the final value of that element position;
where i, k, and s are all integers greater than or equal to 1, and p is an integer greater than or equal to 0.
In a second aspect, a deconvolution hardware architecture for performing the method of the first aspect is provided, the hardware architecture including: an input data buffer, a deconvolution kernel buffer, a deconvolution operation core, a partial result buffer, a selector, an accumulator, and an initial output result buffer;
wherein the deconvolution operation core includes: k adders A, K multipliers M, and k*(k-s) buffers;
wherein the K multipliers M are interconnected, the k adders A are interconnected, the k adders A are respectively connected to the K multipliers M, and the k*(k-s) buffers comprise k groups of buffers, each group including k-s buffers, with the k groups of buffers respectively connected to the K multipliers M;
the input data buffer and the deconvolution kernel buffer are connected to the K multipliers M and feed in the input data and the deconvolution kernel data; the k adders A are respectively connected to the input of the partial result buffer and the input of the selector; the output of the partial result buffer is fed back to the k adders A and is also connected to the input of the selector; the output of the selector is connected to the input of the accumulator, and the output of the accumulator is connected to the initial output data buffer.
In a third aspect, a computer-readable storage medium is provided that stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method provided in the first aspect.
In a fourth aspect, a computer program product is provided; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided in the first aspect.
Implementing the embodiments of the present application has the following beneficial effects:
It can be seen that the technical solution provided by this application obtains the result of the deconvolution operation by direct calculation, without a zero-padding operation, and therefore has the advantages of reducing the amount of computation and lowering power consumption.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
Figure 1 is a schematic diagram of a deconvolution operation implemented in software.
Figure 2 is a schematic flowchart of a deconvolution implementation method provided by the present application.
Figure 3 is an example implementation diagram of a deconvolution operation with 2*2 input data and a 3*3 deconvolution kernel provided by the present application.
Figure 4 is a structural diagram of a deconvolution hardware architecture provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "including" and "having" in the specification, claims, and drawings of the present application, and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they independent or alternative embodiments mutually exclusive with other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein may be combined with other embodiments.
The electronic device in this application may include: a server, a smart camera device, a smartphone (such as an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a palmtop computer, a laptop computer, a mobile Internet device (MID), a wearable device, and the like. The above electronic devices are merely examples, not an exhaustive list. For convenience of description, in the following embodiments the above electronic devices are referred to as user equipment (UE), a terminal, or an electronic device. Of course, in practical applications, the user equipment is not limited to the above forms and may also include, for example, a smart in-vehicle terminal, a computer device, and the like.
Referring to Figure 1, Figure 1 is a data diagram of a deconvolution operation. The input data here uses 3*3 data as an example, and the deconvolution kernel size is 3*3. The calculation may proceed as follows: given an input feature image of size i*i, a deconvolution kernel k*k, a sliding stride s, and a zero-padding number p, the corresponding output feature image size o*o satisfies: o = s*(i-1) + k - 2*p.
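As a simple numerical check of the relation o = s*(i-1) + k - 2*p, the following Python sketch computes the output size; the function name is chosen for this example and is not part of the patent:

```python
def deconv_output_size(i: int, k: int, s: int, p: int) -> int:
    """Output edge length o for an i*i input, a k*k deconvolution kernel,
    sliding stride s, and zero-padding number p: o = s*(i-1) + k - 2*p."""
    return s * (i - 1) + k - 2 * p

# The 2*2 input / 3*3 kernel example used later in the text (s=2, p=1):
print(deconv_output_size(2, 3, 2, 1))  # -> 3
```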
Referring to Figure 1, with sliding stride s = 2, zero-padding number p = 1, and k = 3, for the 3*3 input data, s-1 zeros are inserted between adjacent input elements and k-p-1 zeros are added at the boundary, yielding a 7*7 data block. This block is convolved with the deconvolution kernel at a stride of 1 to obtain a 7*7 output image; finally, the boundary data (p rows, p columns) is cropped from the 7*7 image to obtain the 6*6 output result.
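The padded data block described above (s-1 zeros between adjacent elements, plus a border of k-p-1 zeros) can be constructed as follows; this NumPy sketch is illustrative only, and the function name is an assumption of this example:

```python
import numpy as np

def zero_pad_for_deconv(x: np.ndarray, k: int, s: int, p: int) -> np.ndarray:
    """Insert s-1 zeros between adjacent input elements and add a border of
    k-p-1 zeros, as in the conventional deconvolution-as-convolution scheme."""
    i = x.shape[0]
    core = s * (i - 1) + 1          # size after inserting s-1 zeros between elements
    out = np.zeros((core, core), dtype=x.dtype)
    out[::s, ::s] = x               # original values, with s-1 zeros in between
    border = k - p - 1              # boundary zero width on each side
    return np.pad(out, border)

x = np.arange(1, 10).reshape(3, 3)
print(zero_pad_for_deconv(x, k=3, s=2, p=1).shape)  # -> (7, 7)
```

Note how only 9 of the 49 entries are nonzero, which is exactly the inefficiency the following paragraph criticizes.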
Referring to Figure 1, apart from the multiply-accumulate operations of the convolution itself, the zero-filling required by the scheme of Figure 1 is not suitable for hardware implementation and executes relatively inefficiently on an FPGA; it also increases the memory required to store the input image. The large number of padded zeros makes most multiply-accumulate operations invalid, greatly reducing computational efficiency and thus the utilization of hardware computing resources. Generality is also low: when the deconvolution operation is converted into a convolution operation, the irregularity of the input image changes the data read pattern when operating deconvolution layers with different parameters (such as different deconvolution kernels or sliding strides), so different computation modules are needed in hardware to implement different layers, increasing the complexity of the hardware design.
Referring to Figure 2, Figure 2 provides a deconvolution implementation method executed by a terminal. As shown in Figure 2, the method includes the following steps:
Step S201: obtain input data i*i, a deconvolution kernel k*k, a sliding stride s, and a zero-padding number p;
Step S202: multiply each element value of the input data i*i individually with the deconvolution kernel k*k; each such multiplication yields one of i*i groups of data, and the i*i groups of data are shifted by the sliding stride s to form the initial output data;
Step S203: if an element position of the initial output data has multiple products from the multiplication operations, perform a summing operation on the multiple products at that element position to obtain the final value of that element position;
Step S204 (optional): crop the initial output data to obtain a final output result that meets the output size requirement.
The above i, k, and s are all integers greater than or equal to 1, and p is an integer greater than or equal to 0.
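Steps S201-S204 admit a compact software sketch: each input element is multiplied by the full kernel, the resulting k*k blocks are placed at stride-s offsets, products that land on the same position are summed, and p boundary rows/columns are cropped. The following Python/NumPy code is an illustrative model of the method, not the hardware implementation; the names are chosen for this example:

```python
import numpy as np

def deconv_no_padding(x: np.ndarray, kern: np.ndarray, s: int, p: int) -> np.ndarray:
    """Sketch of steps S201-S204: multiply every input element by the full
    kernel (S202), place each k*k product block at a stride-s offset, sum
    products landing on the same position (S203), then crop p rows/columns
    from each border (S204)."""
    i, k = x.shape[0], kern.shape[0]
    full = s * (i - 1) + k                            # initial output size before cropping
    out = np.zeros((full, full), dtype=x.dtype)
    for r in range(i):
        for c in range(i):
            out[r*s:r*s+k, c*s:c*s+k] += x[r, c] * kern   # overlaps are summed
    return out[p:full-p, p:full-p] if p else out

x = np.array([[1, 2], [3, 4]])
kern = np.ones((3, 3), dtype=int)
print(deconv_no_padding(x, kern, s=2, p=1))
```

With s = 2 and p = 1 the result is 3*3, matching o = s*(i-1) + k - 2*p; no zero-padded multiplications are performed at any point.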
Optionally, step S204 may be implemented specifically as:
cropping the boundary data of the initial output data by p to obtain the final output result.
Of course, the implementation of step S204 may alternatively include:
cropping the boundary data of the initial output data according to the size of the final output data to obtain the final output result.
The technical solution provided in this application does not need to perform a zero-padding operation when performing a deconvolution operation, so in practical applications the amount of computation is small; the operation of padding the input with zeros is avoided, and computational efficiency is improved. For the sizes of current mainstream deconvolution kernels (2, 4, 5, 8), the amount of computation is reduced to 1/4 to 1/80 of that of the previous traditional convolution-based implementation, which greatly reduces the amount of computation. The scheme is also more suitable for hardware implementation: the utilization of computing resources is higher; the hardware structure is more general and easier to extend to the configurations of different layers; and the overlapping region of the deconvolution result (that is, the element positions with multiple products) is handled effectively through control logic, obtaining the correct output result with very few hardware resources (registers) and without extra time consumption.
To better illustrate the effect of this application, the following takes input data of 2*2 and a deconvolution kernel of 3*3 as an example.
Referring to Figure 3, the input data of Figure 3 is 2*2 data and the deconvolution kernel is 3*3, with sliding stride s = 2 and p = 1. For convenience of description, the naming of each element position is shown in Figure 3. Referring to Figure 3, each element value of the input data is individually multiplied with the deconvolution kernel: for example, i11 is multiplied with k11 through K33 to obtain 9 values, which are arranged at the position of the i11 box; similarly, i12 is multiplied with k11 through K33 to obtain another 9 values, which are shifted 2 columns to the right by the sliding stride s = 2 to give the i12 box position; likewise, the i21 and i22 box positions are obtained, as shown in Figure 3. For the K12 position, there are two product values, i11*K23 and i12*K31, so it has products from multiple multiplication operations; the two products must therefore be summed at the K12 position, and the resulting sum is the value at that position, i.e., O23 = i11*K23 + i12*K31. Similarly, the K22 position has four product values, and the K21, K23, and K32 positions each have two product values; the region of these elements may also be called the overlapping region. The initial output data is obtained through this calculation, and cropping the initial output data yields the final output data.
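The overlapping region in this example can be made explicit by counting how many partial products land on each position of the initial output; positions with a count greater than 1 are exactly the overlap positions described above. This NumPy sketch is illustrative only:

```python
import numpy as np

# For the 2*2 input, 3*3 kernel, stride s=2 example: count how many k*k
# product blocks cover each position of the 5*5 initial output.
i, k, s = 2, 3, 2
counts = np.zeros((s*(i-1)+k,)*2, dtype=int)
for r in range(i):
    for c in range(i):
        counts[r*s:r*s+k, c*s:c*s+k] += 1   # one block per input element
print(counts)
```

The center position receives 4 products and the mid-edge positions receive 2, matching the K22 and K12/K21/K23/K32 positions described in the text.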
Referring to Figure 4, Figure 4 provides a deconvolution hardware architecture that executes the steps, including the refinement steps, of the method of the embodiment shown in Figure 2.
Referring to Figure 4, the deconvolution hardware architecture includes: an input data buffer, a deconvolution kernel buffer, a deconvolution operation core, a partial result buffer, a selector, an accumulator, and an initial output result buffer;
wherein the deconvolution operation core includes: k adders A, K multipliers M, and k*(k-s) buffers;
wherein the K multipliers M are interconnected, the k adders A are interconnected, the k adders A are respectively connected to the K multipliers M, and the k*(k-s) buffers comprise k groups of buffers, each group including k-s buffers, with the k groups of buffers respectively connected to the K multipliers M;
the input data buffer and the deconvolution kernel buffer are connected to the K multipliers M and feed in the input data and the deconvolution kernel data; the k adders A are respectively connected to the input of the partial result buffer and the input of the selector; the output of the partial result buffer is fed back to the k adders A and is also connected to the input of the selector; the output of the selector is connected to the input of the accumulator, and the output of the accumulator is connected to the initial output data buffer.
Optionally, the above deconvolution hardware architecture may further include: a cropping component configured to perform the cropping operation on the initial output data.
Optionally, the above deconvolution hardware architecture may further include: a deconvolution kernel factor buffer, disposed between the deconvolution kernel buffer and the K multipliers M.
Optionally, each of the above k*(k-s) buffers corresponds to one element position having multiple products. That is, each element of the overlapping region is allocated a separate buffer, which avoids data confusion in the overlapping region.
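The figure k*(k-s) can be related to the geometry of the overlap: product blocks placed at a horizontal stride of s overlap in k-s columns, over k rows per block. The helper below is an interpretation for illustration, not taken from the patent text:

```python
def overlap_buffer_count(k: int, s: int) -> int:
    """Horizontally adjacent k*k product blocks placed at stride s overlap
    in k-s columns; with k rows per block this gives k*(k-s) buffered
    positions, matching the k*(k-s) buffers of the architecture."""
    assert k >= s >= 1
    return k * (k - s)

print(overlap_buffer_count(3, 2))  # 3*3 kernel, stride 2 -> 3 buffers
```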
An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any one of the deconvolution implementation methods described in the foregoing method embodiments.
An embodiment of the present application further provides a computer program product; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any one of the deconvolution implementation methods described in the foregoing method embodiments.
It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative.
In addition, the processors and chips in the various embodiments of the present application may be integrated in one processing unit, may exist physically separately, or two or more pieces of hardware may be integrated in one unit. The computer-readable storage medium or computer-readable program may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art may understand that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing relevant hardware; the program may be stored in a computer-readable memory, and the memory may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for a person of ordinary skill in the art, there will be changes in the specific implementation and the scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (10)

  1. A deconvolution implementation method, characterized in that the method comprises the following steps:
    obtaining input data i*i, a deconvolution kernel k*k, a sliding stride s, and a zero-padding number p;
    multiplying each element value of the input data i*i individually with the deconvolution kernel k*k; each such multiplication yields one of i*i groups of data, and the i*i groups of data are shifted by the sliding stride s to form initial output data;
    if an element position of the initial output data has multiple products from the multiplication operations, performing a summing operation on the multiple products at that element position to obtain the final value of that element position;
    wherein i, k, and s are all integers greater than or equal to 1, and p is an integer greater than or equal to 0.
  2. The method according to claim 1, characterized in that the method further comprises:
    cropping the initial output data to obtain a final output result that meets the output size requirement.
  3. The method according to claim 2, characterized in that cropping the initial output data to obtain a final output result that meets the output size requirement specifically comprises:
    cropping the boundary data of the initial output data by p to obtain the final output result.
  4. The method according to claim 2, characterized in that cropping the initial output data to obtain a final output result that meets the output size requirement specifically comprises:
    cropping the boundary data of the initial output data according to the size of the final output data to obtain the final output result.
  5. A deconvolution hardware architecture for performing the method according to any one of claims 1-4, the hardware architecture comprising: an input data buffer, a deconvolution kernel buffer, a deconvolution operation core, a partial result buffer, a selector, an accumulator, and an initial output result buffer;
    wherein the deconvolution operation core comprises: k adders A, K multipliers M, and k*(k-s) buffers;
    wherein the K multipliers M are interconnected, the k adders A are interconnected, the k adders A are respectively connected to the K multipliers M, and the k*(k-s) buffers comprise k groups of buffers, each group including k-s buffers, with the k groups of buffers respectively connected to the K multipliers M;
    the input data buffer and the deconvolution kernel buffer are connected to the K multipliers M and feed in the input data and the deconvolution kernel data; the k adders A are respectively connected to the input of the partial result buffer and the input of the selector; the output of the partial result buffer is fed back to the k adders A and is also connected to the input of the selector; the output of the selector is connected to the input of the accumulator, and the output of the accumulator is connected to the initial output data buffer.
  6. The deconvolution hardware architecture according to claim 5, characterized in that the deconvolution hardware architecture further comprises: a cropping component configured to perform a cropping operation on the initial output data to obtain final output data.
  7. The deconvolution hardware architecture according to claim 5, characterized in that the deconvolution hardware architecture further comprises: a deconvolution kernel factor buffer disposed between the deconvolution kernel buffer and the K multipliers M.
  8. The deconvolution hardware architecture according to claim 5, characterized in that
    each of the k*(k-s) buffers corresponds to one element position having multiple products.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-4.
  10. A computer program product, characterized in that the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute the method according to any one of claims 1-4.
PCT/CN2018/096137 2018-07-18 2018-07-18 Deconvolution implementation method and related products WO2020014893A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/096137 WO2020014893A1 (zh) 2018-07-18 2018-07-18 Deconvolution implementation method and related products
CN201880004281.4A CN110088777B (zh) 2018-07-18 2018-07-18 Deconvolution implementation method and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/096137 WO2020014893A1 (zh) 2018-07-18 2018-07-18 Deconvolution implementation method and related products

Publications (1)

Publication Number Publication Date
WO2020014893A1 true WO2020014893A1 (zh) 2020-01-23

Family

ID=67412589

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/096137 WO2020014893A1 (zh) 2018-07-18 2018-07-18 反卷积实现方法及相关产品

Country Status (2)

Country Link
CN (1) CN110088777B (zh)
WO (1) WO2020014893A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113466681A (zh) * 2021-05-31 2021-10-01 国网浙江省电力有限公司营销服务中心 Circuit breaker life prediction method based on small-sample learning

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704197B (zh) 2019-10-17 2022-12-09 北京小米移动软件有限公司 Method, apparatus, and medium for handling memory access overhead
CN112926020B (zh) * 2019-12-06 2023-07-25 腾讯科技(深圳)有限公司 Deconvolution processing method, image processing method, and corresponding apparatus
CN111428189B (zh) * 2020-04-01 2023-09-22 南京大学 Data preprocessing method and apparatus for deconvolution operations

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170236256A1 (en) * 2016-02-11 2017-08-17 Xsight Technologies Llc System and method for isolating best digital image when using deconvolution to remove camera or scene motion
CN107451659A (zh) * 2017-07-27 2017-12-08 清华大学 Neural network accelerator for bit-width partitioning and implementation method thereof
CN108268932A (zh) * 2016-12-31 2018-07-10 上海兆芯集成电路有限公司 Neural network unit

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303185A (zh) * 2015-11-27 2016-02-03 中国科学院深圳先进技术研究院 Iris positioning method and apparatus
KR102631381B1 (ko) * 2016-11-07 2024-01-31 삼성전자주식회사 Method and apparatus for processing a convolutional neural network
CN106600577B (zh) * 2016-11-10 2019-10-18 华南理工大学 Cell counting method based on a deep deconvolution neural network
CN107578054A (zh) * 2017-09-27 2018-01-12 北京小米移动软件有限公司 Image processing method and apparatus
CN107944545B (zh) * 2017-11-10 2020-07-31 中国科学院计算技术研究所 Computation method and computation apparatus applied to neural networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170236256A1 (en) * 2016-02-11 2017-08-17 Xsight Technologies Llc System and method for isolating best digital image when using deconvolution to remove camera or scene motion
CN108268932A (zh) * 2016-12-31 2018-07-10 上海兆芯集成电路有限公司 Neural network unit
CN107451659A (zh) * 2017-07-27 2017-12-08 清华大学 Neural network accelerator for bit-width partitioning and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PENG, YALI ET AL.: "Deep Deconvolution Neural Network for Image Super-Resolution", JOURNAL OF SOFTWARE, vol. 29, no. 04, 30 April 2018 (2018-04-30), pages 928; 929, ISSN: 1000-9825 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113466681A (zh) * 2021-05-31 2021-10-01 国网浙江省电力有限公司营销服务中心 Circuit breaker life prediction method based on small-sample learning
CN113466681B (zh) * 2021-05-31 2024-05-10 国网浙江省电力有限公司营销服务中心 Circuit breaker life prediction method based on small-sample learning

Also Published As

Publication number Publication date
CN110088777A (zh) 2019-08-02
CN110088777B (zh) 2023-05-05

Similar Documents

Publication Publication Date Title
US11720646B2 (en) Operation accelerator
WO2020014893A1 (zh) 反卷积实现方法及相关产品
WO2019109795A1 (zh) 卷积运算处理方法及相关产品
US11934481B2 (en) Matrix multiplier
US10140251B2 (en) Processor and method for executing matrix multiplication operation on processor
KR102305851B1 (ko) 신경망의 컨볼루션 계산을 위한 방법 및 전자 디바이스
US10025755B2 (en) Device and method to process data in parallel
WO2020073211A1 (zh) 运算加速器、处理方法及相关设备
JP2019082996A (ja) 畳み込みニューラルネットワークにおいて演算を実行する方法および装置並びに非一時的な記憶媒体
CN107633297B (zh) 一种基于并行快速fir滤波器算法的卷积神经网络硬件加速器
CN108573305B (zh) 一种数据处理方法、设备及装置
TWI761778B (zh) 模型參數確定方法、裝置和電子設備
US20220083857A1 (en) Convolutional neural network operation method and device
WO2018139177A1 (ja) プロセッサ、情報処理装置及びプロセッサの動作方法
WO2022042124A1 (zh) 超分辨率图像重建方法、装置、计算机设备和存储介质
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
JP7114659B2 (ja) ニューラルネットワーク方法及び装置
WO2020041962A1 (zh) 一种并行反卷积计算方法、单引擎计算方法及相关产品
CN109324984B (zh) 在卷积运算中使用循环寻址的方法和装置
CN111125617A (zh) 数据处理方法、装置、计算机设备和存储介质
WO2020029181A1 (zh) 三维卷积神经网络计算装置及相关产品
CN112784951A (zh) Winograd卷积运算方法及相关产品
CN112765542A (zh) 运算装置
KR20200023154A (ko) 컨볼루션 뉴럴 네트워크를 처리하는 방법 및 장치
CN111583382B (zh) 数据计算方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18926417

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 140521)

122 Ep: pct application non-entry in european phase

Ref document number: 18926417

Country of ref document: EP

Kind code of ref document: A1