WO2019127926A1 - Calculation method and calculation device for a sparse neural network, electronic device, computer-readable storage medium, and computer program product - Google Patents

Calculation method and calculation device for a sparse neural network, electronic device, computer-readable storage medium, and computer program product

Info

Publication number
WO2019127926A1
WO2019127926A1 (PCT/CN2018/079373)
Authority
WO
WIPO (PCT)
Prior art keywords
kernel
value
weight
identifier
calculation
Prior art date
Application number
PCT/CN2018/079373
Other languages
English (en)
French (fr)
Inventor
曹庆新
黎立煌
李炜
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Priority to US16/627,293 (US20200242467A1)
Publication of WO2019127926A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G06F 18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of artificial intelligence technologies, and in particular, to a computing method and computing device for a sparse neural network, an electronic device, a computer readable storage medium, and a computer program product.
  • existing implementations of sparse neural network computation are relatively complicated, and it is difficult to bring the utilization of computing resources into full play; as a result, existing sparse neural networks involve a large amount of computation and high power consumption.
  • the embodiments of the present application provide a calculation method and calculation device for a sparse neural network, an electronic device, a computer-readable storage medium, and a computer program product, which can reduce the amount of computation of the sparse neural network and thereby have the advantages of lowering power consumption and saving computing time.
  • an embodiment of the present application provides a method for calculating a sparse neural network, where the method includes the following steps:
  • receiving a calculation instruction of the sparse neural network, and acquiring, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction; determining the kernel size KERNEL SIZE of the weights, and scanning the weights at the kernel-size granularity to obtain a weight identifier QMASK, the weight identifier including CO*CI values: if all weight values in the k-th basic-granularity kernel KERNEL k are 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a first specific value;
  • if the weight values in KERNEL k are not all 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a second specific value; k ranges over [1, CO*CI]; the KERNEL[n][m] corresponding to the second specific value of the weight identifier is stored, and the KERNEL[n][m] corresponding to the first specific value of the weight identifier is deleted;
  • all values of the weight identifier are scanned: if a value equals the second specific value, the KERNEL corresponding to that value and the input data corresponding to that KERNEL are extracted, and an operation is performed on the input data and the KERNEL to obtain an initial result; if a value equals the first specific value, the KERNEL and the input data corresponding to that value are not read;
  • n and m are integers greater than or equal to 1.
  • storing the KERNEL[n][m] corresponding to the second specific value of the weight identifier includes: scanning the core identifier WMASK to obtain the value corresponding to each core-identifier position, and storing the KERNEL values at the positions where QMASK = 1 and the core identifier = 1.
  • where n = 3, performing an operation on the input data and the KERNEL to obtain an initial result includes:
  • scanning all values of the core identifier corresponding to KERNEL[3][3], where the core identifier includes 9 bits corresponding to the 9 elements of KERNEL[3][3]: if the value at position x2 of the core identifier equals 0, the element value of KERNEL[3][3] corresponding to x2 is not read; if the value at position x1 of the core identifier equals 1, the position x1 corresponding to that value is determined, the element value KERNEL[3][3]x1 at position x1 and the input data x1 corresponding to position x1 are read, and KERNEL[3][3]x1 is multiplied by the input data x1 to obtain a product result; x1 ranges over [1, 9]; all the product results whose core-identifier value is 1 are accumulated to obtain the initial result.
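  • As an illustration, the following Python sketch performs this zero-skipping multiply-accumulate for one KERNEL[3][3] from its 9-bit core identifier; the flat length-9 layout, the function name, and the sample values are illustrative assumptions, not the patent's fixed format (x is 0-indexed here, while the text counts x1 in [1, 9]).

```python
import numpy as np

def kernel_mac(wmask_bits, kernel, ci):
    """Zero-skipping MAC for one kernel (a sketch, not the patent's circuit).

    wmask_bits: 9 bits, one per element of KERNEL[3][3] (1 = non-zero);
    kernel, ci: flat length-9 arrays of weights and matching input data.
    Elements whose core-identifier bit is 0 are never read or multiplied.
    """
    acc = 0.0
    for x in range(9):              # x plays the role of x1 (0-indexed here)
        if wmask_bits[x] == 0:      # zero element: skip the read and multiply
            continue
        acc += kernel[x] * ci[x]    # product result, then accumulate
    return acc                      # the initial result for this kernel

# Example with 4 non-zero weights, as in FIG. 3b's KERNEL[3][3].
mask = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1])
k = np.array([2., 0., 0., -1., 0., 4., 0., 0., 3.])
ci = np.arange(9, dtype=float)
print(kernel_mac(mask, k, ci))      # 2*0 + (-1)*3 + 4*5 + 3*8 = 41.0
```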
  • a computing device for a sparse neural network, comprising:
  • a transceiver interface configured to receive a calculation instruction of the sparse neural network;
  • an acquisition circuit configured to extract, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction from a memory;
  • a compiling circuit configured to determine the kernel size KERNEL SIZE of the weights and to scan the weights at the kernel-size granularity to obtain a weight identifier, the weight identifier including CO*CI values: if all weight values in the k-th basic-granularity kernel KERNEL k are 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a first specific value; if the weight values in KERNEL k are not all 0, the position [k] is marked with a second specific value; k ranges over [1, CO*CI]; the KERNEL[n][m] corresponding to the second specific value of the weight identifier is stored, and the KERNEL[n][m] corresponding to the first specific value is deleted;
  • a calculation circuit configured to scan all values of the weight identifier: if a value equals the second specific value, extract the KERNEL corresponding to that value and the input data corresponding to that KERNEL, and perform an operation on them to obtain an initial result; if a value equals the first specific value, not read the KERNEL or the input data corresponding to that value; and perform arithmetic processing on all initial results to obtain the calculation result of the calculation instruction.
  • n and m are integers greater than or equal to 1.
  • the first specific value is 0, and the second specific value is 1;
  • the first specific value is 1, and the second specific value is 0.
  • where n = 3, the calculation circuit is specifically configured to scan all values of the core identifier corresponding to KERNEL[3][3]; the core identifier includes 9 bits corresponding to the 9 elements of KERNEL[3][3]. If the value at position x2 of the core identifier equals 0, the element value of KERNEL[3][3] corresponding to x2 is not read; if the value at position x1 of the core identifier equals 1, the position x1 corresponding to that value is determined, the element value KERNEL[3][3]x1 at position x1 of KERNEL[3][3] and the input data x1 corresponding to position x1 are read, and KERNEL[3][3]x1 is multiplied by the input data x1 to obtain a product result; x1 ranges over [1, 9]. All the product results whose core-identifier value is 1 are accumulated to obtain the initial result.
  • an electronic device comprising the computing device for a sparse neural network provided in the second aspect.
  • a computer readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method as provided in the first aspect.
  • a computer program product comprising a non-transitory computer readable storage medium storing a computer program, the computer program being operative to cause a computer to perform the method provided by the first aspect.
  • in order to compress the weight parameters, the present application adds a weight identifier and a core identifier. Because a sparsified network model has relatively many weight elements equal to 0, the weight-parameter space saved is far larger than the added weight-identifier and core-identifier information.
  • the compressed parameters effectively save storage space and save the bandwidth of the DDR memory.
  • in the technical solution provided by the embodiment shown in FIG. 3, when a weight-identifier value is zero, the corresponding input data is not fetched, which saves the overhead of data transmission between the calculation circuit and the memory and removes the corresponding operations, thereby reducing the amount of computation, lowering power consumption, and saving cost.
  • FIG. 1 is a schematic structural diagram of an electronic device.
  • FIG. 2 is a schematic diagram of the data operations of a sparse neural network.
  • FIG. 3 is a schematic flowchart of a calculation method for a sparse neural network provided by the present invention.
  • FIG. 3a is a schematic diagram of a weight identifier.
  • FIG. 3b is a schematic diagram of a KERNEL[3][3].
  • FIG. 3c is a schematic diagram of another KERNEL[3][3].
  • FIG. 3d is a schematic diagram of a core identifier.
  • FIG. 4 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a computing device for a sparse neural network according to an embodiment of the present application.
  • the electronic device in the present application may include: a server, a smart camera device, a smartphone (such as an Android phone, an iOS phone, or a Windows Phone phone), a tablet computer, a palmtop computer, a notebook computer, a mobile Internet device (MID), a wearable device, and the like; the foregoing devices are examples rather than an exhaustive list.
  • the electronic device is referred to as user equipment (UE), a terminal, or an electronic device in the following embodiments. The foregoing user equipment is not limited to the above forms and may also include, for example, an intelligent vehicle-mounted terminal, a computer device, and the like.
  • as shown in FIG. 1, the electronic device may include: a processor 101, a memory 102, and a neural network chip 103.
  • the processor 101 is connected to the memory 102 and the neural network chip 103.
  • the neural network chip 103 described above may be integrated in the processor 101.
  • the memory 102 may include a flash disk, a read-only memory (ROM), a random access memory (RAM), and the like.
  • the technical solution of the present invention is not limited by whether the neural network chip 103 is provided separately or integrated in the processor 101.
  • FIG. 2 provides a schematic diagram of data operation of a sparse neural network.
  • in each neural network model, the quantity of weight values (WEIGHTS), where a weight value may also simply be called a weight, basically determines the computational complexity of the neural network model; sparsification optimization turns as many elements of the weight values into 0 as possible without changing the structure of the neural network model.
  • the input of the neural network calculation has two paths: one is the weight values (the Filters in FIG. 2), the other is the input data Input Image (CI); the output is the output data Output Image (CO).
  • the neural network model can contain multiple layers of calculation, and each layer's calculation may include complex operations such as matrix-matrix multiplication and convolution.
  • a sparsified neural network model can also be called a sparse neural network model or a sparse neural network. Compared with a dense neural network, a sparse neural network has a relatively large number of zero-valued elements in its weights; because there are many zeros in the weights, the amount of computation is relatively small, hence the name sparse neural network. FIG. 2 shows one representation of the weights of a sparse neural network.
  • the calculation of a neural network is divided into multiple layer operations; each layer operation is an operation between that layer's input data and weights, as shown in FIG. 2, including but not limited to convolution, matrix-matrix multiplication, and the like. The diagram shown in FIG. 2 can be a convolution operation of one layer of a neural network. Specifically,
  • Filters represent the weight values in the neural network model;
  • Input Image is the CI in this application;
  • Output Image is the CO herein;
  • each CO is obtained by multiplying all the CIs by the corresponding weight values and accumulating the products.
  • the number of weight values is CI NUM*CO NUM, and each weight is a two-dimensional matrix data structure.
  • although sparsification can reduce a certain amount of data computation, this processing method does not optimize the sparse calculation, so the amount of computation is not reduced much compared with a dense neural network; and since the power consumption of a neural network chip is directly related to the amount of computation of the neural network model, this calculation method cannot reduce the chip's power consumption.
  • FIG. 3 provides a calculation method for a sparse neural network, performed by a processor or a neural network processing chip. As shown in FIG. 3, the method includes the following steps:
  • Step S301: receive a sparse neural network calculation instruction, and extract, according to the sparse calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction.
  • Step S302: determine the kernel size KERNEL SIZE[n][m] of the weights; scan the weights at the kernel-size granularity to obtain a weight identifier containing CO*CI values: if all weight values (i.e., element values) in KERNEL k are 0, mark the corresponding position QMASK[k] of the weight identifier with a first specific value (e.g., 0); if the weight values (i.e., element values) in KERNEL k are not all 0, mark the corresponding position [k] of the weight identifier with a second specific value (e.g., 1).
  • optionally, n = 1, 3, or 5. Taking n = 3 as an example, the KERNEL[3][3] shown in FIG. 3b has 4 non-zero weight values.
  • if weight identifier [1] = 1, a core identifier (WMASK)[1] is generated; the core identifier [1] includes n*m bits, and each bit indicates whether the corresponding element value in KERNEL[3][3] is zero. For the KERNEL[3][3] shown in FIG. 3b, the corresponding core identifier [1] is shown in FIG. 3d.
  • Step S303: store the KERNEL[3][3] corresponding to the second specific value of the weight identifier, and delete the KERNEL[3][3] corresponding to the first specific value of the weight identifier.
  • the weight identifier is a coarse-grained identifier indicating the case where a KERNEL[n][m] is entirely 0, while the core identifier is a fine-grained identifier indicating which elements inside a KERNEL[n][m] are 0 and which are non-zero. Combined, the weight identifier and the core identifier can represent all the zeros in the weights, so the control device can be instructed to skip the calculation of zero values in the weights, thereby reducing power consumption and the amount of computation.
  • the weight identifier and the core identifier are processed offline: an offline scan obtains the weight identifier and the core identifier, and the weights are compressed according to them (zero-valued elements are deleted, only non-zero elements are stored, and the weight identifier and the core identifier together indicate the positions of the non-zero elements).
  • Step S305: perform an operation on KERNEL k and the input data CI to obtain an initial result.
  • the implementation of step S305 may specifically include:
  • reading the n*m bit values of the core identifier [k] corresponding to KERNEL k, traversing all the bit values of the core identifier [k], and performing an operation between each weight value whose bit value is not zero and the corresponding input data CI to obtain at least one intermediate result. Specifically, if the value of a bit is zero, the operation for that bit is not performed; if a bit is non-zero, the weight value corresponding to that bit in KERNEL k is read, and an operation is performed between that weight value and the corresponding input data [k] to obtain an intermediate result. The intermediate results are combined to obtain the initial result.
  • Step S306: traverse the weight identifier, and compute the KERNEL[3][3] corresponding to every second specific value with the corresponding input data to obtain multiple initial results.
  • Step S307: perform arithmetic processing on all the obtained initial results to obtain the calculation result of the calculation instruction.
  • to compress the weight parameters, the weight identifier and the core identifier are added; for a sparsified network model with relatively many zero-valued weight elements, the weight-parameter space saved far exceeds the added weight-identifier and core-identifier information, and the compressed parameters effectively save storage space and DDR bandwidth.
  • in the technical solution provided by the embodiment shown in FIG. 3, when a weight-identifier value is zero, the corresponding input data is not fetched, which saves the overhead of data transmission between the calculation circuit and the memory and removes the corresponding operations, thereby reducing the amount of computation. The inputs of the technical solution shown in FIG. 3 are the weight identifier, the core identifier, and the (compressed) weights; decoding proceeds according to the compression scheme, and zeros in the weights can be skipped directly during decoding, saving power consumption and bandwidth, improving performance, and saving cost.
  • FIG. 4 is a schematic structural diagram of a neural network processing chip. As shown in FIG. 4, the chip includes: a memory DDR, a data transmission circuit IDMA, a parameter transmission circuit WDMA, and a calculation processing circuit PE. Among them,
  • the data transmission circuit is the data transmission circuit inside the neural network processor (it mainly transmits input data);
  • the parameter transmission circuit is the parameter transmission circuit inside the neural network processor (it mainly transmits weight data and weight identifiers);
  • the data transmission circuit controls the transfer of the CI data from the memory to the calculation processing circuit according to the weight-identifier information. Specifically, when the data transmission circuit finds that a position of the weight identifier equals 0, it skips that position and goes directly to the next position of the weight identifier; if the value at the next position is 1, the non-zero CI data corresponding to that next position is transferred to the calculation processing circuit. This eliminates unnecessary data transfer and internal storage, saving chip power consumption and storage space. Jumping directly to the next position whose weight identifier is non-zero, in coordination with the calculation of the calculation processing circuit, ensures timely data supply and improves calculation speed.
  • the parameter transmission circuit is responsible for carrying the compressed weight values and the core identifiers from the memory into the calculation processing circuit.
  • since all zeros have been deleted from the weight values, the transfer volume and power consumption of the parameter transmission circuit are optimized to the maximum; the identifiers are sent into the calculation processing circuit to tell it how to perform the zero-skipping calculation and improve calculation efficiency.
  • the calculation processing circuit is the calculation processing circuit inside the neural network processor; it completes the multiplication of the CI with the weights and the accumulation calculation:
  • the generic method completes the products of all CIs and weights regardless of whether a weight is 0, and then accumulates the results. But for zero-valued elements in the weights, the product is also 0 and has no effect on the accumulated result; if these can be skipped directly, calculation efficiency can be greatly accelerated, and the amount of computation and the power consumption reduced.
  • since the weight identifier and the core identifier are added to mark the positions and distribution of the zero weights, the calculation processing circuit can use the zero-position information they identify to directly skip the calculations performed on zero-valued elements in the weights. Specifically,
  • Step a: the calculation processing circuit scans weight identifier [l]; if weight identifier [l] = 0, it determines that the KERNEL[l] corresponding to weight identifier [l] is all zero and skips weight identifier [l];
  • Step b: scan weight identifier [l+1]; if weight identifier [l+1] = 1, parse the core identifier [l+1] corresponding to weight identifier [l+1];
  • Step c: the calculation processing circuit parses out a position x1 at which core identifier [l+1] equals 1, then reads the CI[l+1]x1 data from the BUF, extracts KERNEL[l+1]x1 from the value of KERNEL[l+1] corresponding to x1, and multiplies KERNEL[l+1]x1 by the CI[l+1]x1 data to obtain a product result.
  • the CI[l+1]x1 data can be obtained according to the principle of the operation; for example, for a convolution operation, the position of the CI[l+1]x1 data within the CI data and the specific value of CI[l+1]x1 are determined according to the principle of the convolution operation.
  • Step d: the calculation processing circuit repeats step c until all values of core identifier [l+1] have been parsed.
  • Step e: the calculation processing circuit scans the subsequent values of the weight identifier; if a subsequent value, weight identifier [k], equals 1, it parses the core identifier [k] corresponding to weight identifier [k];
  • Step f: the calculation processing circuit parses out a position x1 at which core identifier [k] equals 1, then reads the CI[k]x1 data from the BUF, extracts KERNEL[k]x1 from the value of KERNEL[k] corresponding to x1, and multiplies KERNEL[k]x1 by the CI[k]x1 data to obtain a product result.
  • Step g: the calculation processing circuit repeats step f until all values of core identifier [k] have been parsed.
  • Step h: the calculation processing circuit traverses all values of the weight identifier; if a value is zero, step a is performed; if a value is 1, steps e, f, and g are performed.
  • Step i: the calculation processing circuit performs operations on all the product results to obtain the calculation result; the operations on the product results include but are not limited to: activation, sorting, accumulation, conversion, and the like.
  • viewed from the above calculation principle, the calculation processing circuit of the present application parses two layers of data, namely the weight identifier and the core identifier, and can, based on the values of these two layers of data, very simply skip the calculations for weight elements whose value is 0 and then work with the compressed weights; model calculation can be completed efficiently, since the chip structure shown in FIG. 4 can directly skip all all-zero calculations and zero values are never stored in the memory.
  • FIG. 5 provides a computing device for a sparse neural network; the device includes:
  • a transceiver interface 501 configured to receive a calculation instruction of the sparse neural network;
  • an acquisition circuit 502 configured to extract, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction from the memory 505;
  • a compiling circuit 503 configured to determine the kernel size KERNEL SIZE of the weights and to scan the weights at the kernel-size granularity to obtain a weight identifier, the weight identifier including CO*CI values: if all weight values in the k-th basic-granularity kernel KERNEL k are 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a first specific value; if the weight values in KERNEL k are not all 0, the position [k] is marked with a second specific value; k ranges over [1, CO*CI]; the KERNEL[n][m] corresponding to the second specific value of the weight identifier is stored, and the KERNEL[n][m] corresponding to the first specific value is deleted;
  • the calculation circuit 504 is configured to scan all values of the weight identifier: if a value equals the second specific value, extract the KERNEL corresponding to that value and the input data corresponding to that KERNEL, and perform an operation on them to obtain an initial result; if a value equals the first specific value, not read the KERNEL or the input data corresponding to that value; and perform arithmetic processing on all initial results to obtain the calculation result of the calculation instruction.
  • n is equal to any one of 1, 3, and 5.
  • the first specific value is 0, and the second specific value is 1;
  • the first specific value is 1, and the second specific value is 0.
  • where n = 3, the calculation circuit 504 is specifically configured to scan all values of the core identifier corresponding to KERNEL[3][3]; the core identifier includes 9 bits corresponding to the 9 elements of KERNEL[3][3]. If the value at position x2 of the core identifier equals 0, the element value of KERNEL[3][3] corresponding to x2 is not read; if the value at position x1 of the core identifier equals 1, the position x1 is determined, the element value KERNEL[3][3]x1 and the input data x1 corresponding to position x1 are read and multiplied to obtain a product result; x1 ranges over [1, 9]; all the product results whose core-identifier value is 1 are accumulated to obtain the initial result.
  • the embodiment of the present application further provides an electronic device, wherein the electronic device includes the computing device of the sparse neural network described above.
  • the embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any calculation method for a sparse neural network described in the foregoing method embodiments.
  • the embodiment of the present application further provides a computer program product, comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any calculation method for a sparse neural network described in the foregoing method embodiments.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software program module.
  • the integrated unit if implemented in the form of a software program module and sold or used as a standalone product, may be stored in a computer readable memory.
  • the computer-readable memory includes a number of instructions causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Abstract

A calculation method for a sparse neural network, the method comprising: receiving a calculation instruction of the sparse neural network, and acquiring, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction; determining the kernel size KERNEL SIZE of the weights, scanning the weights at the kernel-size granularity to obtain a weight identifier, storing the KERNELs corresponding to the second specific value of the weight identifier, and deleting the KERNELs corresponding to the first specific value of the weight identifier; scanning all values of the weight identifier: if a value equals the second specific value, extracting the KERNEL corresponding to that value and the input data, and performing an operation on the input data and the KERNEL to obtain an initial result; if a value equals the first specific value, not reading the KERNEL or the input data corresponding to that value; and performing arithmetic processing on all initial results to obtain the calculation result of the calculation instruction. The above technical solution has the advantage of low power consumption.

Description

Calculation method and calculation device for a sparse neural network, electronic device, computer-readable storage medium, and computer program product
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 29, 2017, with application number 201711480629.0 and invention title "Calculation method of neural network and related products", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of artificial intelligence technologies, and in particular to a calculation method and calculation device for a sparse neural network, an electronic device, a computer-readable storage medium, and a computer program product.
BACKGROUND
As artificial intelligence technology matures, application scenarios and product demands in all industries are growing explosively. To meet the requirements of commercial products, the computational complexity of artificial-intelligence neural network algorithms is enormous, which means high cost and huge power consumption for hardware; for the vast number of embedded and terminal devices, the excessive amount of computation and the huge power consumption are a very large bottleneck. Algorithms in the industry are therefore all seeking smaller and faster neural network models, and neural network sparsification is an important optimization direction and research branch of current algorithms.
Existing implementations of sparse neural network computation are relatively complicated, and it is difficult to bring the utilization of computing resources into full play; as a result, existing sparse neural networks involve a large amount of computation and high power consumption.
SUMMARY
Embodiments of the present application provide a calculation method and calculation device for a sparse neural network, an electronic device, a computer-readable storage medium, and a computer program product, which can reduce the amount of computation of a sparse neural network and thereby have the advantages of lowering power consumption and saving computing time.
In a first aspect, an embodiment of the present application provides a calculation method for a sparse neural network, the method including the following steps:
receiving a calculation instruction of the sparse neural network, and acquiring, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction; determining the kernel size KERNEL SIZE of the weights, and scanning the weights at the kernel-size granularity to obtain a weight identifier QMASK, the weight identifier including CO*CI values: if all weight values within the k-th basic-granularity kernel KERNEL k are 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a first specific value; if the weight values within KERNEL k are not all 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a second specific value; k ranges over [1, CO*CI]; storing the KERNEL[n][m] corresponding to the second specific value of the weight identifier, and deleting the KERNEL[n][m] corresponding to the first specific value of the weight identifier;
scanning all values of the weight identifier: if a value equals the second specific value, extracting the KERNEL corresponding to that value and the input data corresponding to that KERNEL, and performing an operation on the input data and the KERNEL to obtain an initial result; if a value equals the first specific value, not reading the KERNEL or the input data corresponding to that value;
performing arithmetic processing on all initial results to obtain the calculation result of the calculation instruction.
Optionally, n and m are integers greater than or equal to 1.
Optionally, storing the KERNEL[n][m] corresponding to the second specific value of the weight identifier includes:
scanning the core identifier WMASK to obtain the value corresponding to each core-identifier position, and storing the KERNEL values at the positions where QMASK = 1 and the core identifier = 1.
Optionally, where n = 3, performing an operation on the input data and the KERNEL to obtain an initial result includes:
scanning all values of the core identifier corresponding to KERNEL[3][3], the core identifier including 9 bits corresponding to the 9 elements of KERNEL[3][3]: if the value at position x2 of the core identifier equals 0, not reading the element value of KERNEL[3][3] corresponding to x2; if the value at position x1 of the core identifier equals 1, determining the position x1 corresponding to that value, reading the element value KERNEL[3][3]x1 at position x1 of KERNEL[3][3] and the input data x1 corresponding to position x1, and multiplying KERNEL[3][3]x1 by the input data x1 to obtain a product result; x1 ranges over [1, 9];
accumulating all the product results whose core-identifier value is 1 to obtain the initial result.
In a second aspect, a calculation device for a sparse neural network is provided, the device including:
a transceiver interface configured to receive a calculation instruction of the sparse neural network;
an acquisition circuit configured to extract, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction from a memory;
a compiling circuit configured to determine the kernel size KERNEL SIZE of the weights and to scan the weights at the kernel-size granularity to obtain a weight identifier, the weight identifier including CO*CI values: if all weight values within the k-th basic-granularity kernel KERNEL k are 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a first specific value; if the weight values within KERNEL k are not all 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a second specific value; k ranges over [1, CO*CI]; the KERNEL[n][m] corresponding to the second specific value of the weight identifier is stored, and the KERNEL[n][m] corresponding to the first specific value of the weight identifier is deleted;
a calculation circuit configured to scan all values of the weight identifier: if a value equals the second specific value, extract the KERNEL corresponding to that value and the input data corresponding to that KERNEL, and perform an operation on the input data and the KERNEL to obtain an initial result; if a value equals the first specific value, not read the KERNEL or the input data corresponding to that value; and perform arithmetic processing on all initial results to obtain the calculation result of the calculation instruction.
Optionally, n and m are integers greater than or equal to 1.
Optionally, the first specific value is 0 and the second specific value is 1;
or the first specific value is 1 and the second specific value is 0.
Optionally, where n = 3, the calculation circuit is specifically configured to scan all values of the core identifier corresponding to KERNEL[3][3], the core identifier including 9 bits corresponding to the 9 elements of KERNEL[3][3]: if the value at position x2 of the core identifier equals 0, not read the element value of KERNEL[3][3] corresponding to x2; if the value at position x1 of the core identifier equals 1, determine the position x1 corresponding to that value, read the element value KERNEL[3][3]x1 at position x1 of KERNEL[3][3] and the input data x1 corresponding to position x1, and multiply KERNEL[3][3]x1 by the input data x1 to obtain a product result; x1 ranges over [1, 9]; and accumulate all the product results whose core-identifier value is 1 to obtain the initial result.
In a third aspect, an electronic device is provided, the electronic device including the calculation device for a sparse neural network provided in the second aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method provided in the first aspect.
In a fifth aspect, a computer program product is provided, the computer program product including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method provided in the first aspect.
Implementing the embodiments of the present application has the following beneficial effects:
It can be seen that, in order to compress the weight parameters, the present application adds a weight identifier and a core identifier. Because a sparsified network model has relatively many weight elements with the value 0, the weight-parameter space saved is far larger than the added weight-identifier and core-identifier information. In addition, the compressed parameters effectively save storage space and save the bandwidth of the DDR memory. In the technical solution provided by the embodiment shown in FIG. 3, when a weight-identifier value is zero, the corresponding input data is not fetched, which saves the overhead of data transmission between the calculation circuit and the memory and removes the corresponding operations, thereby reducing the amount of computation, lowering power consumption, and saving cost.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art may further obtain other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic structural diagram of an electronic device.
FIG. 2 is a schematic diagram of the data operations of a sparse neural network.
FIG. 3 is a schematic flowchart of a calculation method for a sparse neural network provided by the present invention.
FIG. 3a is a schematic diagram of a weight identifier.
FIG. 3b is a schematic diagram of a KERNEL[3][3].
FIG. 3c is a schematic diagram of another KERNEL[3][3].
FIG. 3d is a schematic diagram of a core identifier.
FIG. 4 is a schematic structural diagram of a chip provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a calculation device for a sparse neural network provided by an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application. Apparently, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The electronic device in the present application may include: a server, a smart camera device, a smartphone (such as an Android phone, an iOS phone, or a Windows Phone phone), a tablet computer, a palmtop computer, a notebook computer, a mobile Internet device (MID), a wearable device, and the like. The above electronic devices are merely examples rather than an exhaustive list; the devices included are not limited to the above. For convenience of description, the electronic device is referred to as user equipment (UE), a terminal, or an electronic device in the following embodiments. Of course, in practical applications, the user equipment is not limited to the above forms and may also include, for example, an intelligent vehicle-mounted terminal, a computer device, and the like.
The structure of the electronic device is shown in FIG. 1. Specifically, it may include: a processor 101, a memory 102, and a neural network chip 103, where the processor 101 is connected to the memory 102 and the neural network chip 103. In an optional technical solution, the neural network chip 103 may be integrated in the processor 101. The memory 102 may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), and the like. The technical solution of the present invention is not limited by whether the neural network chip 103 is provided separately or integrated in the processor 101.
Referring to FIG. 2, FIG. 2 provides a schematic diagram of the data operations of a sparse neural network. As shown in FIG. 2, in every neural network model the quantity of weight values (WEIGHTS), where a weight value may also simply be called a weight, basically determines the computational complexity of the neural network model's calculation. Sparsification optimization, without changing the structure of the neural network model, turns as many elements of the weight values into 0 as possible, thereby greatly reducing the computational complexity of the neural network model. Neural network calculation has two input paths: one is the weight values (the Filters in FIG. 2), the other is the input data Input Image (CI); the output is the output data Output Image (CO).
A neural network model may contain multiple layers of calculation, and each layer's calculation may include complex operations such as matrix-matrix multiplication and convolution. A sparsified neural network model may also be called a sparse neural network model or a sparse neural network. Compared with a dense neural network, a sparse neural network is characterized by a relatively large number of zero-valued elements in its weights; because there are many zeros in the weights, the amount of computation is relatively small, hence the name sparse neural network. FIG. 2 shows one representation of the weights of a sparse neural network.
The corresponding neural network calculation scheme is introduced below. The calculation of a neural network is divided into multiple layer operations, and each layer operation is an operation between that layer's input data and weights, as shown in FIG. 2; the operation includes but is not limited to convolution, matrix-matrix multiplication, and the like. The diagram shown in FIG. 2 may be a convolution operation of one layer of a neural network. Specifically,
Filters represent the weight values in the neural network model;
Input Image is the CI in this application;
Output Image is the CO herein;
each CO is obtained by multiplying all the CIs by the corresponding weight values and accumulating the products.
The number of weight values is CI NUM*CO NUM, and each weight is a two-dimensional matrix data structure.
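Written as a formula, the preceding two paragraphs read as follows (an editorial restatement, with K_{i,j} denoting the kernel connecting input channel i to output channel j and the operator * standing for the layer's operation, e.g. convolution):

```latex
CO_j = \sum_{i=1}^{CI\_NUM} CI_i * K_{i,j}, \qquad j = 1, \dots, CO\_NUM
```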
For the calculation scheme shown in FIG. 2, although sparsification can reduce a certain amount of data computation, the processing method does not optimize the sparse calculation, so the amount of computation is not reduced much compared with a dense neural network; and since the power consumption of a neural network chip is directly related to the amount of computation of the neural network model, this calculation method also cannot reduce the power consumption of the chip.
Referring to FIG. 3, FIG. 3 provides a calculation method for a sparse neural network; the method is performed by a processor or a neural network processing chip and, as shown in FIG. 3, includes the following steps:
Step S301: receive a sparse neural network calculation instruction, and extract, according to the sparse calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction.
Step S302: determine the kernel size KERNEL SIZE[n][m] of the weights; scan the weights at the kernel-size granularity to obtain a weight identifier containing CO*CI values: if all weight values (i.e., element values) within KERNEL k are 0, mark the corresponding position QMASK[k] of the weight identifier with a first specific value (e.g., 0); if the weight values (i.e., element values) within KERNEL k are not all 0, mark the corresponding position [k] of the weight identifier with a second specific value (e.g., 1).
As shown in FIG. 3a, FIG. 3a is a schematic diagram of a weight identifier, in which CI NUM = 16 and CO NUM = 32. Specifically, for k = 1, as shown in FIG. 3a, the value is 1, indicating that the weight values of the corresponding KERNEL[3][3] include at least one non-zero weight value.
Optionally, n = 1, 3, or 5. Taking n = 3 as an example, the KERNEL[3][3] shown in FIG. 3b has 4 non-zero weight values.
If weight identifier [1] = 1, a core identifier (WMASK)[1] is generated; the core identifier [1] includes n*m bits, and each bit indicates whether the corresponding element value in KERNEL[3][3] is zero. For the KERNEL[3][3] shown in FIG. 3b, the corresponding core identifier [1] is shown in FIG. 3d.
As shown in FIG. 3a, for k = 2 the value is 0, meaning that all weight values of the corresponding KERNEL[3][3] are zero, as shown in FIG. 3c.
Step S303: store the KERNEL[3][3] corresponding to the second specific value of the weight identifier, and delete the KERNEL[3][3] corresponding to the first specific value of the weight identifier.
A specific implementation may be: when storing the KERNEL[n][m] corresponding to the second specific value of the weight identifier, the complete KERNEL[n][m] is not stored; instead, in combination with the core identifier, only the KERNEL values at the positions where QMASK = 1 and the core identifier = 1 are stored.
The weight identifier is a coarse-grained identifier indicating the case where a KERNEL[n][m] is entirely 0, while the core identifier is a fine-grained identifier indicating which elements inside a KERNEL[n][m] are 0 and which are non-zero. Combined, the weight identifier and the core identifier can represent all the zeros in the weights, so the control device can be instructed to skip the calculation of zero values in the weights, thereby reducing power consumption and the amount of computation.
Both the weight identifier and the core identifier are processed offline: an offline scan obtains the weight identifier and the core identifier, and the weights are compressed according to them (zero-valued elements are deleted, only non-zero elements are stored, and the weight identifier and the core identifier together indicate the positions of the non-zero elements).
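As an illustration of this offline processing, the Python sketch below builds the weight identifier QMASK, the per-kernel core identifiers (WMASK), and the packed stream of non-zero weight values from a dense weight tensor; the array layout and the function name are illustrative assumptions rather than the patent's storage format.

```python
import numpy as np

def compress_weights(weights):
    """Offline compression sketch: weights has shape (CO*CI, n, m).

    Returns QMASK (one value per kernel), the per-kernel WMASK bit
    arrays (only for non-zero kernels), and the packed stream of
    non-zero weight values. Layout and names are illustrative only.
    """
    qmask, wmasks, packed = [], [], []
    for kernel in weights:                # kernel-granularity scan
        if not kernel.any():              # all weight values are 0
            qmask.append(0)               # first specific value
            continue                      # nothing is stored for this kernel
        qmask.append(1)                   # second specific value
        bits = (kernel.flatten() != 0).astype(np.uint8)
        wmasks.append(bits)               # fine-grained zero/non-zero map
        packed.extend(kernel.flatten()[bits == 1])  # non-zero elements only
    return np.array(qmask), wmasks, np.array(packed)

# Example: CO*CI = 2 kernels of size 3x3, the second one all zero.
w = np.zeros((2, 3, 3)); w[0, 0, 0], w[0, 1, 2] = 5.0, -3.0
qmask, wmasks, packed = compress_weights(w)
print(qmask, packed)                      # [1 0] [ 5. -3.]
```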
Step S304: acquire the value of the weight identifier [k]; if the weight identifier [k] equals the second specific value, extract the KERNEL k corresponding to the weight identifier [k] and the input data CI corresponding to that KERNEL k; if the weight identifier [k] equals the first specific value, do not fetch the input data CI.
Step S305: perform an operation on KERNEL k and the input data CI to obtain an initial result.
The implementation of step S305 may specifically include:
reading the n*m bit values of the core identifier [k] corresponding to KERNEL k, traversing all the bit values of the core identifier [k], and performing an operation between each weight value whose bit value is not zero and the corresponding input data CI to obtain at least one intermediate result. Specifically, if the value of a bit is zero, the operation for that bit is not performed; if a bit is non-zero, the weight value corresponding to that bit in KERNEL k is read, and an operation is performed between that weight value and the corresponding input data [k] to obtain an intermediate result. The intermediate results are combined to obtain the initial result.
Step S306: traverse the weight identifier, and compute the KERNEL[3][3] corresponding to every second specific value with the corresponding input data to obtain multiple initial results.
Step S307: perform arithmetic processing on all the obtained initial results to obtain the calculation result of the calculation instruction.
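Steps S304 to S307 amount to the loop below, shown as a minimal Python sketch under the assumptions that kernels remain addressable by index, that fetch_input is an illustrative helper (not part of the patent), and that simple accumulation stands in for the unspecified final arithmetic processing of step S307:

```python
import numpy as np

def sparse_layer(qmask, kernels, fetch_input):
    """Sketch of steps S304-S307: whole kernels with QMASK value 0 are skipped.

    qmask[k] == 1 marks a kernel with at least one non-zero element;
    kernels[k] is the (uncompressed) KERNEL k as a flat array, and
    fetch_input(k) returns the matching CI data, fetched only when
    actually needed, mirroring the skipped transfers of step S304.
    """
    initial_results = []
    for k, flag in enumerate(qmask):
        if flag == 0:                       # first specific value:
            continue                        # read neither KERNEL nor CI
        ci = fetch_input(k)                 # step S304: fetch CI on demand
        initial_results.append(float((kernels[k] * ci).sum()))  # step S305
    return sum(initial_results)             # step S307 stand-in (accumulate)

# Example: two 3x3 kernels (flattened), the second all zero.
ks = np.zeros((2, 9)); ks[0, 0], ks[0, 5] = 2.0, 4.0
print(sparse_layer([1, 0], ks, lambda k: np.arange(9.0)))  # 2*0 + 4*5 = 20.0
```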
To compress the weight parameters, the embodiment shown in FIG. 3 adds the weight identifier and the core identifier. For a sparsified network model with relatively many zero-valued weight elements, the weight-parameter space saved far exceeds the added weight-identifier and core-identifier information; the compressed parameters effectively save storage space and DDR bandwidth. In the technical solution provided by the embodiment shown in FIG. 3, when a weight-identifier value is zero, the corresponding input data is not fetched, which saves the overhead of data transmission between the calculation circuit and the memory and removes the corresponding operations, thereby reducing the amount of computation. The inputs of the technical solution shown in FIG. 3 are the weight identifier, the core identifier, and the (compressed) weights; decoding is performed according to the compression scheme, and zeros in the weights can be skipped directly during decoding, saving power consumption and bandwidth, improving performance, and saving cost.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a neural network processing chip. As shown in FIG. 4, the chip includes: a memory DDR, a data transmission circuit IDMA, a parameter transmission circuit WDMA, and a calculation processing circuit PE. Among them,
the data transmission circuit is the data transmission circuit inside the neural network processor (it mainly transmits input data);
the parameter transmission circuit is the parameter transmission circuit inside the neural network processor (it mainly transmits weight data and weight identifiers);
the data transmission circuit controls the transfer of the CI data from the memory to the calculation processing circuit according to the weight-identifier information. Specifically,
if the value at some position of the weight identifier equals 0, the KERNEL n*m of the CI->CO connection corresponding to that value is all zero; algorithmically, whatever the value of the CI, the calculation result CO contributed by that CI is identically 0.
When the data transmission circuit finds that a position of the weight identifier equals 0, it skips that position and goes directly to the next position of the weight identifier; if the value at the next position is 1, the non-zero CI data corresponding to that next position is transferred to the calculation processing circuit. This eliminates unnecessary data transfer and internal storage, saving chip power consumption and storage space. Jumping directly to the next position whose weight identifier is non-zero, in coordination with the calculation of the calculation processing circuit, ensures timely data supply and improves calculation speed.
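The QMASK-driven transfer can be pictured with the short sketch below; dma_copy is a hypothetical stand-in for the hardware move of one kernel's CI block from the DDR into the PE-side buffer, not an interface defined by the patent:

```python
def idma_transfer(qmask, dma_copy):
    """Sketch of the weight-identifier-driven CI transfer (IDMA).

    dma_copy(k) is a hypothetical stand-in for moving the CI block
    needed by kernel k from DDR into the PE-side buffer. Blocks whose
    QMASK value is 0 contribute an identically-zero CO and are
    therefore never transferred or stored on chip.
    """
    for k, flag in enumerate(qmask):
        if flag == 0:
            continue          # jump straight to the next non-zero position
        dma_copy(k)           # transfer only data the PE will actually use
```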
The parameter transmission circuit is responsible for carrying the compressed weight values and the core identifiers from the memory into the calculation processing circuit.
Since all zeros have already been deleted from the weight values, the transfer volume and power consumption of the parameter transmission circuit are optimized to the maximum; the identifiers are sent into the calculation processing circuit to tell it how to perform the zero-skipping calculation and improve calculation efficiency.
The calculation processing circuit is the calculation processing circuit inside the neural network processor; it completes the multiplication of the CI with the weights and the accumulation calculation:
the generic method, regardless of whether a weight is 0, completes the products of all CIs and weights and then accumulates them to obtain the result.
But for the zero-valued elements in the weights, the product is also 0 and has no effect on the accumulated result; if these can be skipped directly, calculation efficiency can be greatly accelerated, and the amount of computation and the power consumption reduced.
Since the weight identifier and the core identifier are added to mark the positions and distribution of the zero weights, the calculation processing circuit can use the zero-position information identified by the weight identifier and the core identifier to directly skip the calculations performed on zero-valued elements in the weights. Specifically,
Step a: the calculation processing circuit scans weight identifier [l]; if weight identifier [l] = 0, it determines that the KERNEL[l] corresponding to weight identifier [l] is all zero and skips weight identifier [l];
Step b: scan weight identifier [l+1]; if weight identifier [l+1] = 1, parse the core identifier [l+1] corresponding to weight identifier [l+1];
Step c: the calculation processing circuit parses out a position x1 at which core identifier [l+1] equals 1, then reads the CI[l+1]x1 data from the BUF, extracts KERNEL[l+1]x1 from the value of KERNEL[l+1] corresponding to x1, and multiplies KERNEL[l+1]x1 by the CI[l+1]x1 data to obtain a product result.
The CI[l+1]x1 data can be obtained according to the principle of the operation; for example, for a convolution operation, the position of the CI[l+1]x1 data within the CI data and the specific value of CI[l+1]x1 are determined according to the principle of the convolution operation.
Step d: the calculation processing circuit repeats step c until all values of core identifier [l+1] have been parsed.
Step e: the calculation processing circuit scans the subsequent values of weight identifier [l+1]; if a subsequent value, weight identifier [k], equals 1, it parses the core identifier [k] corresponding to weight identifier [k];
Step f: the calculation processing circuit parses out a position x1 at which core identifier [k] equals 1, then reads the CI[k]x1 data from the BUF, extracts KERNEL[k]x1 from the value of KERNEL[k] corresponding to x1, and multiplies KERNEL[k]x1 by the CI[k]x1 data to obtain a product result.
Step g: the calculation processing circuit repeats step f until all values of core identifier [k] have been parsed.
Step h: the calculation processing circuit traverses all values of the weight identifier; if a value is zero, step a is performed; if a value is 1, steps e, f, and g are performed.
Step i: the calculation processing circuit performs operations on all the product results to obtain the calculation result; the operations on the product results include but are not limited to: activation, sorting, accumulation, conversion, and the like.
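Steps a to i describe a sequential decode of the compressed stream. The sketch below mirrors that flow in software, with a read pointer advancing through the packed non-zero weights in the same layout as the offline compression sketch above; the fetch_ci helper and the final accumulation are illustrative assumptions, since the patent leaves the operand addressing and the post-processing of step i open:

```python
def pe_decode(qmask, wmasks, packed, fetch_ci):
    """Sketch of PE steps a-i: walk QMASK/WMASK and the packed weights.

    packed holds only non-zero weight values in scan order, so a single
    read pointer (ptr) advances exactly once per WMASK bit that is 1.
    fetch_ci(k, x) is an assumed helper returning the CI operand for
    kernel k at in-kernel position x (addressing depends on the op).
    """
    ptr = 0                                # read pointer into packed stream
    products = []
    wmask_iter = iter(wmasks)              # one mask per stored kernel
    for k, flag in enumerate(qmask):       # steps a/h: traverse QMASK
        if flag == 0:
            continue                       # step a: all-zero kernel, skip
        bits = next(wmask_iter)            # steps b/e: parse WMASK for k
        for x, bit in enumerate(bits):     # steps c/f: positions where bit=1
            if bit == 1:
                products.append(packed[ptr] * fetch_ci(k, x))
                ptr += 1                   # consume one packed weight
    return sum(products)                   # step i stand-in (accumulation)
```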
Viewed from the above calculation principle, the calculation processing circuit of the present application parses two layers of data, namely the weight identifier and the core identifier, and can, based on the values of these two layers of data, very simply skip the calculations for weight elements whose value is 0 and then work with the compressed weights. Model calculation can be completed efficiently, since the chip structure shown in FIG. 4 can directly skip all all-zero calculations, and zero values are never stored in the memory.
Referring to FIG. 5, FIG. 5 provides a calculation device for a sparse neural network, the device including:
a transceiver interface 501 configured to receive a calculation instruction of the sparse neural network;
an acquisition circuit 502 configured to extract, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction from a memory 505;
a compiling circuit 503 configured to determine the kernel size KERNEL SIZE of the weights and to scan the weights at the kernel-size granularity to obtain a weight identifier, the weight identifier including CO*CI values: if all weight values within the k-th basic-granularity kernel KERNEL k are 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a first specific value; if the weight values within KERNEL k are not all 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a second specific value; k ranges over [1, CO*CI]; the KERNEL[n][m] corresponding to the second specific value of the weight identifier is stored, and the KERNEL[n][m] corresponding to the first specific value of the weight identifier is deleted;
a calculation circuit 504 configured to scan all values of the weight identifier: if a value equals the second specific value, extract the KERNEL corresponding to that value and the input data corresponding to that KERNEL, and perform an operation on the input data and the KERNEL to obtain an initial result; if a value equals the first specific value, not read the KERNEL or the input data corresponding to that value; and perform arithmetic processing on all initial results to obtain the calculation result of the calculation instruction.
Optionally, n is equal to any one of 1, 3, and 5.
Optionally, the first specific value is 0 and the second specific value is 1;
or the first specific value is 1 and the second specific value is 0.
Optionally, where n = 3, the calculation circuit 504 is specifically configured to scan all values of the core identifier corresponding to KERNEL[3][3]; the core identifier includes 9 bits corresponding to the 9 elements of KERNEL[3][3]: if the value at position x2 of the core identifier equals 0, the element value of KERNEL[3][3] corresponding to x2 is not read; if the value at position x1 of the core identifier equals 1, the position x1 corresponding to that value is determined, the element value KERNEL[3][3]x1 at position x1 of KERNEL[3][3] and the input data x1 corresponding to position x1 are read, and KERNEL[3][3]x1 is multiplied by the input data x1 to obtain a product result; x1 ranges over [1, 9]; all the product results whose core-identifier value is 1 are accumulated to obtain the initial result.
An embodiment of the present application further provides an electronic device, where the electronic device includes the calculation device for a sparse neural network described above.
An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any calculation method for a sparse neural network described in the foregoing method embodiments.
An embodiment of the present application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any calculation method for a sparse neural network described in the foregoing method embodiments.
It should be noted that, for brevity, the foregoing method embodiments are all expressed as a series of action combinations; however, a person skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application, some steps may be performed in other orders or simultaneously. Further, a person skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for a part not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a memory and includes a number of instructions causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art may understand that all or part of the steps in the various methods of the foregoing embodiments may be completed by a program instructing related hardware; the program may be stored in a computer-readable memory, and the memory may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application are described in detail above; specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method and core idea of the present application. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (11)

  1. A calculation method for a sparse neural network, wherein the method includes the following steps:
    receiving a calculation instruction of the sparse neural network, and acquiring, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction;
    determining the kernel size KERNEL SIZE of the weights, and scanning the weights at the kernel-size granularity to obtain a weight identifier, the weight identifier including CO*CI values: if all weight values within the k-th basic-granularity kernel KERNEL k are 0, marking the position [k] of the weight identifier corresponding to that KERNEL k with a first specific value; if the weight values within KERNEL k are not all 0, marking the position [k] of the weight identifier corresponding to that KERNEL k with a second specific value; k ranging over [1, CO*CI];
    storing the KERNEL[n][m] corresponding to the second specific value of the weight identifier, and deleting the KERNEL[n][m] corresponding to the first specific value of the weight identifier;
    scanning all values of the weight identifier: if a value equals the second specific value, extracting the KERNEL corresponding to that value and the input data corresponding to that KERNEL, and performing an operation on the input data and the KERNEL to obtain an initial result; if a value equals the first specific value, not reading the KERNEL or the input data corresponding to that value;
    performing arithmetic processing on all the initial results to obtain the calculation result of the calculation instruction.
  2. The method according to claim 1, wherein
    n and m are integers greater than or equal to 1.
  3. The method according to claim 1, wherein storing the KERNEL[n][m] corresponding to the second specific value of the weight identifier includes:
    scanning the core identifier to obtain the value corresponding to each core-identifier position, and storing the KERNEL values at the positions where the weight identifier = 1 and the core identifier = 1.
  4. The method according to claim 2, wherein, where n = 3 and m = 3, performing an operation on the input data and the KERNEL to obtain an initial result includes:
    scanning all values of the core identifier corresponding to KERNEL[3][3], the core identifier including 9 bits corresponding to the 9 elements of KERNEL[3][3]: if the value at position x2 of the core identifier equals 0, not reading the element value of KERNEL[3][3] corresponding to x2; if the value at position x1 of the core identifier equals 1, determining the position x1 corresponding to that value, reading the element value KERNEL[3][3]x1 at position x1 of KERNEL[3][3] and the input data x1 corresponding to position x1, and multiplying KERNEL[3][3]x1 by the input data x1 to obtain a product result; x1 ranging over [1, 9];
    accumulating all the product results whose core-identifier value is 1 to obtain the initial result.
  5. A calculation device for a sparse neural network, wherein the device includes:
    a transceiver interface configured to receive a calculation instruction of the sparse neural network;
    an acquisition circuit configured to extract, according to the calculation instruction, the weights CO*CI*n*m corresponding to the calculation instruction from a memory;
    a compiling circuit configured to determine the kernel size KERNEL SIZE of the weights and to scan the weights at the kernel-size granularity to obtain a weight identifier, the weight identifier including CO*CI values: if all weight values within the k-th basic-granularity kernel KERNEL k are 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a first specific value; if the weight values within KERNEL k are not all 0, the position [k] of the weight identifier corresponding to that KERNEL k is marked with a second specific value; k ranges over [1, CO*CI]; the KERNEL[n][m] corresponding to the second specific value of the weight identifier is stored, and the KERNEL[n][m] corresponding to the first specific value of the weight identifier is deleted;
    a calculation circuit configured to scan all values of the weight identifier: if a value equals the second specific value, extract the KERNEL corresponding to that value and the input data corresponding to that KERNEL, and perform an operation on the input data and the KERNEL to obtain an initial result; if a value equals the first specific value, not read the KERNEL or the input data corresponding to that value; and perform arithmetic processing on all initial results to obtain the calculation result of the calculation instruction.
  6. The calculation device according to claim 5, wherein
    n and m are integers greater than or equal to 1.
  7. The calculation device according to claim 5, wherein
    the first specific value is 0 and the second specific value is 1;
    or the first specific value is 1 and the second specific value is 0.
  8. The calculation device according to claim 6, wherein, where n = 3 and m = 3,
    the calculation circuit is specifically configured to scan all values of the core identifier corresponding to KERNEL[3][3]; the core identifier includes 9 bits corresponding to the 9 elements of KERNEL[3][3]: if the value at position x2 of the core identifier equals 0, the element value of KERNEL[3][3] corresponding to x2 is not read; if the value at position x1 of the core identifier equals 1, the position x1 corresponding to that value is determined, the element value KERNEL[3][3]x1 at position x1 of KERNEL[3][3] and the input data x1 corresponding to position x1 are read, and KERNEL[3][3]x1 is multiplied by the input data x1 to obtain a product result; x1 ranges over [1, 9]; all the product results whose core-identifier value is 1 are accumulated to obtain the initial result.
  9. An electronic device, wherein the electronic device includes the calculation device for a sparse neural network according to any one of claims 5-8.
  10. A computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-4.
  11. A computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform the method according to any one of claims 1-4.
PCT/CN2018/079373 2017-12-29 2018-03-16 Calculation method and calculation device for a sparse neural network, electronic device, computer-readable storage medium, and computer program product WO2019127926A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/627,293 US20200242467A1 (en) 2017-12-29 2018-03-16 Calculation method and calculation device for sparse neural network, electronic device, computer readable storage medium, and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711480629.0 2017-12-29
CN201711480629.0A CN109993286B (zh) 2017-12-29 Calculation method for a sparse neural network and related products

Publications (1)

Publication Number Publication Date
WO2019127926A1 true WO2019127926A1 (zh) 2019-07-04

Family

ID=67065011

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079373 WO2019127926A1 (zh) 2017-12-29 2018-03-16 Calculation method and calculation device for a sparse neural network, electronic device, computer-readable storage medium, and computer program product

Country Status (3)

Country Link
US (1) US20200242467A1 (zh)
CN (1) CN109993286B (zh)
WO (1) WO2019127926A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490315B (zh) * 2019-08-14 2023-05-23 中科寒武纪科技股份有限公司 Sparsification method for the reverse operation of a neural network and related products
WO2021064292A1 (en) * 2019-10-02 2021-04-08 Nokia Technologies Oy High-level syntax for priority signaling in neural network compression

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170068887A1 (en) * 2014-09-26 2017-03-09 Samsung Electronics Co., Ltd. Apparatus for classifying data using boost pooling neural network, and neural network training method therefor
CN107169560A (zh) * 2017-04-19 2017-09-15 清华大学 An adaptive and reconfigurable deep convolutional neural network calculation method and device
CN107239824A (zh) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Device and method for implementing a sparse convolutional neural network accelerator
CN107341544A (zh) * 2017-06-30 2017-11-10 清华大学 A reconfigurable accelerator based on a partitionable array and its implementation method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239823A (zh) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 A device and method for implementing a sparse neural network
CN107153873B (zh) * 2017-05-08 2018-06-01 中国科学院计算技术研究所 A binary convolutional neural network processor and method of using the same

Also Published As

Publication number Publication date
US20200242467A1 (en) 2020-07-30
CN109993286A (zh) 2019-07-09
CN109993286B (zh) 2021-05-11

Similar Documents

Publication Publication Date Title
CN110147251B (zh) System, chip, and calculation method for calculating a neural network model
CN110537194B (zh) Power-efficient deep neural network processor and method configured for layer and operation guarding and dependency management
EP4156017A1 (en) Action recognition method and apparatus, and device and storage medium
WO2020073211A1 (zh) Operation accelerator, processing method, and related device
CN110830807B (zh) Image compression method, apparatus, and storage medium
CN113296718B (zh) Data processing method and apparatus
CN109857744B (zh) Sparse tensor calculation method, apparatus, device, and storage medium
US20200380261A1 (en) Resource optimization based on video frame analysis
CN109598250B (zh) Feature extraction method and apparatus, electronic device, and computer-readable medium
US20190325199A1 (en) Information processing method, device, system and storage medium
WO2023174098A1 (zh) Real-time gesture detection method and apparatus
CN109726822B (zh) Operation method and apparatus, and related products
CN112861575A (zh) Pedestrian structuring method, apparatus, device, and storage medium
WO2019128735A1 (zh) Image processing method and apparatus
WO2019127926A1 (zh) Calculation method and calculation device for a sparse neural network, electronic device, computer-readable storage medium, and computer program product
Shahshahani et al. Memory optimization techniques for FPGA-based CNN implementations
CN112771546A (zh) Operation accelerator and compression method
CN112200310B (zh) Intelligent processor, data processing method, and storage medium
CN113743277A (zh) Short-video classification method and system, device, and storage medium
CN114640669A (zh) Edge computing method and apparatus
US9232222B2 (en) Lossless color image compression adaptively using spatial prediction or inter-component prediction
CN109711538B (zh) Operation method and apparatus, and related products
CN111542837B (zh) Three-dimensional convolutional neural network computing device and related products
WO2023124428A1 (zh) Chip, accelerator card, electronic device, and data processing method
CN103891272B (zh) Multiple stream processing for video analysis and encoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18896496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.10.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18896496

Country of ref document: EP

Kind code of ref document: A1