WO2021077283A1 - Neural network computation compression method, system and storage medium - Google Patents

Neural network computation compression method, system and storage medium

Info

Publication number
WO2021077283A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
calculation
compression
data
floating
Prior art date
Application number
PCT/CN2019/112465
Other languages
English (en)
French (fr)
Inventor
熊超
牛昕宇
蔡权雄
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司
Priority to PCT/CN2019/112465
Priority to CN201980100181.6A
Publication of WO2021077283A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Definitions

  • The embodiments of the present application relate to the field of neural networks, for example to a neural network computation compression method, system, and storage medium.
  • A neural network is a computational model composed of a large number of nodes (or neurons) and the interconnections between them. Neural network technology is also called deep learning technology. The trend in deep learning is toward ever deeper networks with ever larger layers; correspondingly, deep learning places ever higher demands on computing power. However, the computing power of dedicated chips is growing far too slowly to meet the requirements of deep learning algorithms. To solve this problem, the computational compression of neural networks has become an important direction for acceleration and has received extensive attention from academia and industry.
  • Compression of the network structure has two aspects. One is to compress the number of network layers, that is, to reduce network depth; the other is to compress the size of each layer, that is, to reduce the number of neurons.
  • Compressing the number of layers usually uses the distillation method, whose main idea is to transfer the data of a large network into a predefined small network structure.
  • Compressing the size of each layer usually uses the pruning method, whose main idea is to evaluate the importance of each neuron connection according to some criterion and keep only the more important connections. After such structural compression, however, using the network usually requires retraining it on the original dataset, which places high demands on computing equipment and deployment time and is unsuitable for many scenarios.
  • The embodiments of the present application provide a neural network computation compression method, system, and storage medium, so as to realize data compression for neural network computation, improve the computational efficiency of the neural network, and reduce the deployment time of the neural network.
  • The embodiment of the present application provides a neural network computation compression method, including: obtaining statistics of the floating-point numbers in each layer of a neural network; calculating, according to the statistics, compression parameters for converting each layer's floating-point data into fixed-point data; and performing fixed-point compression calculation on the neural network according to the compression parameters.
  • The embodiment of the application provides a neural network computation compression system, including:
  • a statistics acquisition module, configured to obtain statistics of the floating-point numbers in each layer of a neural network;
  • a compression parameter acquisition module, configured to calculate, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data;
  • a compression calculation module, configured to perform fixed-point compression calculation on the neural network according to the compression parameters.
  • The embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network computation compression method provided in any embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a neural network computation compression method provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of another neural network computation compression method provided in Embodiment 2 of the present application;
  • FIG. 3 is a schematic flowchart of another neural network computation compression method provided in Embodiment 3 of the present application;
  • FIG. 4 is a schematic structural diagram of a neural network computation compression system provided in Embodiment 4 of the present application.
  • Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many steps herein can be performed in parallel, concurrently, or simultaneously, and the order of the steps can be rearranged. Processing may be terminated when its steps are completed, but there may also be additional steps not included in the drawings. Processing may correspond to methods, functions, procedures, subroutines, subprograms, and so on.
  • The terms “first”, “second”, etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by the terms, which serve only to distinguish one direction, action, step, or element from another.
  • For example, without departing from the scope of the present application, the first calculation graph may be referred to as the second calculation graph and, similarly, the second calculation graph may be referred to as the first calculation graph. Both are calculation graphs, but they are not the same calculation graph.
  • The terms “first”, “second”, etc. are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with “first” or “second” may explicitly or implicitly include one or more of that feature.
  • In the description of the application, “plurality” means at least two, such as two or three, unless otherwise defined.
  • FIG. 1 is a schematic flowchart of a neural network computation compression method provided in Embodiment 1 of the application; the method is applicable to compressing data during the computation of a neural network.
  • The method can be executed by a neural network computation compression system, which can be implemented in software and/or hardware and can be integrated on a hardware device, such as a chip or a board.
  • As shown in FIG. 1, the neural network computation compression method provided by Embodiment 1 of the present application includes:
  • S110. Obtain statistics of the floating-point numbers in each layer of the neural network.
  • In machine language, data (real numbers) are usually represented in one of two ways: floating point or fixed point. In fixed-point representation the radix point sits at a fixed, pre-agreed position that is not stored in the machine and cannot change once chosen, so fixed-point numbers cover a limited range and occupy correspondingly little memory (bits). Floating-point numbers express real numbers in scientific notation, with a mantissa, a base, an exponent, and a sign; for example, 123.45 is represented as 1.2345x10^2, where 1.2345 is the mantissa, 10 the base, and 2 the exponent. The exponent lets the radix point float, so floating-point numbers can flexibly express a wide range of data at the cost of more memory (bits). A neural network is a machine-learning technique that simulates the human brain's neural network to realize artificial-intelligence-like capability, and neural network data is usually expressed in the form of 32-bit floating-point numbers.
  • Statistics are the data statistics of each layer's floating-point numbers in the neural network: the data is first classified by numerical range, and the amount of data in each class is then counted, which can be represented as a histogram.
  • There are many ways to classify the data; it can be classified by numerical range or by content, which the embodiments of the present application do not limit.
  • For example, a certain layer of the neural network contains the floating-point numbers 1x2^7, 1x2^7, 1x2^8, 1x2^9, 1x2^9, 1x2^7, 1x2^6. Let s denote the magnitude class of a datum (the horizontal axis of the statistics histogram) and y the amount of data in a class (the vertical axis). Classifying 1x2^6 ≤ s < 1x2^7 as the first class, 1x2^7 ≤ s < 1x2^8 as the second class, 1x2^8 ≤ s < 1x2^9 as the third class, and s ≥ 1x2^9 as the fourth class, the statistics of this layer's floating-point numbers can be expressed as: first class: 1, second class: 3, third class: 1, fourth class: 2. A small sketch of this counting step follows.
  • S120 Calculate a compression parameter for converting each layer of floating-point number data into fixed-point number data in the neural network according to the statistics.
  • In this embodiment, each layer's floating-point data is converted into fixed-point data according to the statistics; the compression parameter can be regarded as the conversion parameter or conversion rule between each layer's floating-point data and its fixed-point data.
  • All classes of floating-point numbers may be converted to the same low-bit fixed-point format, or different classes may be converted to corresponding fixed-point formats according to different conversion rules. The embodiment of the application does not limit the conversion parameters; they can be chosen according to the actual situation.
  • For example, the first class of data (1x2^6 ≤ s < 1x2^7) is converted into 8-bit binary numbers containing 4 fractional bits, the second class (1x2^7 ≤ s < 1x2^8) into 8-bit binary numbers containing 3 fractional bits, and the third class (1x2^8 ≤ s < 1x2^9) and fourth class (s ≥ 1x2^9) into 8-bit binary numbers containing 2 fractional bits. A sketch of such a conversion rule follows.
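  • A rough sketch of what one such conversion rule could look like in code (this assumes a signed two's-complement format with a chosen number of fractional bits; the patent does not prescribe a concrete scheme, and the function names are illustrative):

```python
def to_fixed(s: float, bits: int = 8, frac: int = 4) -> int:
    """Quantize s to a signed `bits`-bit fixed-point integer with `frac` fractional bits."""
    q = round(s * (1 << frac))                      # scale so fractional bits become integer
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))                      # saturate to the representable range

def from_fixed(q: int, frac: int = 4) -> float:
    """Decompress: interpret the fixed-point integer back as a real number."""
    return q / (1 << frac)

q = to_fixed(3.14159, bits=8, frac=4)
print(q, from_fixed(q))   # -> 50 3.125 (the value snapped to the 1/16 grid)
```

  • More fractional bits give finer resolution but a smaller representable range, which is one reason a rule like the example above would assign fewer fractional bits to the larger magnitude classes.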
  • S130 Perform compression calculation of fixed-point number data on the neural network according to the compression parameter.
  • In this embodiment, after the compression parameters are obtained, they are imported into the calculation graph of the neural network to be computed, and the floating-point numbers in the neural network can then be converted into the corresponding low-bit fixed-point numbers for compressed calculation.
  • The neural network computation compression method provided in Embodiment 1 obtains the statistics of each layer's floating-point numbers in the neural network, calculates from the statistics the compression parameters for converting each layer's floating-point data into fixed-point data, and performs fixed-point compression calculation on the neural network according to the compression parameters. This converts the neural network's floating-point numbers into low-bit fixed-point numbers for computation, reducing the amount of computation and the required memory and improving the computational efficiency of the neural network. Moreover, only the data of the neural network is compressed; the network structure is not modified, so the network with the obtained compression parameters does not need to be retrained before being used for calculation, which improves the deployment speed of the neural network.
  • FIG. 2 is a schematic flowchart of another neural network computation compression method provided in Embodiment 2 of the application. This embodiment is described on the basis of the above-mentioned embodiment.
  • As shown in FIG. 2, a neural network computation compression method provided in Embodiment 2 of the present application includes:
  • S210. Construct a first calculation graph of the neural network.
  • In this embodiment, a calculation graph is a representation of the neural network model structure during computation, comprising multiple computing nodes and the connection relationships between them.
  • One layer of the neural network can be regarded as one computing node. Each computing node also carries node data during actual calculation, that is, the data that needs to be compressed when the neural network runs; node data is generally expressed in the form of 32-bit floating-point numbers.
  • Constructing the first calculation graph of the neural network means constructing the graph used when the neural network performs computations, including multiple first computing nodes and the connection relationships between them.
  • S220. Insert statistics nodes into the first calculation graph to form a second calculation graph of the neural network.
  • A statistics node is used to count the statistics of one layer's floating-point numbers in the neural network. Inserting statistics nodes after the first computing nodes of the first calculation graph forms the second calculation graph of the neural network.
  • When constructing the second calculation graph, statistics nodes can be inserted as needed: one after every first computing node, or only after those first computing nodes whose data needs to be compressed. In this embodiment, a statistics node is inserted after every first computing node.
  • Since statistics nodes are only inserted after the first computing nodes, neither the number of first computing nodes nor the connection relationships between them change; therefore the second computing nodes contained in the second calculation graph are the same as the first computing nodes, and the connection relationships between the second computing nodes are the same as those between the first computing nodes.
  • S230. Obtain first input data of the neural network.
  • The first input data is the data that needs to be fed into the neural network when obtaining the statistics; the neural network performs one actual run on the first input data to obtain the statistics.
  • S240. Run the second calculation graph according to the first input data.
  • The first input data is input into the neural network, the second computing nodes of the second calculation graph obtain the corresponding second node data (identical to the first node data of the first calculation graph), and the second calculation graph with the inserted statistics nodes is run; through the actual computation of the second calculation graph, the statistics of every second node are obtained.
  • The second calculation graph merely adds statistics nodes on top of the first calculation graph and does not compress the node data of the neural network; therefore the second calculation graph still computes in floating-point form during its actual run.
  • S250. Obtain the statistics of each layer's floating-point numbers in the neural network from the running of the second calculation graph.
  • While the second calculation graph of the neural network runs, the statistics nodes continuously update and record the statistics, yielding the statistics of the corresponding second computing nodes and hence the statistics of each layer's floating-point numbers in the neural network. A sketch of this insertion-and-run idea follows.
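  • In a graph framework such as PyTorch, the insertion of statistics nodes could be sketched with forward hooks that update a running record while the (still floating-point) second calculation graph runs. Everything below (the network, the module names, and the min/max choice of statistic) is an illustrative assumption, not the patent's implementation:

```python
import torch
import torch.nn as nn

class StatsNode:
    """Statistics node: records a running min/max of a layer's output."""
    def __init__(self):
        self.lo, self.hi = float("inf"), float("-inf")

    def __call__(self, module, inputs, output):
        self.lo = min(self.lo, output.min().item())
        self.hi = max(self.hi, output.max().item())

# "First calculation graph": the plain floating-point network.
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

# "Second calculation graph": a statistics node after every computing node.
stats = {name: StatsNode() for name, _ in net.named_children()}
for name, layer in net.named_children():
    layer.register_forward_hook(stats[name])

net(torch.randn(64, 16))                 # run once on the "first input data"
for name, s in stats.items():
    print(name, (s.lo, s.hi))            # per-layer statistics, still computed in float
```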
  • S260 Calculate the compression parameters for converting floating-point number data of each layer of the neural network into fixed-point number data according to the statistics.
  • In this embodiment, each layer's floating-point data is converted into fixed-point data according to the statistics; the compression parameter can be regarded as the conversion parameter or conversion rule between each layer's floating-point data and its fixed-point data.
  • All classes of floating-point numbers may be converted to the same low-bit fixed-point format, or different classes may be converted to corresponding fixed-point formats according to different conversion rules. The embodiment of the application does not limit the conversion parameters; they can be chosen according to the actual situation.
  • For example, the first class of data (1x2^6 ≤ s < 1x2^7) is converted into 8-bit binary numbers containing 4 fractional bits, the second class (1x2^7 ≤ s < 1x2^8) into 8-bit binary numbers containing 3 fractional bits, and the third class (1x2^8 ≤ s < 1x2^9) and fourth class (s ≥ 1x2^9) into 8-bit binary numbers containing 2 fractional bits.
  • S270 Perform compression calculation of fixed-point number data on the neural network according to the compression parameter.
  • In this embodiment, after the compression parameters are obtained, they are imported into the calculation graph of the neural network to be computed, and the floating-point numbers in the neural network can then be converted into the corresponding low-bit fixed-point numbers for compressed calculation.
  • The neural network computation compression method provided in Embodiment 2 of the application forms a second calculation graph by inserting statistics nodes into the first calculation graph of the neural network, then runs the second calculation graph according to the first input data to obtain the statistics of each layer's floating-point numbers in the neural network, calculates from the statistics the compression parameters for converting the network's floating-point numbers to fixed-point numbers, and finally performs the actual compression calculation on the neural network according to the compression parameters.
  • This way of obtaining the statistics is simple and easy to operate. Embodiment 2 of the application converts the neural network's floating-point numbers into low-bit fixed-point numbers for computation, reducing the amount of computation and the required memory and improving the computational efficiency of the neural network.
  • FIG. 3 is a schematic flowchart of another neural network computation compression method provided in Embodiment 3 of the present application. This embodiment is described on the basis of the foregoing embodiments.
  • As shown in FIG. 3, a neural network computation compression method provided in Embodiment 3 of the present application includes:
  • S3010. Construct a first calculation graph of the neural network.
  • In this embodiment, a calculation graph is a representation of the neural network model structure during computation, comprising multiple computing nodes and the connection relationships between them.
  • One layer of the neural network can be regarded as one computing node. Each computing node also carries node data during actual calculation, that is, the data that needs to be compressed when the neural network runs; node data is generally expressed in the form of 32-bit floating-point numbers.
  • Constructing the first calculation graph of the neural network means constructing the graph used when the neural network performs computations, including multiple first computing nodes and the connection relationships between them.
  • S3020. Insert statistics nodes into the first calculation graph to form a second calculation graph of the neural network.
  • A statistics node is used to count the statistics of one layer's floating-point numbers in the neural network. Inserting statistics nodes after the first computing nodes of the first calculation graph forms the second calculation graph of the neural network; in this embodiment, a statistics node is inserted after every first computing node.
  • Since statistics nodes are only inserted after the first computing nodes, neither the number of first computing nodes nor the connection relationships between them change; therefore the second computing nodes contained in the second calculation graph are the same as the first computing nodes, and the connection relationships between the second computing nodes are the same as those between the first computing nodes.
  • S3030. Obtain first input data of the neural network.
  • The first input data is the data that needs to be fed into the neural network when obtaining the statistics; the neural network performs one actual run on the first input data to obtain the statistics.
  • S3040. Run the second calculation graph according to the first input data.
  • The first input data is input into the neural network, the second computing nodes of the second calculation graph obtain the corresponding second node data (identical to the first node data of the first calculation graph), and the second calculation graph with the inserted statistics nodes is run; through the actual computation of the second calculation graph, the statistics of every second node are obtained.
  • The second calculation graph merely adds statistics nodes on top of the first calculation graph and does not compress the node data of the neural network; therefore the second calculation graph still computes in floating-point form during its actual run.
  • S3050. Obtain the statistics of each layer's floating-point numbers in the neural network from the running of the second calculation graph.
  • While the second calculation graph of the neural network runs, the statistics nodes continuously update and record the statistics, yielding the statistics of the corresponding second computing nodes and hence the statistics of each layer's floating-point numbers in the neural network.
  • S3060. Calculate, according to the statistics, the compression parameters for converting each layer's floating-point data in the neural network into fixed-point data.
  • In this embodiment, each layer's floating-point data is converted into fixed-point data according to the statistics; the compression parameter can be regarded as the conversion parameter or conversion rule between each layer's floating-point data and its fixed-point data. All classes of floating-point data may be converted to the same low-bit fixed-point format, or different classes may be converted to corresponding fixed-point formats according to different conversion rules; the embodiment of the application does not limit the conversion parameters, which can be chosen according to the actual situation. For example, the first class of data (1x2^6 ≤ s < 1x2^7) is converted into 8-bit binary numbers containing 4 fractional bits, the second class (1x2^7 ≤ s < 1x2^8) into 8-bit binary numbers containing 3 fractional bits, and the third class (1x2^8 ≤ s < 1x2^9) and fourth class (s ≥ 1x2^9) into 8-bit binary numbers containing 2 fractional bits.
  • S3070. Export a model file of the second calculation graph.
  • In this embodiment, after the second calculation graph with the inserted statistics nodes has been run according to the first input data, the second calculation graph is formed into a model file and exported, and the compression parameters are saved. In subsequent use, the model file can be used directly to construct the second calculation graph of the neural network, and the compression parameters can also be reused, so steps S3010 to S3060 need not be repeated every time the neural network is compressed. A sketch of this export-and-reuse step follows.
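  • A minimal sketch of the export-and-reuse step (the file names, the JSON format, and the layer names "fc1"/"fc2" are assumptions; the point is only that the graph and parameters are persisted once and reloaded later):

```python
import json

# Suppose the statistics run produced one (bits, frac) rule per layer.
compression_params = {"fc1": {"bits": 8, "frac": 4}, "fc2": {"bits": 8, "frac": 2}}

with open("compression_params.json", "w") as f:    # save once after steps S3010-S3060
    json.dump(compression_params, f)
# torch.save(net.state_dict(), "second_graph.pt")  # model file of the second calculation graph

with open("compression_params.json") as f:         # later runs just reload
    reloaded = json.load(f)
```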
  • S3080. Construct a third calculation graph according to the model file of the second calculation graph and the compression parameters.
  • In this embodiment, when the neural network actually needs to perform compressed calculation, the model file of the second calculation graph is imported, and the third calculation graph is constructed according to the model file and the compression parameters.
  • The third calculation graph includes third computing nodes and the connection relationships between them; during actual computation, the data of a third computing node is called third node data.
  • In an embodiment, a method for constructing the third calculation graph of the neural network includes steps S30810 to S30830 (not shown in the figures).
  • S30810. Construct the second calculation graph according to the model file of the second calculation graph.
  • In this embodiment, the second calculation graph of the neural network is constructed from its model file; the second calculation graph includes multiple second computing nodes and the connection relationships between them, with a statistics node inserted after every second computing node.
  • S30820. Replace the statistics nodes in the second calculation graph with compression nodes.
  • In this embodiment, the statistics nodes in the second calculation graph are replaced with compression nodes. A compression node performs the compression and decompression of a computing node: compression means compressing the node data from floating-point numbers into fixed-point numbers for compressed calculation, and decompression means converting the computing node's output data after the compressed calculation from fixed-point numbers back to a floating-point representation.
  • S30830. Import the compression parameters into the compression nodes to form the third calculation graph.
  • In this embodiment, importing the compression parameters into the compression nodes forms the third calculation graph, which is the calculation graph used when the neural network performs compressed calculation. Since only the statistics nodes of the second calculation graph are replaced, and neither the number of the network's layers nor their connection relationships change, the third computing nodes are the same as the second computing nodes, and the connection relationships between the third computing nodes are the same as those between the second computing nodes. A sketch of such a compression node follows.
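  • The following sketch shows one way a compression node could wrap a computing node: quantize the input to the fixed-point grid, run the layer, then dequantize the output. Note that this only simulates fixed-point arithmetic in floating point (dedicated hardware would keep the integers), and the class name, network, and Q-format are illustrative assumptions rather than the patent's implementation:

```python
import torch
import torch.nn as nn

class CompressionNode(nn.Module):
    """Wraps a computing node with compression (float -> fixed point) before the
    computation and decompression (fixed point -> float) after it."""
    def __init__(self, layer: nn.Module, frac: int, bits: int = 8):
        super().__init__()
        self.layer = layer
        self.scale = float(1 << frac)
        self.lo, self.hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1

    def _snap(self, t: torch.Tensor) -> torch.Tensor:
        q = torch.clamp(torch.round(t * self.scale), self.lo, self.hi)
        return q / self.scale            # value constrained to the fixed-point grid

    def forward(self, x):
        y = self.layer(self._snap(x))    # compression: compute on fixed-point input
        return self._snap(y)             # decompression: float output for the next node

# "Third calculation graph": statistics nodes replaced by compression nodes.
third_graph = nn.Sequential(
    CompressionNode(nn.Linear(16, 32), frac=4),
    nn.ReLU(),
    CompressionNode(nn.Linear(32, 8), frac=2),
)
out = third_graph(torch.randn(64, 16))   # the "second input data" runs compressed
```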
  • S3090. Obtain second input data of the neural network.
  • In this embodiment, the second input data is the data input when a user uses the neural network to perform calculations; the second input data is fed into the neural network, and the third computing nodes of the third calculation graph obtain the corresponding third node data in order to perform the neural network's compressed calculation.
  • S3100. Run the third calculation graph according to the second input data to perform fixed-point compressed calculation.
  • In this embodiment, the second input data is input into the neural network, the computing nodes of the third calculation graph obtain the corresponding third node data, the third node data is converted from floating-point numbers to low-bit fixed-point numbers through the compression parameters of the compression nodes, and the third calculation graph performs the compressed calculation in low-bit fixed point.
  • In an embodiment, running the third calculation graph according to the second input data to perform fixed-point compressed calculation includes steps S31010 to S31030 (not shown in the figures).
  • S31010. Import the second input data into the third calculation graph.
  • S31020. Compress the floating-point data of the third calculation graph into fixed-point data according to the compression parameters.
  • S31030. Perform the compressed calculation according to the fixed-point data.
  • S3110. Decompress the fixed-point output data obtained after the compressed calculation to obtain floating-point output data.
  • In this embodiment, the neural network computes layer by layer during actual operation: after one layer of the neural network finishes its calculation, the output data computed by that layer serves as the input data for the next layer's calculation. Since the data during the network's compressed calculation is fixed-point data, the output data of each third computing node after its compressed calculation is also fixed-point data; the fixed-point output data obtained from a third computing node's compressed calculation is decompressed through the compression node to obtain that third computing node's floating-point output data, which serves as the input data of the next third computing node.
  • The neural network computation compression method provided in Embodiment 3 of the present application constructs the third calculation graph used for the network's compressed calculation by replacing the statistics nodes in the second calculation graph with compression nodes, runs the third calculation graph on the second input data, compresses the node data through the compression parameters, and decompresses the output data of the completed compressed calculation through the compression nodes.
  • Only the data of the neural network is compressed; the modification of the neural network structure is not involved. When the neural network with the obtained compression parameters is used for calculation, it does not need to be retrained, which improves the deployment speed of the neural network.
  • FIG. 4 is a schematic structural diagram of a neural network computation compression system provided in Embodiment 4 of the application, applicable to compressing data during the computation of a neural network.
  • The system can be implemented in software and/or hardware and can be integrated on hardware devices, such as chips or boards.
  • As shown in FIG. 4, the neural network computation compression system 400 includes a statistics acquisition module 410, a compression parameter acquisition module 420, and a compression calculation module 430.
  • The statistics acquisition module 410 is configured to obtain the statistics of each layer's floating-point numbers in the neural network;
  • the compression parameter acquisition module 420 is configured to calculate, according to the statistics, the compression parameters for converting each layer's floating-point data in the neural network into fixed-point data;
  • the compression calculation module 430 is configured to perform fixed-point compression calculation on the neural network according to the compression parameters.
  • In an embodiment, the neural network computation compression system 400 further includes: a first calculation graph construction module, configured to construct a first calculation graph of the neural network before the statistics of each layer's floating-point numbers in the neural network are obtained;
  • and a second calculation graph construction module, configured to insert statistics nodes into the first calculation graph to form a second calculation graph of the neural network.
  • The statistics acquisition module 410 includes: a first input data acquisition unit, configured to acquire first input data of the neural network; a second calculation graph running unit, configured to run the second calculation graph according to the first input data; and a statistics acquisition unit, configured to obtain the statistics of each layer's floating-point numbers in the neural network from the running of the second calculation graph.
  • The neural network computation compression system 400 further includes: a model file export module, configured to export the model file of the second calculation graph after the compression parameters for converting each layer's floating-point data in the neural network into fixed-point data have been calculated according to the statistics.
  • The compression calculation module 430 includes: a third calculation graph construction unit, configured to construct a third calculation graph according to the model file of the second calculation graph and the compression parameters; a second input data acquisition unit, configured to acquire second input data of the neural network; and a third calculation graph running unit, configured to run the third calculation graph according to the second input data to perform fixed-point compressed calculation.
  • The third calculation graph construction unit includes: a second calculation graph construction subunit, configured to construct the second calculation graph according to the model file of the second calculation graph; a compression node replacement subunit, configured to replace the statistics nodes in the second calculation graph with compression nodes; and a third calculation graph construction subunit, configured to import the compression parameters into the compression nodes to form the third calculation graph.
  • The third calculation graph running unit includes: a second input data importing subunit, configured to import the second input data into the third calculation graph; a data compression subunit, configured to compress the floating-point data of the third calculation graph into fixed-point data according to the compression parameters; and a compression calculation subunit, configured to perform the compressed calculation according to the fixed-point data.
  • The third calculation graph running unit further includes: an output data decompression subunit, configured to decompress, after the compressed calculation is performed according to the fixed-point data, the fixed-point output data obtained from the compressed calculation to obtain floating-point output data.
  • In the neural network computation compression system provided in Embodiment 4 of the present application, the statistics acquisition module is configured to obtain the statistics of each layer's floating-point numbers in the neural network; the compression parameter acquisition module is configured to calculate, according to the statistics, the compression parameters for converting each layer's floating-point data into fixed-point data; and the compression calculation module is configured to perform fixed-point compression calculation on the neural network according to the compression parameters. This converts the neural network's floating-point numbers into low-bit fixed-point numbers for computation, reducing the amount of computation and the required memory and improving the computational efficiency of the neural network; moreover, only the data of the neural network is compressed, the network structure is not modified, and the network with the obtained compression parameters does not need to be retrained before use, which improves the deployment speed of the neural network.
  • Embodiment 5 of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network computation compression method provided in any embodiment of the present application. The method may include: obtaining statistics of the floating-point numbers in each layer of a neural network; calculating, according to the statistics, compression parameters for converting each layer's floating-point data into fixed-point data; and performing fixed-point compression calculation on the neural network according to the compression parameters.
  • the computer storage medium of the fifth embodiment of the present application may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or a combination of any of the above.
  • Examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and the computer-readable signal medium carries computer-readable program code. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a computer-readable medium may send, propagate, or transmit the program for use by or in combination with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
  • the computer program code used to perform the operations of this application can be written in one or more programming languages or a combination thereof.
  • The programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, and also conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or terminal.
  • The remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Abstract

A neural network computation compression method, system, and storage medium. The method includes: obtaining statistics of the floating-point numbers in each layer of a neural network (S110); calculating, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data (S120); and performing fixed-point compression calculation on the neural network according to the compression parameters (S130).

Description

Neural network computation compression method, system and storage medium
Technical Field
The embodiments of the present application relate to the field of neural networks, for example, to a neural network computation compression method, system, and storage medium.
Background
A neural network is a computational model composed of a large number of nodes (or neurons) and the interconnections between them. Neural network technology is also called deep learning technology; the trend in deep learning is toward ever deeper networks with ever larger layers. Correspondingly, deep learning places ever higher demands on computing power, yet the computing power of dedicated chips is growing far too slowly to meet the requirements of deep learning algorithms. To solve this problem, the computational compression of neural networks has become an important direction for acceleration and has received extensive attention from academia and industry.
For the computational compression of neural networks, one direction is to compress the network structure. Structure compression has two aspects: one is to compress the number of network layers, that is, to reduce network depth; the other is to compress the size of each layer, that is, to reduce the number of neurons. Compressing the number of layers usually uses the distillation method, whose main idea is to transfer the data of a large network into a predefined small network structure. Compressing the size of each layer usually uses the pruning method, whose main idea is to evaluate the importance of each neuron connection according to some criterion and keep only the more important connections.
However, after the structure of a neural network has been compressed, using the network usually requires retraining, that is, training the compressed structure again on the original dataset. This places high demands on computing equipment, deployment time, and other factors, and is unsuitable for many scenarios.
Summary
The embodiments of the present application provide a neural network computation compression method, system, and storage medium, so as to compress the data of neural network computation, improve the computational efficiency of the neural network, and reduce the deployment time of the neural network.
An embodiment of the present application provides a neural network computation compression method, including:
obtaining statistics of the floating-point numbers in each layer of a neural network;
calculating, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data;
performing fixed-point compression calculation on the neural network according to the compression parameters.
An embodiment of the present application provides a neural network computation compression system, including:
a statistics acquisition module, configured to obtain statistics of the floating-point numbers in each layer of a neural network;
a compression parameter acquisition module, configured to calculate, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data;
a compression calculation module, configured to perform fixed-point compression calculation on the neural network according to the compression parameters.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network computation compression method provided by any embodiment of the present application.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a neural network computation compression method provided in Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of another neural network computation compression method provided in Embodiment 2 of the present application;
FIG. 3 is a schematic flowchart of another neural network computation compression method provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of a neural network computation compression system provided in Embodiment 4 of the present application.
Detailed Description
The present application is described below with reference to the drawings and embodiments. The specific embodiments described herein merely explain the present application and do not limit it. For ease of description, the drawings show only the parts related to the present application rather than the entire structure.
Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many of the steps herein can be performed in parallel, concurrently, or simultaneously, and the order of the steps can be rearranged. Processing may be terminated when its steps are completed, but there may also be additional steps not included in the drawings. Processing may correspond to methods, functions, procedures, subroutines, subprograms, and so on.
The terms "first", "second", etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by the terms, which serve only to distinguish one direction, action, step, or element from another. For example, without departing from the scope of the present application, a first calculation graph may be referred to as a second calculation graph and, similarly, a second calculation graph may be referred to as a first calculation graph; both are calculation graphs, but they are not the same calculation graph. The terms "first", "second", etc. are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated; thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "plurality" means at least two, such as two or three, unless otherwise defined.
Embodiment 1
FIG. 1 is a schematic flowchart of a neural network computation compression method provided in Embodiment 1 of the present application; the method is applicable to compressing data during the computation of a neural network. The method can be executed by a neural network computation compression system, which can be implemented in software and/or hardware and can be integrated on a hardware device, such as a chip or a board.
As shown in FIG. 1, the neural network computation compression method provided in Embodiment 1 of the present application includes:
S110. Obtain statistics of the floating-point numbers in each layer of the neural network.
In this embodiment, data (real numbers) are usually represented in machine language in one of two ways: floating point and fixed point. When a fixed-point number expresses a real number, the position of the radix point is fixed; it is not represented in the machine but agreed in advance at a fixed position, and once determined it cannot change, so fixed-point numbers represent a limited range of data and correspondingly occupy little memory (bits). Floating-point numbers express real numbers using scientific notation, that is, with a mantissa, a base, an exponent, and a sign indicating positive or negative; for example, the real number 123.45 is represented by the floating-point number 1.2345x10^2, where 1.2345 is the mantissa, 10 is the base, and 2 is the exponent. Through the exponent, the radix point floats, so floating-point numbers can flexibly express a wide range of data and correspondingly occupy more memory (bits). A neural network is a machine-learning technique that simulates the human brain's neural network to realize artificial-intelligence-like capability; neural network data usually adopts the representation of 32-bit floating-point numbers. (A small illustrative sketch of this decomposition follows.)
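As a quick illustration of the floating-point decomposition described above (a sketch only; Python's standard library exposes the base-2 mantissa and exponent, alongside the raw 32-bit layout, rather than the base-10 form used in the example):

```python
import math
import struct

m, e = math.frexp(123.45)            # 123.45 = m * 2**e, with 0.5 <= |m| < 1
print(m, e)                          # 0.96445... 7

bits = struct.unpack(">I", struct.pack(">f", 123.45))[0]
print(f"{bits:032b}")                # sign (1) | exponent (8) | mantissa (23) bits
```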
Statistics are the data statistics of each layer's floating-point numbers in the neural network: the data is first classified by numerical range, and the amount of each class of data is then counted, which can be represented as a histogram. There are many ways to classify the data; it can be classified by numerical range or by content, which the embodiments of the present application do not limit. For example, a certain layer of the neural network contains the floating-point numbers 1x2^7, 1x2^7, 1x2^8, 1x2^9, 1x2^9, 1x2^7, 1x2^6. Let s denote the magnitude class of a datum (the horizontal axis of the statistics histogram) and y the amount of data in a class (the vertical axis). Classifying 1x2^6 ≤ s < 1x2^7 as the first class, 1x2^7 ≤ s < 1x2^8 as the second class, 1x2^8 ≤ s < 1x2^9 as the third class, and s ≥ 1x2^9 as the fourth class, the statistics of this layer's floating-point numbers can be expressed as: first class: 1, second class: 3, third class: 1, fourth class: 2.
S120. Calculate, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data.
In this embodiment, each layer's floating-point data is converted into fixed-point data according to the statistics; the compression parameter can be regarded as the conversion parameter or conversion rule between each layer's floating-point data and its fixed-point data. All classes of floating-point numbers may be converted to the same low-bit fixed-point format, or different classes may be converted to corresponding fixed-point formats according to different conversion rules; the embodiments of the present application do not limit the conversion parameters, which can be chosen according to the actual situation. For example, the first class of data (1x2^6 ≤ s < 1x2^7) is converted into 8-bit binary numbers containing 4 fractional bits, the second class (1x2^7 ≤ s < 1x2^8) into 8-bit binary numbers containing 3 fractional bits, and the third class (1x2^8 ≤ s < 1x2^9) and the fourth class (s ≥ 1x2^9) into 8-bit binary numbers containing 2 fractional bits.
S130. Perform fixed-point compression calculation on the neural network according to the compression parameters.
In this embodiment, after the compression parameters are obtained, they are imported into the calculation graph of the neural network to be computed, and the floating-point numbers in the neural network can then be converted into the corresponding low-bit fixed-point numbers for compressed calculation.
The neural network computation compression method provided in Embodiment 1 of the present application obtains the statistics of each layer's floating-point numbers in the neural network, calculates from the statistics the compression parameters for converting each layer's floating-point data into fixed-point data, and performs fixed-point compression calculation on the neural network according to the compression parameters. This converts the neural network's floating-point numbers into low-bit fixed-point numbers for computation, reducing the amount of computation and the required memory and improving the computational efficiency of the neural network; moreover, only the data of the neural network is compressed, the network structure is not modified, and the network with the obtained compression parameters does not need to be retrained before being used for calculation, which improves the deployment speed of the neural network.
Embodiment 2
FIG. 2 is a schematic flowchart of another neural network computation compression method provided in Embodiment 2 of the present application; this embodiment is described on the basis of the above embodiment. As shown in FIG. 2, the neural network computation compression method provided in Embodiment 2 of the present application includes:
S210. Construct a first calculation graph of the neural network.
In this embodiment, a calculation graph is a representation of the neural network model structure during computation, comprising multiple computing nodes and the connection relationships between them; one layer of the neural network can be regarded as one computing node. Each computing node also carries node data during actual calculation, that is, the data that needs to be compressed when the neural network runs; node data is generally expressed in the form of 32-bit floating-point numbers. Constructing the first calculation graph of the neural network means constructing the graph used when the neural network performs computations, including multiple first computing nodes and the connection relationships between them.
S220. Insert statistics nodes into the first calculation graph to form a second calculation graph of the neural network.
In this embodiment, a statistics node is used to count the statistics of one layer's floating-point numbers in the neural network. Inserting statistics nodes after the first computing nodes of the first calculation graph forms the second calculation graph of the neural network. When constructing the second calculation graph, statistics nodes can be inserted as needed: for example, one may be inserted after every first computing node, or only after those first computing nodes whose data needs to be compressed. In Embodiment 2 of the present application, a statistics node is inserted after every first computing node. Since statistics nodes are only inserted after the first computing nodes, neither the number of first computing nodes nor the connection relationships between them change; therefore the second computing nodes contained in the second calculation graph are the same as the first computing nodes, and the connection relationships between the second computing nodes are the same as those between the first computing nodes.
S230. Obtain first input data of the neural network.
In this embodiment, the first input data is the data that needs to be fed into the neural network when obtaining the statistics; the neural network performs one actual run on the first input data, and the statistics can then be obtained.
S240. Run the second calculation graph according to the first input data.
In this embodiment, the first input data is input into the neural network, the second computing nodes of the second calculation graph obtain the corresponding second node data (identical to the first node data of the first calculation graph), and the second calculation graph with the inserted statistics nodes is run; through the actual computation of the second calculation graph, the statistics of every second node are obtained. The second calculation graph merely adds statistics nodes on top of the first calculation graph and does not compress the node data of the neural network; therefore the second calculation graph still computes in floating-point form during its actual run.
S250. Obtain the statistics of each layer's floating-point numbers in the neural network from the running of the second calculation graph.
In this embodiment, while the second calculation graph of the neural network runs, the statistics nodes continuously update and record the statistics, yielding the statistics of the corresponding second computing nodes and hence the statistics of each layer's floating-point numbers in the neural network.
S260. Calculate, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data.
In this embodiment, each layer's floating-point data is converted into fixed-point data according to the statistics; the compression parameter can be regarded as the conversion parameter or conversion rule between each layer's floating-point data and its fixed-point data. All classes of floating-point numbers may be converted to the same low-bit fixed-point format, or different classes may be converted to corresponding fixed-point formats according to different conversion rules; the embodiments of the present application do not limit the conversion parameters, which can be chosen according to the actual situation. For example, the first class of data (1x2^6 ≤ s < 1x2^7) is converted into 8-bit binary numbers containing 4 fractional bits, the second class (1x2^7 ≤ s < 1x2^8) into 8-bit binary numbers containing 3 fractional bits, and the third class (1x2^8 ≤ s < 1x2^9) and the fourth class (s ≥ 1x2^9) into 8-bit binary numbers containing 2 fractional bits.
S270. Perform fixed-point compression calculation on the neural network according to the compression parameters.
In this embodiment, after the compression parameters are obtained, they are imported into the calculation graph of the neural network to be computed, and the floating-point numbers in the neural network can then be converted into the corresponding low-bit fixed-point numbers for compressed calculation.
The neural network computation compression method provided in Embodiment 2 of the present application forms a second calculation graph by inserting statistics nodes into the first calculation graph of the neural network, then runs the second calculation graph according to the first input data to obtain the statistics of each layer's floating-point numbers in the neural network, calculates from the statistics the compression parameters for converting the network's floating-point numbers to fixed-point numbers, and finally performs the actual compression calculation on the neural network according to the compression parameters. This way of obtaining the statistics is simple and easy to operate; Embodiment 2 of the present application converts the neural network's floating-point numbers into low-bit fixed-point numbers for computation, reducing the amount of computation and the required memory and improving the computational efficiency of the neural network.
Embodiment 3
FIG. 3 is a schematic flowchart of another neural network computation compression method provided in Embodiment 3 of the present application; this embodiment is described on the basis of the above embodiments. As shown in FIG. 3, the neural network computation compression method provided in Embodiment 3 of the present application includes:
S3010. Construct a first calculation graph of the neural network.
In this embodiment, a calculation graph is a representation of the neural network model structure during computation, comprising multiple computing nodes and the connection relationships between them; one layer of the neural network can be regarded as one computing node. Each computing node also carries node data during actual calculation, that is, the data that needs to be compressed when the neural network runs; node data is generally expressed in the form of 32-bit floating-point numbers. Constructing the first calculation graph of the neural network means constructing the graph used when the neural network performs computations, including multiple first computing nodes and the connection relationships between them.
S3020. Insert statistics nodes into the first calculation graph to form a second calculation graph of the neural network.
In this embodiment, a statistics node is used to count the statistics of one layer's floating-point numbers in the neural network. Inserting statistics nodes after the first computing nodes of the first calculation graph forms the second calculation graph of the neural network. When constructing the second calculation graph, statistics nodes can be inserted as needed: for example, one may be inserted after every first computing node, or only after those first computing nodes whose data needs to be compressed. In this embodiment, a statistics node is inserted after every first computing node. Since statistics nodes are only inserted after the first computing nodes, neither the number of first computing nodes nor the connection relationships between them change; therefore the second computing nodes contained in the second calculation graph are the same as the first computing nodes, and the connection relationships between the second computing nodes are the same as those between the first computing nodes.
S3030. Obtain first input data of the neural network.
In this embodiment, the first input data is the data that needs to be fed into the neural network when obtaining the statistics; the neural network performs one actual run on the first input data, and the statistics can then be obtained.
S3040. Run the second calculation graph according to the first input data.
In this embodiment, the first input data is input into the neural network, the second computing nodes of the second calculation graph obtain the corresponding second node data (identical to the first node data of the first calculation graph), and the second calculation graph with the inserted statistics nodes is run; through the actual computation of the second calculation graph, the statistics of every second node are obtained. The second calculation graph merely adds statistics nodes on top of the first calculation graph and does not compress the node data of the neural network; therefore the second calculation graph still computes in floating-point form during its actual run.
S3050. Obtain the statistics of each layer's floating-point numbers in the neural network from the running of the second calculation graph.
In this embodiment, while the second calculation graph of the neural network runs, the statistics nodes continuously update and record the statistics, yielding the statistics of the corresponding second computing nodes and hence the statistics of each layer's floating-point numbers in the neural network.
S3060. Calculate, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data.
In this embodiment, each layer's floating-point data is converted into fixed-point data according to the statistics; the compression parameter can be regarded as the conversion parameter or conversion rule between each layer's floating-point data and its fixed-point data. All classes of floating-point data may be converted to the same low-bit fixed-point format, or different classes may be converted to corresponding fixed-point formats according to different conversion rules; the embodiments of the present application do not limit the conversion parameters, which can be chosen according to the actual situation. For example, the first class of data (1x2^6 ≤ s < 1x2^7) is converted into 8-bit binary numbers containing 4 fractional bits, the second class (1x2^7 ≤ s < 1x2^8) into 8-bit binary numbers containing 3 fractional bits, and the third class (1x2^8 ≤ s < 1x2^9) and the fourth class (s ≥ 1x2^9) into 8-bit binary numbers containing 2 fractional bits.
S3070. Export a model file of the second calculation graph.
In this embodiment, after the second calculation graph with the inserted statistics nodes has been run according to the first input data, the second calculation graph is formed into a model file and exported, and the compression parameters are saved. In subsequent use, the model file can be used directly to construct the second calculation graph of the neural network, and the compression parameters can also be reused; when performing compression calculation on the neural network, steps S3010 to S3060 need not be repeated every time to obtain the compression parameters.
S3080. Construct a third calculation graph according to the model file of the second calculation graph and the compression parameters.
In this embodiment, when the neural network actually needs to perform compressed calculation, the model file of the second calculation graph is imported, and the third calculation graph is constructed according to the model file and the compression parameters. The third calculation graph includes third computing nodes and the connection relationships between them; during the actual computation of the third calculation graph, the data of a third computing node is called third node data.
In an embodiment, a method for constructing the third calculation graph of the neural network includes steps S30810 to S30830 (not shown in the figures).
S30810. Construct the second calculation graph according to the model file of the second calculation graph.
In this embodiment, the second calculation graph of the neural network is constructed from its model file; the second calculation graph includes multiple second computing nodes and the connection relationships between them, with a statistics node inserted after every second computing node.
S30820. Replace the statistics nodes in the second calculation graph with compression nodes.
In this embodiment, the statistics nodes in the second calculation graph are replaced with compression nodes. A compression node performs the compression and decompression of a computing node: compression means compressing the node data from floating-point numbers into fixed-point numbers for compressed calculation, and decompression means converting the computing node's output data after the compressed calculation from fixed-point numbers back to a floating-point representation.
S30830. Import the compression parameters into the compression nodes to form the third calculation graph.
In this embodiment, importing the compression parameters into the compression nodes forms the third calculation graph, which is the calculation graph used when the neural network performs compressed calculation. Since only the statistics nodes of the second calculation graph are replaced, and neither the number of the network's layers nor their connection relationships change, the third computing nodes are the same as the second computing nodes, and the connection relationships between the third computing nodes are the same as those between the second computing nodes.
S3090. Obtain second input data of the neural network.
In this embodiment, the second input data is the data input when a user uses the neural network to perform calculations; the second input data is fed into the neural network, and the third computing nodes of the third calculation graph obtain the corresponding third node data in order to perform the neural network's compressed calculation.
S3100. Run the third calculation graph according to the second input data to perform fixed-point compressed calculation.
In this embodiment, the second input data is input into the neural network, the computing nodes of the third calculation graph obtain the corresponding third node data, the third node data is converted from floating-point numbers to low-bit fixed-point numbers through the compression parameters of the compression nodes, and the third calculation graph performs the compressed calculation in low-bit fixed point.
In an embodiment, running the third calculation graph according to the second input data to perform fixed-point compressed calculation includes steps S31010 to S31030 (not shown in the figures).
S31010. Import the second input data into the third calculation graph.
S31020. Compress the floating-point data of the third calculation graph into fixed-point data according to the compression parameters.
S31030. Perform the compressed calculation according to the fixed-point data.
S3110. Decompress the fixed-point output data obtained after the compressed calculation to obtain floating-point output data.
In this embodiment, the neural network computes layer by layer during actual operation: after one layer of the neural network finishes its calculation, the output data computed by that layer serves as the input data for the next layer's calculation. Since the data during the network's compressed calculation is fixed-point data, the output data of each third computing node after its compressed calculation is also fixed-point data; the fixed-point output data obtained from a third computing node's compressed calculation is decompressed through the compression node to obtain that third computing node's floating-point output data, which serves as the input data of the next third computing node.
The neural network computation compression method provided in Embodiment 3 of the present application constructs the third calculation graph used for the network's compressed calculation by replacing the statistics nodes in the second calculation graph with compression nodes, runs the third calculation graph through the second input data, compresses the node data through the compression parameters, and decompresses the output data of the completed compressed calculation through the compression nodes. Only the data of the neural network is compressed, and the modification of the neural network structure is not involved; when the neural network with the obtained compression parameters is used for calculation, it does not need to be retrained, which improves the deployment speed of the neural network.
Embodiment 4
FIG. 4 is a schematic structural diagram of a neural network computation compression system provided in Embodiment 4 of the present application; the system is applicable to compressing data during the computation of a neural network. The system can be implemented in software and/or hardware and can be integrated on a hardware device, such as a chip or a board.
As shown in FIG. 4, the neural network computation compression system 400 provided in Embodiment 4 of the present application includes a statistics acquisition module 410, a compression parameter acquisition module 420, and a compression calculation module 430. The statistics acquisition module 410 is configured to obtain statistics of the floating-point numbers in each layer of a neural network; the compression parameter acquisition module 420 is configured to calculate, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data; the compression calculation module 430 is configured to perform fixed-point compression calculation on the neural network according to the compression parameters.
In an embodiment, the neural network computation compression system 400 further includes: a first calculation graph construction module, configured to construct a first calculation graph of the neural network before the statistics of each layer's floating-point numbers in the neural network are obtained; and a second calculation graph construction module, configured to insert statistics nodes into the first calculation graph to form a second calculation graph of the neural network.
In an embodiment, the statistics acquisition module 410 includes: a first input data acquisition unit, configured to acquire first input data of the neural network; a second calculation graph running unit, configured to run the second calculation graph according to the first input data; and a statistics acquisition unit, configured to obtain the statistics of each layer's floating-point numbers in the neural network from the running of the second calculation graph.
In an embodiment, the neural network computation compression system 400 further includes: a model file export module, configured to export the model file of the second calculation graph after the compression parameters for converting each layer's floating-point data in the neural network into fixed-point data have been calculated according to the statistics.
In an embodiment, the compression calculation module 430 includes: a third calculation graph construction unit, configured to construct a third calculation graph according to the model file of the second calculation graph and the compression parameters; a second input data acquisition unit, configured to acquire second input data of the neural network; and a third calculation graph running unit, configured to run the third calculation graph according to the second input data to perform fixed-point compressed calculation.
In an embodiment, the third calculation graph construction unit includes: a second calculation graph construction subunit, configured to construct the second calculation graph according to the model file of the second calculation graph; a compression node replacement subunit, configured to replace the statistics nodes in the second calculation graph with compression nodes; and a third calculation graph construction subunit, configured to import the compression parameters into the compression nodes to form the third calculation graph.
In an embodiment, the third calculation graph running unit includes: a second input data importing subunit, configured to import the second input data into the third calculation graph; a data compression subunit, configured to compress the floating-point data of the third calculation graph into fixed-point data according to the compression parameters; and a compression calculation subunit, configured to perform the compressed calculation according to the fixed-point data.
In an embodiment, the third calculation graph running unit further includes: an output data decompression subunit, configured to decompress, after the compressed calculation is performed according to the fixed-point data, the fixed-point output data obtained from the compressed calculation to obtain floating-point output data.
In the neural network computation compression system provided in Embodiment 4 of the present application, the statistics acquisition module is configured to obtain the statistics of each layer's floating-point numbers in the neural network; the compression parameter acquisition module is configured to calculate, according to the statistics, the compression parameters for converting each layer's floating-point data into fixed-point data; and the compression calculation module is configured to perform fixed-point compression calculation on the neural network according to the compression parameters. This converts the neural network's floating-point numbers into low-bit fixed-point numbers for computation, reducing the amount of computation and the required memory and improving the computational efficiency of the neural network; moreover, only the data of the neural network is compressed, the modification of the neural network structure is not involved, and the network with the obtained compression parameters does not need to be retrained before use, which improves the deployment speed of the neural network.
Embodiment 5
Embodiment 5 of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network computation compression method provided in any embodiment of the present application. The method may include: obtaining statistics of the floating-point numbers in each layer of a neural network; calculating, according to the statistics, compression parameters for converting each layer's floating-point data into fixed-point data; and performing fixed-point compression calculation on the neural network according to the compression parameters.
The computer storage medium of Embodiment 5 of the present application may adopt any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate, or transmit the program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and also conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Claims (10)

  1. A neural network computation compression method, comprising:
    obtaining statistics of the floating-point numbers in each layer of a neural network;
    calculating, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data;
    performing fixed-point compression calculation on the neural network according to the compression parameters.
  2. The method of claim 1, before the obtaining of the statistics of the floating-point numbers in each layer of the neural network, further comprising:
    constructing a first calculation graph of the neural network;
    inserting statistics nodes into the first calculation graph to form a second calculation graph of the neural network.
  3. The method of claim 2, wherein the obtaining of the statistics of the floating-point numbers in each layer of the neural network comprises:
    obtaining first input data of the neural network;
    running the second calculation graph according to the first input data;
    obtaining the statistics of the floating-point numbers in each layer of the neural network from the running of the second calculation graph.
  4. The method of claim 3, after the calculating, according to the statistics, of the compression parameters for converting each layer's floating-point data in the neural network into fixed-point data, further comprising:
    exporting a model file of the second calculation graph.
  5. The method of claim 4, wherein the performing of fixed-point compression calculation on the neural network according to the compression parameters comprises:
    constructing a third calculation graph according to the model file of the second calculation graph and the compression parameters;
    obtaining second input data of the neural network;
    running the third calculation graph according to the second input data to perform fixed-point compression calculation.
  6. The method of claim 5, wherein the constructing of the third calculation graph according to the model file of the second calculation graph and the compression parameters comprises:
    constructing the second calculation graph according to the model file of the second calculation graph;
    replacing the statistics nodes in the second calculation graph with compression nodes;
    importing the compression parameters into the compression nodes to form the third calculation graph.
  7. The method of claim 5 or 6, wherein the running of the third calculation graph according to the second input data to perform fixed-point compression calculation comprises:
    importing the second input data into the third calculation graph;
    compressing the floating-point data of the third calculation graph into fixed-point data according to the compression parameters;
    performing the compression calculation according to the fixed-point data.
  8. The method of claim 7, after the performing of the compression calculation according to the fixed-point data, further comprising:
    decompressing the fixed-point output data obtained after the compression calculation to obtain floating-point output data.
  9. A neural network computation compression system, comprising:
    a statistics acquisition module, configured to obtain statistics of the floating-point numbers in each layer of a neural network;
    a compression parameter acquisition module, configured to calculate, according to the statistics, compression parameters for converting each layer's floating-point data in the neural network into fixed-point data;
    a compression calculation module, configured to perform fixed-point compression calculation on the neural network according to the compression parameters.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network computation compression method of any one of claims 1-8.
PCT/CN2019/112465 2019-10-22 2019-10-22 Neural network computation compression method, system and storage medium WO2021077283A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/112465 WO2021077283A1 (zh) 2019-10-22 2019-10-22 Neural network computation compression method, system and storage medium
CN201980100181.6A CN114365147A (zh) 2019-10-22 2019-10-22 Neural network computation compression method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/112465 WO2021077283A1 (zh) 2019-10-22 2019-10-22 Neural network computation compression method, system and storage medium

Publications (1)

Publication Number Publication Date
WO2021077283A1 true WO2021077283A1 (zh) 2021-04-29

Family

ID=75619606

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112465 WO2021077283A1 (zh) 2019-10-22 2019-10-22 神经网络计算压缩方法、系统及存储介质

Country Status (2)

Country Link
CN (1) CN114365147A (zh)
WO (1) WO2021077283A1 (zh)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108292374A (zh) * 2015-11-09 2018-07-17 谷歌有限责任公司 Training neural networks represented as computational graphs
CN109934331A (zh) * 2016-04-29 2019-06-25 北京中科寒武纪科技有限公司 Apparatus and method for executing a forward operation of an artificial neural network
US20180107451A1 (en) * 2016-10-14 2018-04-19 International Business Machines Corporation Automatic scaling for fixed point implementation of deep neural networks
CN108427991A (zh) * 2017-02-14 2018-08-21 谷歌有限责任公司 Implementing neural networks in fixed-point arithmetic computing systems
CN107832082A (zh) * 2017-07-20 2018-03-23 上海寒武纪信息科技有限公司 Apparatus and method for executing a forward operation of an artificial neural network
CN109726806A (zh) * 2017-10-30 2019-05-07 上海寒武纪信息科技有限公司 Information processing method and terminal device
CN110062246A (zh) * 2018-01-19 2019-07-26 杭州海康威视数字技术股份有限公司 Method and apparatus for processing video frame data
CN108256632A (zh) * 2018-01-29 2018-07-06 百度在线网络技术(北京)有限公司 Information processing method and apparatus
CN109643229A (zh) * 2018-04-17 2019-04-16 深圳鲲云信息科技有限公司 Application development method of a network model and related product

Also Published As

Publication number Publication date
CN114365147A (zh) 2022-04-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19949929; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/09/2022))
122 Ep: pct application non-entry in european phase (Ref document number: 19949929; Country of ref document: EP; Kind code of ref document: A1)