CN109190759A - Neural network model compression and acceleration method based on {-1, +1} coding - Google Patents

Neural network model compression and acceleration method based on {-1, +1} coding

Info

Publication number
CN109190759A
Authority
CN
China
Prior art keywords
neural network
bit
encoding
coding
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810866365.0A
Other languages
Chinese (zh)
Inventor
孙其功
焦李成
杨康
尚凡华
李秀芳
侯彪
杨淑媛
李玲玲
郭雨薇
唐旭
冯志玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201810866365.0A priority Critical patent/CN109190759A/en
Publication of CN109190759A publication Critical patent/CN109190759A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a neural network compression and acceleration method based on {-1, +1} encoding, which mainly solves the problem that in the prior art deep neural network models cannot be deployed well on mobile phones and small edge devices. The method first constructs a neural network model and determines the number of {-1, 1} encoding bits; then retrains the neural network model parameters and quantizes them; re-encodes the quantized model parameters with {-1, 1}; adds encoding layers to the neural network model; and finally replaces the matrix and vector multiplication operations inside the neural network with several binary bit operations. The invention re-encodes the neural network parameters and activation values with {-1, 1}, which reduces the storage space of the model parameters and accelerates model computation.

Description

Neural network model compression and acceleration method based on {-1, +1} coding
Technical Field
The invention belongs to the technical field of deep neural networks, and particularly relates to a neural network model compression and acceleration method based on {-1, +1} coding, which is applied to the compression and acceleration of deep neural network models.
Background
Deep neural networks have made great breakthroughs in fields such as image classification, target detection, and natural language processing. In computer vision tasks, convolutional neural networks often yield results superior to other machine learning methods. However, deep neural networks are very deep and have a large number of parameters, so they can generally only run on GPUs with large video memory and strong computing power, and they perform poorly on mobile phones and small embedded devices with little memory and weak computing power. Quantizing and accelerating the deep neural network model is one way to solve this problem.
Existing binarized-parameter neural network methods have a drawback: classification tasks on large datasets involve many data features, and target detection additionally involves regression, but the feature representation capability after parameter binarization is insufficient, so these methods perform poorly on large-dataset classification and on target detection tasks.
Existing acceleration and compression methods for deep convolutional neural networks apply several sub-codebooks and their corresponding indexes, which makes the network parameters sparse and reduces the feature extraction capability of the original network; they are therefore difficult to apply to mobile phones and small embedded devices, and their acceleration ratio is limited.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a neural network model compression and acceleration method based on {-1, +1} coding. The method quantizes the deep neural network parameters and activation values, performs {-1, 1} encoding on the quantized parameters and activation values, and decomposes the matrix and vector multiplication operations inside the encoded neural network into several binary bit operations, thereby improving the utilization of hardware resources and realizing parameter compression and acceleration. By selecting the number of encoding bits, the method realizes {-1, 1} encoding, computation, and storage of data at different precisions.
The invention adopts the following technical scheme:
A neural network model compression and acceleration method based on {-1, +1} coding, characterized by: first constructing a neural network model; then segmenting [-1, 1] according to the precision required by the specific task, and determining from the number of segments the number of bits M used to re-encode the model parameters and activation values with {-1, 1}; initializing the neural network model and quantizing the model parameters; training the neural network model while quantizing the model parameters and activation values with a linear quantization formula according to the number M of {-1, 1} encoding bits; performing M-bit {-1, 1} encoding on the trained, quantized neural network model parameters; adding after each activation layer of the neural network model an encoding layer that re-encodes the activation values with M-bit {-1, 1} codes; and replacing the matrix and vector multiplication operations inside the neural network with several binary bit operations, thereby realizing model compression and acceleration.
Specifically, the activation function of the neural network model is the activation function HReLU, defined as follows:

$$\mathrm{HReLU}(x)=\begin{cases}0, & x<0\\ x, & 0\le x\le 1\\ 1, & x>1\end{cases}$$

where HReLU(x) represents the activation value obtained by taking x as input.
Specifically, the segmentation processing is as follows: divide [-1, 1] into 2^M segments, and determine from the number of segments 2^M the number of bits M used to subsequently re-encode the model parameters and activation values with {-1, 1}.
Specifically, the weights of the neural network model are initialized with mean 0 and variance 1, and the model parameters are quantized with a linear quantization formula according to the number M of {-1, 1} encoding bits.
Further, the linear quantization formula is as follows:

$$q_M(x)=\frac{2}{2^M-1}\left\langle\frac{(2^M-1)(x+1)}{2}\right\rangle-1$$

where q_M(x) denotes the quantized parameter value obtained under M-bit {-1, 1} coding, x denotes the full-precision parameter value, M denotes the number of {-1, 1} encoding bits, and ⟨·⟩ denotes the rounding operation.
Specifically, the trained, quantized neural network model parameters are subjected to M-bit {-1, 1} encoding with the encoding formula MBitEncoder, which is as follows:

$$\mathrm{MBitEncoder}(x)=[s_M\ s_{M-1}\ \cdots\ s_1]\quad\text{with}\quad x=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}s_m,\quad s_m\in\{-1,1\}$$

where MBitEncoder(x) denotes the M-bit {-1, 1} encoding of x, M denotes the total number of {-1, 1} encoding bits, m denotes the m-th bit of the M-bit {-1, 1} code, and s_m represents the value determined for the m-th bit of the M-bit {-1, 1} code, with s_m ∈ {-1, 1}.
further, the quantized value x is quantizedqAfter M-bit { -1,1} coding, it can be expressed as:
wherein x isqmDenotes xqM denotes the total number of bits of the coded { -1,1} code, xqEach bit can be represented as:
xq→[xqMxq(M-1)xq(M-2)...xq1]
wherein, the [ alpha ], [ beta ]]Denotes xqInner coding form of (1), xqMDenotes xqInner-1, 1 encodes the state of the Mth bit and xqM∈{-1,1}。
Specifically, the matrix and vector multiplication operations inside the neural network are replaced with several binary bit operations: each bit of the encoded parameters and activation values is separated out, and the multiplication between the separated model parameters and activation values is decomposed into several binary bit computations.
Further, the inner product of the encoded activation value vector x and the encoded neural network parameter vector w is represented as:

$$x^Tw=\frac{1}{(2^M-1)(2^K-1)}\sum_{m=1}^{M}\sum_{k=1}^{K}2^{m-1}\,2^{k-1}\left(\tilde{x}_m^T\tilde{w}_k\right)$$

where x^T w is decomposed into M·K binary bit operations; \tilde{x}_m denotes the vector formed by the m-th bits of the elements of x, and \tilde{w}_k the vector formed by the k-th bits of the elements of w; \tilde{x}_m^T \tilde{w}_k denotes performing an XNOR on the corresponding bits of \tilde{x}_m and \tilde{w}_k and counting the number of +1s minus the number of -1s in the result; M is the total number of activation-value encoding bits, and K is the total number of network-parameter encoding bits.
Further, each element of the encoded activation value vector x is represented with {-1, 1} encoding as follows:

$$x_i=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}x_i^m,\quad x_i^m\in\{-1,1\}$$

where x_i^m denotes the value of the m-th bit of element x_i, with x_i^m ∈ {-1, 1}.
and (3) transforming the vector x after the { -1,1} coding as follows:
wherein,the ith bit of each element in the vector is taken out to form a vector, and the encoded neural network parameter vector w is also transformed as the vector x
Compared with the prior art, the invention has at least the following beneficial effects:
the invention relates to a neural network model compression and acceleration method based on { -1, +1} coding, which carries out { -1,1} coding on model parameters and activation values of a neural network, the coding bit number can be freely selected, and the internal matrix or vector multiplication operation of the neural network is decomposed into a plurality of binary bit operations, thereby overcoming the problem that the model parameters can only be coded under single precision after being quantized in the prior art, so that the invention compresses and accelerates the model parameters under different coding precisions and has higher model acceleration ratio on an FPGA (field programmable gate array), and realizes the encoding, calculation and storage of different { -1,1} bits of data under different precisions by selecting the coding bit number, and the invention is not only competent for classification tasks, but also can be used for target detection tasks.
Furthermore, because the new activation function HReLU constrains the activation values to [0, 1] before encoding, the prior-art problems that a quantized model is difficult to converge and insensitive to full-precision weights are solved; the quantized neural network built by the invention can therefore be initialized directly with weight parameters trained on the full-precision neural network, which accelerates the convergence of quantized network training.
Furthermore, the invention uses the encoding formula MBitEncoder to perform {-1, 1} encoding on the model parameters and activation values, with a freely selectable number of encoding bits, and decomposes the matrix and vector multiplication operations inside the encoded neural network into several binary bit operations. This overcomes the prior-art limitation that quantized model parameters can only be encoded at a single precision, allowing the invention to compress the model parameters and accelerate the model at different encoding precisions, with a higher model acceleration ratio on FPGA.
Furthermore, since the number of {-1, 1} encoding bits for the model parameters and activation values is variable, the model's expressive capability can be improved by changing the number of encoding bits. This solves the prior-art problem of poor classification performance of binary networks on the large-scale classification dataset ImageNet, and makes the method applicable to neural networks for target detection tasks.
In conclusion, the method re-encodes the neural network parameters and activation values with {-1, 1}, reducing the storage space of the model parameters and accelerating model computation.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the 2-bit encoded neural network fully connected algorithm of the present invention.
Detailed Description
The invention provides a neural network model compression and acceleration method based on {-1, +1} coding: first, a neural network model is constructed; then [-1, 1] is segmented according to the precision required by the specific task, and the number of bits M used to re-encode the model parameters and activation values with {-1, 1} is determined from the number of segments; the neural network model is initialized and the model parameters are quantized with a linear quantization formula; the neural network model is trained while the model parameters and activation values are quantized with the linear quantization formula according to the number M of {-1, 1} encoding bits; the trained, quantized model parameters are subjected to M-bit {-1, 1} encoding with the encoding formula defined by the invention; after each activation layer of the neural network model, an encoding layer that re-encodes the activation values with M-bit {-1, 1} codes is added; and the matrix and vector multiplication operations inside the neural network are replaced with several binary bit operations, realizing model compression and acceleration.
Referring to fig. 1, a neural network model compression and acceleration method based on { -1, +1} coding of the present invention includes the following steps:
S1, constructing a neural network model:
constructing the required neural network model according to the specific task; the neural network model generally comprises convolutional layers, pooling layers, fully connected layers, and the like; the activation function of the model is either the activation function HTanh or the activation function HReLU defined by the invention;
The activation function HTanh is formulated as follows:

$$\mathrm{HTanh}(x)=\begin{cases}-1, & x<-1\\ x, & -1\le x\le 1\\ 1, & x>1\end{cases}$$

where HTanh(x) represents the activation value obtained with x as input.
The activation function HReLU defined by the invention is formulated as follows:

$$\mathrm{HReLU}(x)=\begin{cases}0, & x<0\\ x, & 0\le x\le 1\\ 1, & x>1\end{cases}$$

where HReLU(x) represents the activation value obtained with x as input.
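As an illustration, here is a minimal NumPy sketch consistent with the two activation-function definitions above (the patent itself specifies no code; the function names are illustrative):

```python
import numpy as np

def htanh(x):
    # Hard tanh: clips the input to [-1, 1].
    return np.clip(x, -1.0, 1.0)

def hrelu(x):
    # HReLU: clips the input to [0, 1], keeping activation values in the
    # range that is later quantized and {-1, 1}-encoded.
    return np.clip(x, 0.0, 1.0)

print(htanh(np.array([-2.0, 0.3, 1.5])))  # [-1.   0.3  1. ]
print(hrelu(np.array([-2.0, 0.3, 1.5])))  # [0.   0.3  1. ]
```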
S2, segmenting [-1, 1] according to the precision required by the specific task: divide [-1, 1] into 2^M segments, and determine from the number of segments 2^M the number of bits M used to subsequently re-encode the model parameters and activation values with {-1, 1};
For example, if [-1, 1] is divided into 4 segments ([-1, -0.5), [-0.5, 0), [0, 0.5), and [0.5, 1]), then 2^M = 4 and the number of {-1, 1} encoding bits is M = 2;
S3, initializing the weights of the neural network model with mean 0 and variance 1, and quantizing the model parameters with a linear quantization formula according to the number M of {-1, 1} encoding bits;
Different numbers of {-1, 1} encoding bits produce quantizations of different precision: the more encoding bits, the higher the quantization precision.
For example, quantizing with a {-1, 1} encoding bit number of 1 yields x_q1 ∈ {-1, 1}, where x_q1 = -1 represents values in [-1, 0) and x_q1 = 1 represents values in [0, 1]. Quantizing with a {-1, 1} encoding bit number of 2 yields x_q2 ∈ {-1, -1/3, 1/3, 1};
The linear quantization formula used by the model is as follows:

$$q_M(x)=\frac{2}{2^M-1}\left\langle\frac{(2^M-1)(x+1)}{2}\right\rangle-1$$

where q_M(x) denotes the quantized parameter value obtained under M-bit {-1, 1} coding, x denotes the full-precision parameter value, M denotes the number of {-1, 1} encoding bits, and ⟨·⟩ denotes the rounding operation.
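A minimal NumPy sketch of this linear quantization, assuming the formula reconstructed above; the function name quantize_m_bit is illustrative, and rounding is done half-up (one possible reading of the ⟨·⟩ operation) so that 0 quantizes to +1, matching the 1-bit example above:

```python
import numpy as np

def quantize_m_bit(x, M):
    """Linearly quantize values in [-1, 1] onto the 2**M levels used for
    M-bit {-1, 1} encoding."""
    levels = 2 ** M - 1
    # Map [-1, 1] -> [0, levels], round half-up to the nearest level,
    # then map back to [-1, 1].
    return 2.0 * np.floor(levels * (np.asarray(x) + 1.0) / 2.0 + 0.5) / levels - 1.0

x = np.array([-0.9, -0.2, 0.1, 0.8])
print(quantize_m_bit(x, 1))  # [-1. -1.  1.  1.]
print(quantize_m_bit(x, 2))  # [-1.         -0.33333333  0.33333333  1.        ]
```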
S4, training the neural network model while quantizing the model parameters and activation values with the linear quantization formula of step S3, according to the number M of {-1, 1} encoding bits;
S5, performing M-bit {-1, 1} encoding on the trained, quantized neural network model parameters with the encoding formula defined by the invention;
for example, x quantized under two-bit { -1,1} codingq2E { -1, -1/3, 1/3, 1}, and the inner coding of the two-bit { -1,1} code is represented as { [ -1 } by re-encoding],[-11],[1-1],[11]};
The {-1, 1} encoding formula MBitEncoder defined by the invention is as follows:

$$\mathrm{MBitEncoder}(x)=[s_M\ s_{M-1}\ \cdots\ s_1]\quad\text{with}\quad x=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}s_m,\quad s_m\in\{-1,1\}$$

where MBitEncoder(x) denotes the M-bit {-1, 1} encoding of x, M denotes the total number of {-1, 1} encoding bits, m denotes the m-th bit of the M-bit {-1, 1} code, and s_m represents the value determined for the m-th bit, with s_m ∈ {-1, 1}.
for quantized value xqAfter M-bit { -1,1} coding, it can be expressed as:
wherein xqmDenotes xqM denotes the total number of bits of the coded { -1,1} code, xqEach bit can be represented as:
xq→[xqMxq(M-1)xq(M-2)...xq1](6)
wherein]Denotes xqInner coding form of (1), xqMDenotes xqInner-1, 1 encodes the state of the Mth bit and xqM∈{-1,1}。
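A sketch of the encoding step under the definition above, i.e. each quantized level equals (1/(2^M - 1)) Σ 2^(m-1) s_m with s_m ∈ {-1, 1}; the helper name mbit_encode is an assumption for illustration:

```python
def mbit_encode(xq, M):
    """Encode a quantized value xq into M bits [s_M, ..., s_1], s_m in {-1, 1},
    such that xq == sum(2**(m-1) * s_m for m in 1..M) / (2**M - 1)."""
    levels = 2 ** M - 1
    idx = int(round((xq + 1.0) * levels / 2.0))  # level index 0 .. 2**M - 1
    # Binary digits of idx, mapped from {0, 1} to {-1, 1}, high bit first.
    return [2 * ((idx >> (m - 1)) & 1) - 1 for m in range(M, 0, -1)]

# 2-bit example: {-1, -1/3, 1/3, 1} -> {[-1,-1], [-1,1], [1,-1], [1,1]}
for v in (-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0):
    print(v, mbit_encode(v, 2))
```

This reproduces the two-bit example above: -1/3 maps to the internal code [-1 1] and 1/3 to [1 -1].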
S6, adding after each activation layer of the neural network model an encoding layer that re-encodes the activation values with M-bit {-1, 1} codes; the encoding formula used by the encoding layer is the same as that of step S5;
S7, replacing the matrix and vector multiplication operations inside the neural network with several binary bit operations: each bit of the encoded parameters and activation values is separated out, and the multiplication between the separated model parameters and activation values is decomposed into several binary bit computations.
Without loss of generality, let the encoded activation value vector be x = [x_1, x_2, ..., x_N]^T and the encoded neural network parameter vector be w = [w_1, w_2, ..., w_N]^T.
The original inner product of x and w is expressed as follows:

$$x^Tw=\sum_{n=1}^{N}x_nw_n$$

where x^T denotes the transpose of x, x_n denotes the n-th element of x, w_n likewise denotes the n-th element of w, and x^T w denotes the inner-product summation of x and w.
Each element of the vector x and of the vector w is represented using the {-1, 1} encoding of the invention:

$$x_i=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}x_i^m,\quad x_i^m\in\{-1,1\}$$

where x_i^m denotes the value of the m-th bit of element x_i, with x_i^m ∈ {-1, 1}, and M denotes the total number of {-1, 1} encoding bits.
The {-1, 1}-encoded vector x is transformed as follows:

$$x=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}\tilde{x}_m,\quad \tilde{x}_m=[x_1^m\ x_2^m\ \cdots\ x_N^m]^T$$

where \tilde{x}_m denotes the vector formed by taking the m-th bit of each element of the vector; the vector w is transformed in the same way as the vector x.
Referring to fig. 2, x is a vector of full-precision real numbers; after quantization coding, x becomes a two-bit encoded number with each bit in {-1, 1}, and the full-precision real numbers of w are quantization-coded in the same form as x. The encoded x and w are decomposed, separating the high bits from the low bits: the low bits of x and the low bits of w are combined with an XNOR operation, with coefficient 2^0 · 2^0 = 1; the low bits of x and the high bits of w are combined with an XNOR operation, with coefficient 2^0 · 2^1 = 2; and so on. The trailing factor 1/9 is the normalization term 1/((2^2 - 1)(2^2 - 1)). The inner product of the vector x and the vector w is then represented as:

$$x^Tw=\frac{1}{(2^M-1)(2^K-1)}\sum_{m=1}^{M}\sum_{k=1}^{K}2^{m-1}\,2^{k-1}\left(\tilde{x}_m^T\tilde{w}_k\right)$$

where x^T w is decomposed into M·K binary bit operations; \tilde{x}_m^T \tilde{w}_k denotes performing an XNOR on the corresponding bits of \tilde{x}_m and \tilde{w}_k and counting the number of +1s minus the number of -1s in the result; M is the total number of activation-value encoding bits, and K is the total number of network-parameter encoding bits.
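A NumPy sketch of this decomposition, assuming the reconstructed formula above: the bit planes of x and w are extracted, combined pairwise by elementwise XNOR (the product of ±1 values), weighted by powers of two, and normalized. All names are illustrative:

```python
import numpy as np

def bit_planes(codes, n_bits):
    # codes: array of shape (N, n_bits) holding {-1, 1} encodings ordered
    # [s_M ... s_1]; plane m (m = 1..n_bits) carries weight 2**(m-1).
    return [codes[:, n_bits - m] for m in range(1, n_bits + 1)]

def decomposed_inner_product(x_bits, w_bits):
    # x_bits: (N, M) and w_bits: (N, K) arrays with {-1, 1} entries.
    M, K = x_bits.shape[1], w_bits.shape[1]
    xs, ws = bit_planes(x_bits, M), bit_planes(w_bits, K)
    total = 0.0
    for m in range(M):
        for k in range(K):
            # XNOR of two {-1, 1} vectors is their elementwise product;
            # its sum is (# of +1s) - (# of -1s), a popcount on hardware.
            total += (2 ** m) * (2 ** k) * np.sum(xs[m] * ws[k])
    return total / ((2 ** M - 1) * (2 ** K - 1))

# Verify against the direct inner product of the decoded vectors.
rng = np.random.default_rng(0)
M = K = 2
x_bits = rng.choice([-1, 1], size=(8, M))
w_bits = rng.choice([-1, 1], size=(8, K))
weights = 2 ** np.arange(M - 1, -1, -1)   # bit weights for [s_M ... s_1]
x = x_bits @ weights / (2 ** M - 1)       # decoded activation values
w = w_bits @ weights / (2 ** K - 1)       # decoded parameters
assert np.isclose(decomposed_inner_product(x_bits, w_bits), x @ w)
```

For M = K = 2 the normalization term is 1/9, matching the factor described for fig. 2.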
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Simulation conditions are as follows:
the hardware platform is as follows: intel (r) xeon (r) CPU Z480, 2.40GHz 16, 64G memory.
The software platform is: MXNet.
Simulation content and results:
the multi-bit coded resnet-18 classification model constructed by the method disclosed by the invention and the full-precision resnet-18 classification model constructed by the traditional method are respectively used for classifying the large-scale classification data set ImageNet.
The results are shown in table 1:
as can be seen from Table 1, the classification result of the 8-bit encoded resnet-18 classification model has higher accuracy than that of the full-precision resnet-18 classification model, and the parameters of the 8-bit encoded resnet-18 classification model are four times smaller and the classification speed is four times higher than that of the full-precision resnet-18 classification model. As the number of encoding bits decreases, the accuracy of classification decreases, but at the same time parameter compression and model acceleration increase.
A multi-bit coded SSD target detection model constructed by the method of the invention and a full-precision SSD target detection model constructed by the traditional method were used to perform target detection on images of the VOC target detection dataset.
The obtained target detection results are compared with the ground-truth target labels according to the following two formulas:
Recall = (number of correctly detected targets) / (total number of actual targets)
Precision = (number of correctly detected targets) / (total number of detected targets)
A precision-recall curve is drawn, the detection precision AP of target detection is obtained from the area under the curve, and the APs of the multiple categories are averaged to obtain the mean average precision mAP.
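As a sketch of how a precision-recall curve can be reduced to AP and mAP (simple trapezoidal area under the curve; this illustrates the computation and is not the patent's prescribed procedure, and the data below is a toy example):

```python
import numpy as np

def average_precision(recall, precision):
    # Area under a precision-recall curve, with recall sorted ascending.
    return float(np.trapz(precision, recall))

# Toy two-category example; real curves come from the detector's outputs.
curves = [
    (np.array([0.0, 0.5, 1.0]), np.array([1.0, 0.8, 0.6])),
    (np.array([0.0, 0.5, 1.0]), np.array([1.0, 0.7, 0.5])),
]
aps = [average_precision(r, p) for r, p in curves]
map_score = sum(aps) / len(aps)   # mean average precision (mAP)
print(aps, map_score)
```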
The results are shown in Table 2:

          Full precision   8-bit encoding   6-bit encoding   5-bit encoding
mAP       0.6392           0.6351           0.6111           0.4540
As can be seen from Table 2, although the mean average precision mAP of the 8-bit encoded SSD target detection model on the VOC dataset differs from that of the full-precision SSD model by only 0.0041, the 8-bit encoded SSD model compresses the model parameters and doubles the speed of target detection. As the number of encoding bits decreases, the SSD target detection model encoded by the method achieves ever greater parameter compression and model acceleration ratios.
In summary, the invention re-encodes and decomposes the parameters and activation values of a deep neural network, decomposing the neural network matrix operations into several binary bit operations and thereby realizing encoding, computation, and storage of data at different encoding precisions.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. A neural network model compression and acceleration method based on {-1, +1} coding, characterized by: first constructing the neural network model; then segmenting [-1, 1] according to the precision required by the specific task, and determining from the number of segments the number of bits M used to re-encode the model parameters and activation values with {-1, 1}; initializing the neural network model and quantizing the model parameters; training the neural network model while quantizing the model parameters and activation values with a linear quantization formula according to the number M of {-1, 1} encoding bits; performing M-bit {-1, 1} encoding on the trained, quantized neural network model parameters; adding after each activation layer of the neural network model an encoding layer that re-encodes the activation values with M-bit {-1, 1} codes; and replacing the matrix and vector multiplication operations inside the neural network with several binary bit operations, thereby realizing model compression and acceleration.
2. The method as claimed in claim 1, wherein the activation function of the neural network model is the activation function HReLU, as follows:

$$\mathrm{HReLU}(x)=\begin{cases}0, & x<0\\ x, & 0\le x\le 1\\ 1, & x>1\end{cases}$$

where HReLU(x) represents the activation value obtained by taking x as input.
3. The method as claimed in claim 1, wherein the segmentation processing comprises: dividing [-1, 1] into 2^M segments, and determining from the number of segments 2^M the number of bits M used to subsequently re-encode the model parameters and activation values with {-1, 1}.
4. The method as claimed in claim 1, wherein the weights of the neural network model are initialized with mean 0 and variance 1, and the model parameters are quantized with a linear quantization formula according to the number M of {-1, 1} encoding bits.
5. The method as claimed in claim 4, wherein the linear quantization formula is as follows:

$$q_M(x)=\frac{2}{2^M-1}\left\langle\frac{(2^M-1)(x+1)}{2}\right\rangle-1$$

where q_M(x) denotes the quantized parameter value obtained under M-bit {-1, 1} coding, x denotes the full-precision parameter value, M denotes the number of {-1, 1} encoding bits, and ⟨·⟩ denotes the rounding operation.
6. The method as claimed in claim 1, wherein the trained, quantized neural network model parameters are subjected to M-bit {-1, 1} encoding with the encoding formula MBitEncoder, which is as follows:

$$\mathrm{MBitEncoder}(x)=[s_M\ s_{M-1}\ \cdots\ s_1]\quad\text{with}\quad x=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}s_m,\quad s_m\in\{-1,1\}$$

where MBitEncoder(x) denotes the M-bit {-1, 1} encoding of x, M denotes the total number of {-1, 1} encoding bits, m denotes the m-th bit of the M-bit {-1, 1} code, and s_m represents the value determined for the m-th bit, with s_m ∈ {-1, 1}.
7. The method as claimed in claim 6, wherein the quantized value x_q, after M-bit {-1, 1} encoding, can be expressed as:

$$x_q=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}x_{qm},\quad x_{qm}\in\{-1,1\}$$

where x_{qm} denotes the m-th bit of x_q and M denotes the total number of {-1, 1} encoding bits. The bits of x_q can be written as:

x_q → [x_{qM} x_{q(M-1)} x_{q(M-2)} ... x_{q1}]

where [·] denotes the internal encoded form of x_q, and x_{qM} denotes the state of the M-th bit of the internal {-1, 1} encoding of x_q, with x_{qM} ∈ {-1, 1}.
8. The method as claimed in claim 1, wherein the matrix and vector multiplication operations within the neural network are replaced with several binary bit operations: each bit of the encoded parameters and activation values is separated out, and the multiplication between the model parameters and activation values is decomposed into several binary bit computations.
9. The method as claimed in claim 8, wherein the inner product of the encoded activation value vector x and the encoded neural network parameter vector w is represented as:

$$x^Tw=\frac{1}{(2^M-1)(2^K-1)}\sum_{m=1}^{M}\sum_{k=1}^{K}2^{m-1}\,2^{k-1}\left(\tilde{x}_m^T\tilde{w}_k\right)$$

where x^T w is decomposed into M·K binary bit operations; \tilde{x}_m denotes the vector formed by the m-th bits of the elements of x, and \tilde{w}_k the vector formed by the k-th bits of the elements of w; \tilde{x}_m^T \tilde{w}_k denotes performing an XNOR on corresponding bits of \tilde{x}_m and \tilde{w}_k and counting the number of +1s minus the number of -1s in the result; M is the total number of activation-value encoding bits, and K is the total number of network-parameter encoding bits.
10. The method as claimed in claim 9, wherein each element of the encoded activation value vector x is represented with {-1, 1} encoding as follows:

$$x_i=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}x_i^m,\quad x_i^m\in\{-1,1\}$$

where x_i^m denotes the value of the m-th bit of element x_i, with x_i^m ∈ {-1, 1}.
The {-1, 1}-encoded vector x is transformed as follows:

$$x=\frac{1}{2^M-1}\sum_{m=1}^{M}2^{m-1}\tilde{x}_m,\quad \tilde{x}_m=[x_1^m\ x_2^m\ \cdots\ x_N^m]^T$$

where \tilde{x}_m denotes the vector formed by taking the m-th bit of each element of the vector; the encoded neural network parameter vector w is transformed in the same way as the vector x.
CN201810866365.0A 2018-08-01 2018-08-01 Neural network model compression and acceleration method based on {-1, +1} coding Pending CN109190759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810866365.0A CN109190759A (en) 2018-08-01 2018-08-01 Neural network model compression and acceleration method based on {-1, +1} coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810866365.0A CN109190759A (en) 2018-08-01 2018-08-01 Neural network model compression and acceleration method based on {-1, +1} coding

Publications (1)

Publication Number Publication Date
CN109190759A true CN109190759A (en) 2019-01-11

Family

ID=64920369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810866365.0A Pending CN109190759A (en) 2018-08-01 2018-08-01 Neural network model compression and acceleration method based on {-1, +1} coding

Country Status (1)

Country Link
CN (1) CN109190759A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886394A (en) * 2019-03-05 2019-06-14 北京时代拓灵科技有限公司 Three-valued neural networks weight processing method and processing device in embedded device
CN110069715A (en) * 2019-04-29 2019-07-30 腾讯科技(深圳)有限公司 A kind of method of information recommendation model training, the method and device of information recommendation
CN111723901A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Training method and device of neural network model
CN111914986A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Method for determining binary convolution acceleration index and related equipment
CN113328755A (en) * 2021-05-11 2021-08-31 内蒙古工业大学 Compressed data transmission method facing edge calculation
WO2022147745A1 (en) * 2021-01-08 2022-07-14 深圳市大疆创新科技有限公司 Encoding method, decoding method, encoding apparatus, decoding apparatus
CN117391175A (en) * 2023-11-30 2024-01-12 中科南京智能技术研究院 Pulse neural network quantification method and system for brain-like computing platform

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133266A (en) * 2017-12-12 2018-06-08 北京信息科技大学 A kind of neural network weight compression method and application method based on non-uniform quantizing
CN108304919A (en) * 2018-01-29 2018-07-20 百度在线网络技术(北京)有限公司 Method and apparatus for generating convolutional neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133266A (en) * 2017-12-12 2018-06-08 北京信息科技大学 A kind of neural network weight compression method and application method based on non-uniform quantizing
CN108304919A (en) * 2018-01-29 2018-07-20 百度在线网络技术(北京)有限公司 Method and apparatus for generating convolutional neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ITAY HUBARA et al.: "Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations", https://arxiv.org/abs/1609.07061 *
Synced (机器之心): "ICLR 2018 | Alibaba paper: Multi-bit quantization of recurrent neural networks based on the alternating direction method", https://www.sohu.com/a/229799206_129720 *
Mou Shuai (牟帅): "Research on acceleration and compression of deep neural networks based on bit quantization", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886394A (en) * 2019-03-05 2019-06-14 北京时代拓灵科技有限公司 Three-valued neural networks weight processing method and processing device in embedded device
CN109886394B (en) * 2019-03-05 2021-06-18 北京时代拓灵科技有限公司 Method and device for processing weight of ternary neural network in embedded equipment
CN111723901A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Training method and device of neural network model
CN111723901B (en) * 2019-03-19 2024-01-12 百度在线网络技术(北京)有限公司 Training method and device for neural network model
CN110069715A (en) * 2019-04-29 2019-07-30 腾讯科技(深圳)有限公司 A kind of method of information recommendation model training, the method and device of information recommendation
CN110069715B (en) * 2019-04-29 2022-12-23 腾讯科技(深圳)有限公司 Information recommendation model training method, information recommendation method and device
CN111914986A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Method for determining binary convolution acceleration index and related equipment
WO2022147745A1 (en) * 2021-01-08 2022-07-14 深圳市大疆创新科技有限公司 Encoding method, decoding method, encoding apparatus, decoding apparatus
CN113328755A (en) * 2021-05-11 2021-08-31 内蒙古工业大学 Compressed data transmission method facing edge calculation
CN113328755B (en) * 2021-05-11 2022-09-16 内蒙古工业大学 Compressed data transmission method facing edge calculation
CN117391175A (en) * 2023-11-30 2024-01-12 中科南京智能技术研究院 Pulse neural network quantification method and system for brain-like computing platform

Similar Documents

Publication Publication Date Title
CN109190759A (en) Neural network model compression and acceleration method based on {-1, +1} coding
CN109889839B (en) Region-of-interest image coding and decoding system and method based on deep learning
CN105184362B (en) Acceleration and compression method for deep convolutional neural networks based on parameter quantization
CN108510083B (en) Neural network model compression method and device
CN107016708B (en) Image hash coding method based on deep learning
CN109002889B (en) Adaptive iterative convolution neural network model compression method
EP3735658A1 (en) Generating a compressed representation of a neural network with proficient inference speed and power consumption
WO2017031630A1 (en) Deep convolutional neural network acceleration and compression method based on parameter quantification
CN108304928A (en) Compression method based on the deep neural network for improving cluster
CN107493405A (en) Encrypted image reversible information hidden method based on coding compression
CN110517329A (en) A kind of deep learning method for compressing image based on semantic analysis
CN110248190B (en) Multilayer residual coefficient image coding method based on compressed sensing
CN110753225A (en) Video compression method and device and terminal equipment
CN111179144B (en) Efficient information hiding method for multi-embedding of multi-system secret information
CN109523016B (en) Multi-valued quantization depth neural network compression method and system for embedded system
CN110647990A (en) Cutting method of deep convolutional neural network model based on grey correlation analysis
CN104392207A (en) Characteristic encoding method for recognizing digital image content
CN112817940A (en) Gradient compression-based federated learning data processing system
CN114301889A (en) Efficient federated learning method and system based on weight compression
CN107170020A (en) Dictionary learning still image compression method based on minimum quantization error criterion
CN106331719B (en) A kind of image data compression method split based on the Karhunen-Loeve transformation error space
CN109670057B (en) Progressive end-to-end depth feature quantization system and method
CN104320659B (en) Background modeling method, device and equipment
WO2022247368A1 (en) Methods, systems, and mediafor low-bit neural networks using bit shift operations
Seo et al. Hybrid approach for efficient quantization of weights in convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190111