CN112766456A - Quantization method, apparatus, device and storage medium for a floating point type deep neural network


Info

Publication number
CN112766456A
CN112766456A
Authority
CN
China
Prior art keywords
layer
neural network
network
point type
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011632088.0A
Other languages
Chinese (zh)
Other versions
CN112766456B (en)
Inventor
李骏 (Li Jun)
马骏 (Ma Jun)
王少军 (Wang Shaojun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011632088.0A priority Critical patent/CN112766456B/en
Publication of CN112766456A publication Critical patent/CN112766456A/en
Application granted granted Critical
Publication of CN112766456B publication Critical patent/CN112766456B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention relates to the technical field of artificial intelligence, and provides a quantization method, apparatus, device and storage medium for a floating point type deep neural network, which are used to improve the precision and performance of the compressed deep neural network. The quantization method of the floating point type deep neural network comprises the following steps: splitting the initial weight of each layer of a pre-trained initial floating point type deep neural network according to a preset input channel to obtain a candidate weight of each layer of the network; normalizing the candidate weight of each layer of the network through the normalization factor vector of the input channel to obtain a target weight of each layer of the network; performing fixed-point quantization processing on the target weight of each layer of the network and on the bias value to obtain a quantized floating point type deep neural network; and writing the normalization factor vector into the quantized floating point type deep neural network as a scaling layer to obtain a target floating point type deep neural network. In addition, the invention also relates to blockchain technology, and the pre-trained initial floating point type deep neural network can be stored in a blockchain.

Description

Quantization method, apparatus, device and storage medium for a floating point type deep neural network
Technical Field
The invention relates to the field of artificial-intelligence neural networks, and in particular to a quantization method, apparatus, device and storage medium for a floating-point type deep neural network.
Background
With the development of deep learning, deep neural networks have become more and more powerful, but deploying them on mobile terminals is difficult because their large number of network parameters consumes substantial computing resources and memory. In recent years, neural network compression has become a research hotspot; the main methods include low-rank decomposition, distillation, pruning, global quantization and the like.
However, the new network obtained by the low-rank decomposition or pruning methods must be fine-tuned again, otherwise its performance degrades severely; the distillation method requires pre-training a strong teacher network; and the network obtained by the global quantization method has limited quantization precision and large quantization loss. As a result, the precision and performance of the compressed deep neural network are low.
Disclosure of Invention
The invention provides a quantization method, apparatus, device and storage medium for a floating-point type deep neural network, which are used to improve the precision and performance of the compressed deep neural network.
A first aspect of the invention provides a quantization method of a floating-point type deep neural network, which comprises the following steps:
acquiring an initial weight of each layer of network in a pre-trained initial floating point type deep neural network, and splitting the initial weight of each layer of network according to a preset input channel to obtain a candidate weight of each layer of network;
acquiring a normalization factor vector of the input channel, and normalizing the candidate weight of each layer network through the normalization factor vector to obtain a target weight of each layer network;
performing fixed-point quantization processing on the target weight of each layer of the initial floating point type deep neural network and fixed-point quantization processing on the offset value to obtain a quantized floating point type deep neural network;
and determining the normalization factor vector as a scaling layer, and writing the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network.
Optionally, in a first implementation manner of the first aspect of the present invention, the obtaining a normalization factor vector of the input channel, and normalizing the candidate weight of each layer of the network by using the normalization factor vector to obtain a target weight of each layer of the network includes:
acquiring channel data of the input channel, and performing long tail data removal processing on the channel data of the input channel to obtain processed data;
calculating the maximum value of the absolute value of the processed data to obtain a normalization factor vector;
and dividing the candidate weight of each layer of the network by the normalization factor vector to obtain a target weight of each layer of the network.
Optionally, in a second implementation manner of the first aspect of the present invention, the obtaining channel data of the input channel, and performing long tail data removal processing on the channel data of the input channel to obtain processed data includes:
acquiring channel data of the input channel and the quantity of the channel data, and calculating the mean value of the channel data;
calculating difference absolute values between the channel data and the average values respectively, and sequencing the channel data according to the sequence of the difference absolute values from small to large to obtain a data sequence;
comparing and analyzing the quantity of each channel data in the data sequence with a preset threshold value to obtain long tail data of which the quantity is smaller than the preset threshold value;
and removing long tail data in the data sequence to obtain processed data.
Optionally, in a third implementation manner of the first aspect of the present invention, the performing, on the initial floating point type deep neural network, fixed-point quantization processing on a target weight of each layer of the network and fixed-point quantization processing on an offset value to obtain a quantized floating point type deep neural network includes:
performing fixed-point quantization processing on the target weight of each layer of the initial floating-point type deep neural network through preset first numerical data to obtain a candidate floating-point type deep neural network;
and obtaining the offset values of each layer of network in the candidate floating point type deep neural network, and carrying out fixed point quantization processing on the offset values of each layer of network through preset second numerical data to obtain a quantized floating point type deep neural network.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the obtaining an initial weight of each layer of the pre-trained initial floating-point deep neural network, and splitting the initial weight of each layer of the network according to a preset input channel to obtain a candidate weight of each layer of the network includes:
acquiring a weight matrix, a storage mode, a network convolution layer and the number of channels of input channels preset in each layer of the pre-trained initial floating point type deep neural network;
extracting the weight matrix according to the network convolution layer to obtain an initial weight of each layer network;
and extracting and operating on the initial weight of each layer of the network according to the storage mode and the channel number to obtain candidate weights of each layer of the network, wherein each layer of the network has a plurality of candidate weights.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after determining the normalization factor vector as a scaling layer and writing the scaling layer into the quantized floating-point deep neural network to obtain a target floating-point deep neural network, the method further includes:
deploying the target floating point type deep neural network on terminal equipment, and acquiring the operation precision of the target floating point type deep neural network based on the terminal equipment;
and optimizing the target floating point type deep neural network according to the operation precision.
The second aspect of the present invention provides a quantization apparatus for a floating-point type deep neural network, including:
the splitting module is used for acquiring an initial weight of each layer of network in the pre-trained initial floating point type deep neural network, splitting the initial weight of each layer of network according to a preset input channel, and obtaining a candidate weight of each layer of network;
the normalization module is used for acquiring a normalization factor vector of the input channel, and normalizing the candidate weight of each layer of network through the normalization factor vector to obtain a target weight of each layer of network;
the fixed-point quantization processing module is used for performing fixed-point quantization processing on the target weight of each layer of the initial floating-point type deep neural network and fixed-point quantization processing on the offset value to obtain a quantized floating-point type deep neural network;
and the writing module is used for determining the normalization factor vector as a scaling layer and writing the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network.
Optionally, in a first implementation manner of the second aspect of the present invention, the normalization module includes:
the removing unit is used for acquiring the channel data of the input channel and removing long tail data of the channel data of the input channel to obtain processed data;
the computing unit is used for computing the maximum value of the absolute value of the processed data to obtain a normalization factor vector;
and the dividing unit is used for dividing the candidate weight of each layer of the network by the normalization factor vector to obtain a target weight of each layer of the network.
Optionally, in a second implementation manner of the second aspect of the present invention, the removing unit is specifically configured to:
acquiring channel data of the input channel and the quantity of the channel data, and calculating the mean value of the channel data;
calculating difference absolute values between the channel data and the average values respectively, and sequencing the channel data according to the sequence of the difference absolute values from small to large to obtain a data sequence;
comparing and analyzing the quantity of each channel data in the data sequence with a preset threshold value to obtain long tail data of which the quantity is smaller than the preset threshold value;
and removing long tail data in the data sequence to obtain processed data.
Optionally, in a third implementation manner of the second aspect of the present invention, the fixed-point quantization processing module is specifically configured to:
performing fixed-point quantization processing on the target weight of each layer of the initial floating-point type deep neural network through preset first numerical data to obtain a candidate floating-point type deep neural network;
and obtaining the offset values of each layer of network in the candidate floating point type deep neural network, and carrying out fixed point quantization processing on the offset values of each layer of network through preset second numerical data to obtain a quantized floating point type deep neural network.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the splitting module is specifically configured to:
acquiring a weight matrix, a storage mode, a network convolution layer and the number of channels of input channels preset in each layer of the pre-trained initial floating point type deep neural network;
extracting the weight matrix according to the network convolution layer to obtain an initial weight of each layer network;
and extracting and operating on the initial weight of each layer of the network according to the storage mode and the channel number to obtain candidate weights of each layer of the network, wherein each layer of the network has a plurality of candidate weights.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the quantization apparatus of a floating point type deep neural network further includes:
the deployment module is used for deploying the target floating point type deep neural network on terminal equipment and acquiring the operation precision of the target floating point type deep neural network based on the terminal equipment;
and the optimization module is used for optimizing the target floating point type deep neural network according to the operation precision.
A third aspect of the present invention provides a quantization apparatus for a floating-point type deep neural network, including: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the quantization apparatus of the floating point type deep neural network to perform the quantization method of the floating point type deep neural network described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the above-described quantization method of a floating point type deep neural network.
In the technical scheme provided by the invention, the initial weight of each layer of the pre-trained initial floating point type deep neural network is obtained, and the initial weight of each layer of the network is split according to a preset input channel to obtain a candidate weight of each layer of the network; a normalization factor vector of the input channel is obtained, and the candidate weight of each layer of the network is normalized through the normalization factor vector to obtain a target weight of each layer of the network; fixed-point quantization processing of the target weight of each layer of the network and of the offset value is performed on the initial floating point type deep neural network to obtain a quantized floating point type deep neural network; and the normalization factor vector is determined as a scaling layer, and the scaling layer is written into the quantized floating point type deep neural network to obtain a target floating point type deep neural network. In the embodiment of the invention, splitting the initial weight of each layer of the initial floating point type deep neural network according to the preset input channel can reduce the precision loss caused by subsequent quantization. Normalizing the candidate weight of each layer of the network through the normalization factor vector, and performing fixed-point quantization processing of the target weight of each layer of the network and of the offset value on the initial floating point type deep neural network, compresses the network layer volume of the initial floating point type deep neural network, reduces memory occupation, and reduces the influence on the weight quantization precision of each layer of the network. Determining the normalization factor vector as a scaling layer and adding the scaling layer to the quantized floating point type deep neural network realizes small-partition normalization of the data and reduces quantization errors, so that the precision and performance of the deep neural network after compression are improved.
Drawings
FIG. 1 is a diagram of an embodiment of a quantization method for a floating-point type deep neural network according to the present invention;
FIG. 2 is a diagram of another embodiment of a quantization method for a floating-point type deep neural network according to the present invention;
FIG. 3 is a diagram of an embodiment of a quantization apparatus of a floating-point type deep neural network according to the present invention;
FIG. 4 is a diagram of another embodiment of a quantization apparatus of a floating-point type deep neural network according to an embodiment of the present invention;
FIG. 5 is a diagram of an embodiment of a quantization apparatus of a floating-point deep neural network according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, a device, equipment and a storage medium for quantizing a floating-point type deep neural network, which improve the precision and the performance of the compressed deep neural network.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a detailed flow of an embodiment of the present invention is described below, and referring to fig. 1, an embodiment of a quantization method for a floating-point type deep neural network according to an embodiment of the present invention includes:
101. Acquiring an initial weight of each layer of the network in the pre-trained initial floating point type deep neural network, and splitting the initial weight of each layer of the network according to a preset input channel to obtain a candidate weight of each layer of the network.
It is to be understood that the execution subject of the present invention may be a quantization apparatus of a floating-point type deep neural network, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
The method comprises the steps that a server obtains training requirements and training samples of an initial floating point type deep neural network in advance, wherein the training requirements include but are not limited to functions, network parameters and a network structure of the initial floating point type deep neural network, and the deep neural network is constructed, trained, calculated in floating point number precision, adjusted in weight and optimized according to the training requirements, a pre-training model and the training samples, so that the pre-trained initial floating point type deep neural network is obtained; or the server performs transfer learning, floating point number precision calculation, weight adjustment and optimization on the pre-constructed deep neural network according to the training requirement, the pre-training model and the training sample, so as to obtain the pre-trained initial floating point type deep neural network. The number of input channels of each layer network comprises a plurality of input channels; the form of the candidate weight of each layer network can be a product form; the candidate weight value of each layer network comprises one or more than one.
After the server obtains the pre-trained initial floating point type deep neural network, it obtains the initial weight of each layer of the network in the pre-trained initial floating point type deep neural network and the preset number of input channels, and performs matrix decomposition on the initial weight of each layer according to the number of input channels, thereby obtaining the candidate weights of each layer. For example, taking the initial weight of the A-layer network in the initial floating point type deep neural network as an example, the initial weight of the A-layer network is a weight matrix (shown as a formula image in the original publication); if the number of input channels is 3, the initial weight of the A-layer network is decomposed by matrix decomposition into three candidate weights (each shown as a formula image in the original publication). Alternatively, after obtaining the pre-trained initial floating point type deep neural network, the server obtains the initial weight of each layer of the network, the preset number of input channels, and the storage mode of the initial floating point type deep neural network, and performs matrix decomposition on the initial weight of each layer according to the number of input channels to obtain the candidate weights of each layer. For example, for the A-layer network with the above initial weight, 3 input channels and a column-first storage mode, the initial weight of the A-layer network is first decomposed according to the channel data into split weights (formula images in the original publication), each split weight is then split by columns to obtain the candidate weights (formula images in the original publication), and the candidate weight matrix is assembled from them. The precision loss caused by subsequent quantization can thus be reduced.
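As an illustration of the per-input-channel split (not part of the patent disclosure), the following Python/NumPy sketch decomposes a convolution weight tensor into one candidate weight per input channel; the tensor layout (out_channels, in_channels, kH, kW) and the function name are assumptions.

```python
import numpy as np

def split_weights_by_input_channel(weight: np.ndarray) -> list:
    """Split a convolution weight of shape (out_channels, in_channels, kH, kW)
    into one candidate weight per input channel, as in step 101."""
    _, in_channels, _, _ = weight.shape
    # Each candidate weight keeps only the slice that multiplies one input channel.
    return [weight[:, c:c + 1, :, :] for c in range(in_channels)]

# Example: a layer with 4 output channels, 3 input channels and 3x3 kernels.
w_layer_a = np.random.randn(4, 3, 3, 3).astype(np.float32)
candidates = split_weights_by_input_channel(w_layer_a)
print(len(candidates), candidates[0].shape)  # 3 candidate weights, each of shape (4, 1, 3, 3)
```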
102. Acquiring a normalization factor vector of the input channel, and normalizing the candidate weight of each layer of the network through the normalization factor vector to obtain a target weight of each layer of the network.
The server obtains a normalization factor vector of the input channels of each layer in the initial floating point type deep neural network, and divides the candidate weight of each layer by the normalization factor vector to obtain a target weight of each layer. For example: the normalization factor vector of the input channels of each layer of the initial floating-point type deep neural network is K = (k1, k2, ..., kn), the candidate weight of each layer of the network is W = (w1, w2, ..., wn), and the target weight of each layer of the network is W' = (w1/k1, w2/k2, ..., wn/kn).
Specifically, the server obtains the channel data of an input channel, calculates the mean of the channel data, calculates the difference between each channel datum and the mean, sorts the channel data in ascending order of difference to obtain a sequence, determines the channel data within a preset ratio at the front of the sequence as long-tail data, removes the long-tail data from the sequence, and calculates the maximum absolute value of the remaining sequence to obtain a normalization factor vector. For example, taking input channel 1 of the A-layer network as an example, the server obtains the channel data of input channel 1, calculates its mean, calculates the difference between each channel datum and the mean, and sorts the channel data in ascending order of difference to obtain the sequence (q1, q2, ..., q10); if the preset ratio is 1%, q1 is long-tail data, so q1 is removed from (q1, q2, ..., q10) to obtain (q2, ..., q10), and the maximum absolute value in (q2, ..., q10) is calculated to obtain the normalization factor vector of input channel 1. By normalizing the candidate weight of each layer of the network with the normalization factor vector, the volume of the network layer is compressed, memory occupation is reduced, and the problem of limited resources on terminal equipment is alleviated.
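A minimal sketch of step 102 (an illustration only, not the patent's reference implementation) could compute one normalization factor per input channel and divide the candidate weights by it; the 1% tail ratio and the interpretation of which end of the sorted sequence is the long tail follow the example above.

```python
import numpy as np

def channel_normalization_factor(channel_data: np.ndarray, tail_ratio: float = 0.01) -> float:
    """Sort the channel data by its absolute difference from the mean, drop the
    leading tail_ratio portion of the sequence as long-tail data (as with q1 in
    the example above), and return the maximum absolute value of what remains."""
    mean = channel_data.mean()
    order = np.argsort(np.abs(channel_data - mean))          # ascending |x - mean|
    n_tail = max(1, int(np.ceil(len(order) * tail_ratio)))   # e.g. 1% of the sequence
    kept = channel_data[order[n_tail:]]                      # remove the long-tail entries
    return float(np.abs(kept).max())

def normalize_candidates(candidate_weights, factors):
    """Target weights W' = (w1/k1, w2/k2, ..., wn/kn)."""
    return [w / k for w, k in zip(candidate_weights, factors)]
```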
103. Performing fixed-point quantization processing on the target weight of each layer of the initial floating point type deep neural network and fixed-point quantization processing on the offset value to obtain a quantized floating point type deep neural network.
The server sequentially performs full-precision weight recovery, weight balance processing and fine-tuning processing on a target weight of each layer of the initial floating point type deep neural network through preset low-bit data, obtains an offset value of each layer of the initial floating point type deep neural network, and sequentially performs full-precision weight recovery, weight balance processing and fine-tuning processing on the offset value of each layer of the initial floating point type deep neural network through preset numerical data, so that the quantized floating point type deep neural network is obtained. By carrying out fixed point quantization processing on the target weight of each layer of the initial floating point type deep neural network and fixed point quantization processing on the offset value, the network layer volume of the initial floating point type deep neural network is compressed, the memory occupation is reduced, and the influence on the weight quantization precision of each layer of the network is reduced.
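The patent does not spell out the fixed-point mapping itself; a common symmetric scheme consistent with quantizing to preset low-bit data and later recovering full precision might look like the following sketch (the scale convention and rounding mode are assumptions).

```python
import numpy as np

def fixed_point_quantize(x: np.ndarray, num_bits: int):
    """Symmetric fixed-point quantization to num_bits: returns the integer
    codes and the scale needed to recover (dequantize) the full-precision values."""
    qmax = 2 ** (num_bits - 1) - 1                       # e.g. 127 for 8 bits
    max_abs = float(np.abs(x).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Full-precision recovery of a quantized tensor."""
    return q.astype(np.float32) * scale
```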
104. Determining the normalization factor vector as a scaling layer, and writing the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network.
The server writes the normalization factor vector into a scaling layer (scale), and writes the scaling layer before the activation layer and after the bias layer of the quantized floating point type deep neural network according to the network rules and calculation logic of the quantized floating point type deep neural network, thereby obtaining the target floating point type deep neural network. By writing in the scaling layer, the output of each weight layer in the target floating point type deep neural network is restored to the floating point type (float), small-partition normalization of the data is realized, and quantization errors are reduced as much as possible. At the same time, the floating point number input of the initial floating point type deep neural network before quantization remains compatible, the calculation logic hardly needs to be changed when the original engine is transplanted, and the quantized target floating point type deep neural network does not need to be retrained nor does the network training scheme need to be modified, so that the precision and performance of the deep neural network after compression are improved.
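To illustrate where the scaling layer sits (after the bias layer, before the activation layer), the following sketch multiplies the layer output by the normalization factor vector so the result is floating point again; the per-channel broadcasting convention shown here is an assumption, not the patent's exact bookkeeping.

```python
import numpy as np

def scale_layer(layer_output: np.ndarray, norm_factors: np.ndarray) -> np.ndarray:
    """Scaling layer written after the bias layer and before the activation layer:
    rescales the quantized weight layer's output channel-wise back to float."""
    # layer_output: (batch, channels, H, W); norm_factors: (channels,)
    return layer_output * norm_factors.reshape(1, -1, 1, 1)

x = np.random.randn(2, 3, 8, 8).astype(np.float32)  # output of the quantized layer plus bias
k = np.array([0.5, 1.2, 0.8], dtype=np.float32)      # normalization factor vector
y = np.maximum(scale_layer(x, k), 0.0)               # scale first, then the ReLU activation
```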
In the embodiment of the invention, the initial weight of each layer in the initial floating point type deep neural network is split according to the preset input channel, which reduces the precision loss caused by subsequent quantization. The candidate weight of each layer is normalized through the normalization factor vector, and fixed-point quantization processing of the target weight of each layer and of the offset value is carried out on the initial floating point type deep neural network, which compresses the network layer volume of the initial floating point type deep neural network, reduces memory occupation, and reduces the influence on the weight quantization precision of each layer of the network. The normalization factor vector is determined as the scaling layer and the scaling layer is added to the quantized floating point type deep neural network, which realizes small-partition normalization of the data and reduces the quantization error as much as possible; at the same time, the floating point number input of the initial floating point type deep neural network before quantization remains compatible, the calculation logic hardly needs to be changed when the original engine is transplanted, and the quantized target floating point type deep neural network does not need to be retrained nor does the network training scheme need to be modified, so that the precision and performance of the deep neural network after compression are improved.
Referring to fig. 2, another embodiment of the quantization method of the floating-point deep neural network according to the embodiment of the present invention includes:
201. Acquiring an initial weight of each layer of the network in the pre-trained initial floating point type deep neural network, and splitting the initial weight of each layer of the network according to a preset input channel to obtain a candidate weight of each layer of the network.
Specifically, the server acquires a weight matrix, a storage mode, a network convolution layer of a pre-trained initial floating point type deep neural network and the channel number of input channels preset in each layer of the network; extracting the weight matrix according to the network convolution layer to obtain an initial weight of each layer network; and extracting and operating the initial weight of each layer network through the storage mode and the number of the channels to obtain a candidate weight of each layer network, wherein the number of the candidate weights of each layer network comprises a plurality of candidate weights.
For example, taking the initial weight of the A-layer network in the initial floating point type deep neural network and the preset input channel 1 as an example: the weight matrix of the A-layer network is a matrix (shown as a formula image in the original publication), the storage mode of the initial floating point type deep neural network is row-first, the network convolution layer has 2 layers, and the number of the preset input channels 1 is ci. The weight matrix is extracted according to the network convolution layer to obtain the initial weight of the A-layer network (formula image in the original publication). The initial weight of the A-layer network is split by rows to obtain split weight 1 = (1, 2, 5, 6, 3) and a second split weight (formula image in the original publication). Split weight 1 and the channel number ci are cross-correlated to obtain candidate weight 1, and the second split weight and the channel number ci are cross-correlated to obtain another candidate weight. The data of the input channel of each layer of the network comprises a plurality of data; the candidate weights of each layer of the network may take the form of a product. The precision loss caused by subsequent quantization can thus be reduced.
202. Acquiring channel data of the input channel, and performing long-tail data removal processing on the channel data of the input channel to obtain processed data.
Specifically, the server acquires channel data of an input channel and the number of the channel data, and calculates an average value of the channel data; calculating difference absolute values between the channel data and the average value respectively, and sequencing the channel data according to the sequence of the difference absolute values from small to large to obtain a data sequence; comparing and analyzing the quantity of each channel data in the data sequence with a preset threshold value to obtain long tail data of which the quantity is smaller than the preset threshold value; and removing long tail data in the data sequence to obtain processed data.
For example, the server obtains the channel data of input channel 1 as B1, B2, B3, B4, B5, B6, B7 and B8, with corresponding quantities D1, D2, D3, D4, D5, D6, D7 and D8, and the preset threshold is H. The mean of the channel data B1 to B8 is E, and the absolute difference between each channel datum and the mean E is calculated; the channel data are then sorted in ascending order of absolute difference to obtain a data sequence. Comparing the quantity of each channel datum in the data sequence with the preset threshold H, the long-tail data whose quantities are smaller than H are B5, B8, B3 and B6; removing them from the data sequence yields the processed data B2, B7, B4 and B1 of input channel 1.
203. Calculating the maximum absolute value of the processed data to obtain a normalization factor vector.
After the server obtains the processing data, an absolute value of each processing data is calculated, the processing data are sorted according to a sequence of the absolute values from large to small, and the processing data sorted to the first order are determined as a normalization factor vector, for example: taking the processing data of the input channel 1 in the input channel 1-5 of the layer-A network as B2, B7, B4 and B1 as an example, the server calculates the absolute value of each processing data, sorts the processing data according to the sequence of the absolute values from large to small to obtain B7, B4, B1 and B2, then B7 is the normalization factor vector of the input channel 1, and similarly, the normalization factor vectors of the input channels 2-5 are B10, B11, B12, B13 and B14, so as to obtain the normalization factor vectors of the layer-A network (B7, B10, B11, B12, B13 and B14).
204. Dividing the candidate weight of each layer of the network by the normalization factor vector to obtain the target weight of each layer of the network.
The server may obtain the target weight of each layer by performing one-to-one division of the candidate weight of each layer by the corresponding element of the normalization factor vector. For example: the normalization factor vector of the input channels of each layer of the initial floating-point type deep neural network is B = (B7, B10, B11, B12, B13, B14), the candidate weight of each layer of the network is w = (w1, w2, w3, w4, w5, w6), and the target weight of each layer of the network is w' = (w1/B7, w2/B10, w3/B11, w4/B12, w5/B13, w6/B14).
The server may also calculate the factor mean of the normalization factor vector, and divide the candidate weight of each layer by the factor mean to obtain the target weight of each layer of the network. For example: the normalization factor vector of the input channels of each layer of the initial floating-point type deep neural network is B = (B7, B10, B11, B12, B13, B14), the candidate weight of each layer of the network is w = (w1, w2, w3, w4, w5, w6), the factor mean of the normalization factor vector is T = (B7 + B10 + B11 + B12 + B13 + B14)/6, and the target weight of each layer of the network is w' = (w1/T, w2/T, w3/T, w4/T, w5/T, w6/T). By normalizing the candidate weight of each layer of the network with the normalization factor vector, the volume of the network layer is compressed, memory occupation is reduced, and the problem of limited resources on terminal equipment is alleviated.
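The two alternatives above, element-wise division by the normalization factor vector versus division by the factor mean, can be contrasted in a short sketch (the numeric values are placeholders):

```python
import numpy as np

B = np.array([0.7, 1.0, 1.1, 1.2, 1.3, 1.4])  # normalization factor vector (B7, B10, ..., B14)
w = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])  # candidate weights (w1, ..., w6)

w_elementwise = w / B       # option 1: one-to-one division, w' = (w1/B7, ..., w6/B14)
T = B.mean()                # option 2: factor mean T = (B7 + B10 + ... + B14) / 6
w_by_mean = w / T           # w' = (w1/T, w2/T, ..., w6/T)
```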
205. Performing fixed-point quantization processing on the target weight of each layer of the initial floating point type deep neural network and fixed-point quantization processing on the offset value to obtain a quantized floating point type deep neural network.
Specifically, the server performs fixed-point quantization processing on a target weight of each layer in the initial floating-point type deep neural network through preset first numerical data to obtain a candidate floating-point type deep neural network; and obtaining the offset values of each layer of network in the candidate floating point type depth neural network, and carrying out fixed point quantization processing on the offset values of each layer of network through preset second numerical data to obtain a quantized floating point type depth neural network.
For example, the first numerical data may be smaller than the second numerical data. The server sequentially performs full-precision weight recovery, weight balance processing and fine-tuning processing on the target weight of each layer of the network in the initial floating point type deep neural network through the preset first numerical data (for example, 8-bit integer data) to obtain a candidate floating point type deep neural network; the server then obtains the offset value of each layer of the network in the candidate floating point type deep neural network, and sequentially performs full-precision weight recovery, weight balance processing and fine-tuning processing on the offset value of each layer through the preset second numerical data (for example, 16-bit integer data) to obtain a quantized floating point type deep neural network. By carrying out fixed-point quantization processing on the target weight of each layer of the initial floating point type deep neural network and on the offset value, the network layer volume of the initial floating point type deep neural network is compressed and memory occupation is reduced, while the influence on the weight quantization precision of each layer of the network is reduced, with the offset values compressed only 2 times or not at all.
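Reusing the fixed_point_quantize helper sketched earlier (itself only an assumed scheme), the 8-bit weight / 16-bit offset split described here could be applied per layer as follows; the function and argument names are illustrative.

```python
def quantize_layer(target_weight, offset_value):
    """Sketch: 8-bit fixed-point quantization of a layer's target weights and
    16-bit fixed-point quantization of its offset (bias) values."""
    q_w, w_scale = fixed_point_quantize(target_weight, num_bits=8)   # first numerical data
    q_b, b_scale = fixed_point_quantize(offset_value, num_bits=16)   # second numerical data
    return (q_w, w_scale), (q_b, b_scale)
```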
206. Determining the normalization factor vector as a scaling layer, and writing the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network.
The server writes the normalization factor vector into a scaling layer (scale), and writes the scaling layer before the activation layer and after the bias layer of the quantized floating point type deep neural network according to the network rules and calculation logic of the quantized floating point type deep neural network, thereby obtaining the target floating point type deep neural network. By writing in the scaling layer, the output of each weight layer in the target floating point type deep neural network is restored to the floating point type (float), small-partition normalization of the data is realized, and quantization errors are reduced as much as possible. At the same time, the floating point number input of the initial floating point type deep neural network before quantization remains compatible, the calculation logic hardly needs to be changed when the original engine is transplanted, and the quantized target floating point type deep neural network does not need to be retrained nor does the network training scheme need to be modified, so that the precision and performance of the deep neural network after compression are improved.
Specifically, the server determines the normalization factor vector as a scaling layer, writes the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network, deploys the target floating point type deep neural network on the terminal device, and obtains the operation precision of the target floating point type deep neural network based on the terminal device; the target floating point type deep neural network is then optimized according to the operation precision.
After the server obtains the target floating point type deep neural network, it acquires a configuration file of the terminal device, parses the configuration file to obtain configuration information, configures the information of the target floating point type deep neural network on the terminal device according to the configuration information, acquires routing information of the service platform of the terminal device, and deploys the target floating point type deep neural network to the server corresponding to the terminal device based on the routing information. Service operations are then run on that server through the deployed target floating point type deep neural network to obtain an operation result, the operation precision of the target floating point type deep neural network is calculated through a preset loss function and the operation result, and the network parameters and/or network structure of the target floating point type deep neural network are adjusted through a preset optimizer and the operation precision, so as to improve the accuracy of the target floating point type deep neural network in service operation and its availability.
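As a rough illustration of the deploy-evaluate-optimize loop described above (the loss function, the adjustment hook and all names are assumptions rather than the patent's API):

```python
import numpy as np

def operation_precision(outputs: np.ndarray, references: np.ndarray) -> float:
    """Assumed preset loss: mean squared error between the deployed network's
    service-operation results and the reference results."""
    return float(np.mean((outputs - references) ** 2))

def optimize_until_acceptable(run_service, adjust_network, references, max_rounds=5, tol=1e-3):
    """Run the service operation, measure the operation precision, and let the
    optimizer adjust the network parameters/structure until the precision is acceptable."""
    loss = float("inf")
    for _ in range(max_rounds):
        loss = operation_precision(run_service(), references)
        if loss <= tol:
            break
        adjust_network(loss)
    return loss
```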
In the embodiment of the invention, the precision loss caused by subsequent quantization can be reduced, the network layer volume of the initial floating point type deep neural network is compressed, memory occupation is reduced, and the influence on the weight quantization precision of each layer of the network is reduced. Small-partition normalization of the data is realized and the quantization error is reduced as much as possible; the floating point number input of the initial floating point type deep neural network before quantization remains compatible, the calculation logic hardly needs to be changed when the original engine is transplanted, and the quantized target floating point type deep neural network does not need to be retrained nor does the network training scheme need to be modified, thereby improving the precision and performance of the deep neural network after compression. Furthermore, the target floating point type deep neural network is optimized according to the operation precision, which improves the accuracy of the target floating point type deep neural network in service operation and its availability.
With reference to fig. 3, the method for quantizing a floating point type deep neural network in the embodiment of the present invention is described above, and a quantization apparatus of a floating point type deep neural network in the embodiment of the present invention is described below, where an embodiment of the quantization apparatus of a floating point type deep neural network in the embodiment of the present invention includes:
the splitting module 301 is configured to obtain an initial weight of each layer of the pre-trained initial floating point type deep neural network, and split the initial weight of each layer of the network according to a preset input channel to obtain a candidate weight of each layer of the network;
a normalization module 302, configured to obtain a normalization factor vector of an input channel, and normalize the candidate weight of each layer network by using the normalization factor vector to obtain a target weight of each layer network;
a fixed-point quantization processing module 303, configured to perform fixed-point quantization processing on a target weight of each layer of the initial floating-point type deep neural network and perform fixed-point quantization processing on an offset value to obtain a quantized floating-point type deep neural network;
and a writing module 304, configured to determine the normalization factor vector as a scaling layer, and write the scaling layer into the quantized floating-point type deep neural network to obtain a target floating-point type deep neural network.
The function implementation of each module in the quantization apparatus of the floating-point type deep neural network corresponds to each step in the quantization method embodiment of the floating-point type deep neural network, and the function and implementation process thereof are not described in detail here.
In the embodiment of the invention, the initial weight of each layer in the initial floating point type deep neural network is split according to the preset input channel, which reduces the precision loss caused by subsequent quantization. The candidate weight of each layer is normalized through the normalization factor vector, and fixed-point quantization processing of the target weight of each layer and of the offset value is carried out on the initial floating point type deep neural network, which compresses the network layer volume of the initial floating point type deep neural network, reduces memory occupation, and reduces the influence on the weight quantization precision of each layer of the network. The normalization factor vector is determined as the scaling layer and the scaling layer is added to the quantized floating point type deep neural network, which realizes small-partition normalization of the data and reduces the quantization error as much as possible; at the same time, the floating point number input of the initial floating point type deep neural network before quantization remains compatible, the calculation logic hardly needs to be changed when the original engine is transplanted, and the quantized target floating point type deep neural network does not need to be retrained nor does the network training scheme need to be modified, so that the precision and performance of the deep neural network after compression are improved.
Referring to fig. 4, another embodiment of the quantization apparatus of the floating-point type deep neural network according to the embodiment of the present invention includes:
the splitting module 301 is configured to obtain an initial weight of each layer of the pre-trained initial floating point type deep neural network, and split the initial weight of each layer of the network according to a preset input channel to obtain a candidate weight of each layer of the network;
a normalization module 302, configured to obtain a normalization factor vector of an input channel, and normalize the candidate weight of each layer network by using the normalization factor vector to obtain a target weight of each layer network;
the normalization module 302 specifically includes:
a removing unit 3021, configured to obtain channel data of an input channel, and perform long tail data removal processing on the channel data of the input channel to obtain processed data;
a calculating unit 3022, configured to calculate a maximum absolute value of the processed data to obtain a normalization factor vector;
a dividing unit 3023, configured to divide the candidate weight of each layer network by the normalization factor vector to obtain a target weight of each layer network;
a fixed-point quantization processing module 303, configured to perform fixed-point quantization processing on a target weight of each layer of the initial floating-point type deep neural network and perform fixed-point quantization processing on an offset value to obtain a quantized floating-point type deep neural network;
and a writing module 304, configured to determine the normalization factor vector as a scaling layer, and write the scaling layer into the quantized floating-point type deep neural network to obtain a target floating-point type deep neural network.
Optionally, the removing unit 3021 may be further specifically configured to:
acquiring channel data of input channels and the quantity of the channel data, and calculating the average value of the channel data;
calculating difference absolute values between the channel data and the average value respectively, and sequencing the channel data according to the sequence of the difference absolute values from small to large to obtain a data sequence;
comparing and analyzing the quantity of each channel data in the data sequence with a preset threshold value to obtain long tail data of which the quantity is smaller than the preset threshold value;
and removing long tail data in the data sequence to obtain processed data.
Optionally, the fixed-point quantization processing module 303 may be further specifically configured to:
performing fixed-point quantization processing on a target weight of each layer in the initial floating point type deep neural network through preset first numerical data to obtain a candidate floating point type deep neural network;
and obtaining the offset values of each layer of network in the candidate floating point type deep neural network, and carrying out fixed point quantization processing on the offset values of each layer of network through preset second numerical data to obtain a quantized floating point type deep neural network.
Optionally, the splitting module 301 may be further specifically configured to:
acquiring a weight matrix, a storage mode, a network convolution layer and the number of channels of input channels preset in each layer of the pre-trained initial floating point type deep neural network;
extracting the weight matrix according to the network convolution layer to obtain an initial weight of each layer network;
and extracting and operating on the initial weight of each layer of the network through the storage mode and the number of channels to obtain candidate weights of each layer of the network, wherein each layer of the network has a plurality of candidate weights.
Optionally, the quantization apparatus of the floating-point type deep neural network further includes:
the deployment module 305 is configured to deploy the target floating point type deep neural network on the terminal device, and obtain the operation precision of the target floating point type deep neural network based on the terminal device;
and the optimization module 306 is configured to optimize the target floating point type deep neural network according to the operation precision.
The function implementation of each module and each unit in the quantization apparatus of the floating-point type deep neural network corresponds to each step in the quantization method embodiment of the floating-point type deep neural network, and the function and implementation process thereof are not described in detail herein.
In the embodiment of the invention, the precision loss caused by subsequent quantization can be reduced, the network layer volume of the initial floating point type deep neural network is compressed, memory occupation is reduced, and the influence on the weight quantization precision of each layer of the network is reduced. Small-partition normalization of the data is realized and the quantization error is reduced as much as possible; the floating point number input of the initial floating point type deep neural network before quantization remains compatible, the calculation logic hardly needs to be changed when the original engine is transplanted, and the quantized target floating point type deep neural network does not need to be retrained nor does the network training scheme need to be modified, thereby improving the precision and performance of the deep neural network after compression. Furthermore, the target floating point type deep neural network is optimized according to the operation precision, which improves the accuracy of the target floating point type deep neural network in service operation and its availability.
Fig. 3 and 4 describe the quantization apparatus of the floating point type deep neural network in the embodiment of the present invention in detail from the perspective of the modular functional entity, and the quantization apparatus of the floating point type deep neural network in the embodiment of the present invention is described in detail from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a quantization apparatus of a floating point type deep neural network according to an embodiment of the present invention, where the quantization apparatus 500 of the floating point type deep neural network may generate relatively large differences due to different configurations or performances, and may include one or more processors (CPUs) 510 (e.g., one or more processors) and a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) for storing applications 533 or data 532. Memory 520 and storage media 530 may be, among other things, transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations in the quantization apparatus 500 for the floating-point type deep neural network. Still further, the processor 510 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the quantization device 500 of the floating point type deep neural network.
The quantization apparatus 500 of the floating-point type deep neural network may further include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the quantization device architecture of the floating point type deep neural network shown in FIG. 5 does not constitute a limitation of the quantization device of the floating point type deep neural network, and may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the steps of the quantization method of the floating-point type deep neural network.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain referred to in the present invention is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions and is used to verify the validity (tamper resistance) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A quantization method of a floating point type deep neural network is characterized by comprising the following steps:
acquiring an initial weight of each layer of network in a pre-trained initial floating point type deep neural network, and splitting the initial weight of each layer of network according to a preset input channel to obtain a candidate weight of each layer of network;
acquiring a normalization factor vector of the input channel, and normalizing the candidate weight of each layer of the network through the normalization factor vector to obtain a target weight of each layer of the network;
performing fixed-point quantization processing on the target weight of each layer of the initial floating point type deep neural network and fixed-point quantization processing on the offset value to obtain a quantized floating point type deep neural network;
and determining the normalization factor vector as a scaling layer, and writing the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network.
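For illustration only (it forms no part of the claims), the following sketch walks through the steps of claim 1 for a single convolution layer, assuming an OIHW weight layout, a symmetric signed fixed-point scheme and NumPy as the numerical library; the function name quantize_layer, the bit width n_bits and the int8/int32 output types are assumptions rather than details taken from the patent.

import numpy as np

def quantize_layer(weight, bias, norm_vector, n_bits=8):
    # weight: (out_ch, in_ch, kh, kw) float32; norm_vector: one factor per input channel.
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 127 for 8 bits
    # Split the initial weight by preset input channel into candidate weights.
    candidates = [weight[:, ch, ...] for ch in range(weight.shape[1])]
    # Normalize each candidate weight with its channel's normalization factor.
    target = np.stack([cand / norm_vector[ch] for ch, cand in enumerate(candidates)], axis=1)
    # Fixed-point quantization of the target weights and of the bias (offset) values.
    w_scale = max(float(np.max(np.abs(target))), 1e-12) / qmax
    w_q = np.clip(np.round(target / w_scale), -qmax, qmax).astype(np.int8)
    b_q = np.round(bias / w_scale).astype(np.int32)    # simplified; claim 4 uses its own bit width
    # The normalization factor vector is written back into the network as a scaling layer,
    # so the quantized network still accepts the original floating point inputs.
    scaling_layer = np.asarray(norm_vector, dtype=np.float32)
    return w_q, b_q, w_scale, scaling_layer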
2. The method according to claim 1, wherein the obtaining of the normalization factor vector of the input channel, and the normalizing the candidate weight of each layer of the network by the normalization factor vector to obtain the target weight of each layer of the network comprises:
acquiring channel data of the input channel, and performing long tail data removal processing on the channel data of the input channel to obtain processed data;
calculating the maximum value of the absolute value of the processed data to obtain a normalization factor vector;
and dividing the candidate weight of each layer of the network by the normalization factor vector to obtain a target weight of each layer of the network.
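As a hedged illustration of claim 2, the sketch below computes the normalization factor vector from per-channel calibration data: the long-tail samples are removed first (the helper remove_long_tail is hypothetical and is sketched under claim 3), and the maximum absolute value of what remains becomes that channel's factor; the candidate weights are then divided element-wise by these factors.

import numpy as np

def normalization_factor_vector(channel_data):
    # channel_data: one 1-D array of calibration samples per input channel.
    factors = []
    for data in channel_data:
        processed = remove_long_tail(data)      # hypothetical helper, see the claim-3 sketch
        factors.append(float(np.max(np.abs(processed))))
    return np.asarray(factors, dtype=np.float32)

# Target weight of input channel ch: target[:, ch, ...] = candidate[:, ch, ...] / factors[ch]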
3. The method according to claim 2, wherein the obtaining of the channel data of the input channel and the performing of long tail data removal processing on the channel data of the input channel to obtain processed data comprises:
acquiring channel data of the input channel and the quantity of the channel data, and calculating the mean value of the channel data;
calculating the absolute difference between each channel data item and the mean value, and sorting the channel data in ascending order of the absolute differences to obtain a data sequence;
comparing the quantity of each channel data item in the data sequence with a preset threshold value to obtain long tail data whose quantity is smaller than the preset threshold value;
and removing long tail data in the data sequence to obtain processed data.
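One possible reading of claim 3, offered only as an assumption, treats the "quantity" of a data item as its occurrence count in the channel data: after sorting the samples by their absolute distance to the mean, values that occur fewer times than the preset threshold are regarded as long-tail data and removed. The threshold value used below is illustrative.

import numpy as np
from collections import Counter

def remove_long_tail(channel_data, count_threshold=3):
    data = np.asarray(channel_data, dtype=np.float32)
    mean = float(data.mean())
    # Sort the channel data in ascending order of absolute difference to the mean.
    sequence = data[np.argsort(np.abs(data - mean))]
    # Count how often each value appears; rare values form the long tail.
    counts = Counter(sequence.tolist())
    keep = np.array([counts[v] >= count_threshold for v in sequence.tolist()])
    return sequence[keep]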
4. The method according to claim 1, wherein the performing fixed-point quantization processing on the target weight of each layer of the initial floating-point type deep neural network and fixed-point quantization processing on an offset value to obtain a quantized floating-point type deep neural network comprises:
performing fixed-point quantization processing on the target weight of each layer of the initial floating-point type deep neural network through preset first numerical data to obtain a candidate floating-point type deep neural network;
and obtaining the offset values of each layer of network in the candidate floating point type deep neural network, and carrying out fixed point quantization processing on the offset values of each layer of network through preset second numerical data to obtain a quantized floating point type deep neural network.
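Under the assumption that the "first numerical data" and "second numerical data" are simply two preset bit widths, claim 4 could be sketched as a two-pass procedure: the weights are quantized first to produce the candidate network, then the offset (bias) values are quantized with the second bit width. The symmetric rounding scheme and the default bit widths below are assumptions.

import numpy as np

def fixed_point(x, n_bits):
    # Symmetric fixed-point rounding of an array to n_bits signed levels.
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def quantize_weights_then_offsets(layers, first_bits=8, second_bits=16):
    # Pass 1: fixed-point quantization of the target weights -> candidate network.
    candidate = [{"weight": fixed_point(l["weight"], first_bits), "bias": l["bias"]}
                 for l in layers]
    # Pass 2: fixed-point quantization of the offset values -> quantized network.
    return [{"weight": l["weight"], "bias": fixed_point(l["bias"], second_bits)}
            for l in candidate]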
5. The method according to claim 1, wherein the obtaining an initial weight of each layer of the pre-trained initial floating point type deep neural network, and splitting the initial weight of each layer of the network according to a preset input channel to obtain a candidate weight of each layer of the network comprises:
acquiring a weight matrix, a storage mode, a network convolution layer and the number of preset input channels of each layer of the pre-trained initial floating point type deep neural network;
extracting the weight matrix according to the network convolution layer to obtain an initial weight of each layer of the network;
and performing an extraction operation on the initial weight of each layer of the network according to the storage mode and the channel number to obtain candidate weights of each layer of the network, wherein each layer of the network has a plurality of candidate weights.
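For claim 5, the storage mode can be understood as the memory layout of the stored weight matrix, which determines along which axis the input channels are indexed. The sketch below is an assumption using two common layouts (OIHW and HWIO); the actual layouts supported by the patented method are not specified in the claim.

import numpy as np

def split_by_input_channel(weight_matrix, storage_mode, num_channels):
    # Map the storage mode to the axis that indexes input channels (illustrative).
    channel_axis = {"OIHW": 1, "HWIO": 2}[storage_mode]
    assert weight_matrix.shape[channel_axis] == num_channels
    # One candidate weight per preset input channel.
    return [np.take(weight_matrix, ch, axis=channel_axis)
            for ch in range(num_channels)]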
6. The quantization method of a floating point type deep neural network according to any one of claims 1 to 5, wherein after the determining the normalization factor vector as a scaling layer and writing the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network, the method further comprises:
deploying the target floating point type deep neural network on terminal equipment, and acquiring the operation precision of the target floating point type deep neural network based on the terminal equipment;
and optimizing the target floating point type deep neural network according to the operation precision.
7. A quantization apparatus of a floating point type deep neural network, the quantization apparatus comprising:
the splitting module is used for acquiring an initial weight of each layer of network in the pre-trained initial floating point type deep neural network, splitting the initial weight of each layer of network according to a preset input channel, and obtaining a candidate weight of each layer of network;
the normalization module is used for acquiring a normalization factor vector of the input channel, and normalizing the candidate weight of each layer of network through the normalization factor vector to obtain a target weight of each layer of network;
the fixed-point quantization processing module is used for performing fixed-point quantization processing on the target weight of each layer of the initial floating-point type deep neural network and fixed-point quantization processing on the offset value to obtain a quantized floating-point type deep neural network;
and the writing module is used for determining the normalization factor vector as a scaling layer and writing the scaling layer into the quantized floating point type deep neural network to obtain a target floating point type deep neural network.
8. The quantization apparatus of a floating point type deep neural network according to claim 7, wherein the normalization module comprises:
the removing unit is used for acquiring the channel data of the input channel and removing long tail data of the channel data of the input channel to obtain processed data;
the computing unit is used for computing the maximum value of the absolute value of the processed data to obtain a normalization factor vector;
and the dividing unit is used for dividing the candidate weight of each layer of the network by the normalization factor vector to obtain a target weight of each layer of the network.
9. A quantization device of a floating point type deep neural network, comprising: a memory and at least one processor, wherein the memory has instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the quantization device of the floating point type deep neural network to perform the quantization method of the floating point type deep neural network according to any one of claims 1 to 6.
10. A computer readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement a quantization method of a floating point type deep neural network as claimed in any one of claims 1 to 6.
CN202011632088.0A 2020-12-31 2020-12-31 Quantization method, device and equipment for floating-point deep neural network and storage medium Active CN112766456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011632088.0A CN112766456B (en) 2020-12-31 2020-12-31 Quantization method, device and equipment for floating-point deep neural network and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011632088.0A CN112766456B (en) 2020-12-31 2020-12-31 Quantization method, device and equipment for floating-point deep neural network and storage medium

Publications (2)

Publication Number Publication Date
CN112766456A true CN112766456A (en) 2021-05-07
CN112766456B CN112766456B (en) 2023-12-26

Family

ID=75699600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011632088.0A Active CN112766456B (en) 2020-12-31 2020-12-31 Quantization method, device and equipment for floating-point deep neural network and storage medium

Country Status (1)

Country Link
CN (1) CN112766456B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190164050A1 (en) * 2017-11-30 2019-05-30 International Business Machines Corporation Compression of fully connected / recurrent layers of deep network(s) through enforcing spatial locality to weight matrices and effecting frequency compression
WO2020057000A1 (en) * 2018-09-19 2020-03-26 深圳云天励飞技术有限公司 Network quantization method, service processing method and related products
CN110852439A (en) * 2019-11-20 2020-02-28 字节跳动有限公司 Neural network model compression and acceleration method, data processing method and device
CN111401550A (en) * 2020-03-10 2020-07-10 北京迈格威科技有限公司 Neural network model quantification method and device and electronic equipment
CN111783961A (en) * 2020-07-10 2020-10-16 中国科学院自动化研究所 Activation fixed point fitting-based convolutional neural network post-training quantization method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705791A (en) * 2021-08-31 2021-11-26 上海阵量智能科技有限公司 Neural network inference quantification method and device, electronic equipment and storage medium
CN113705791B (en) * 2021-08-31 2023-12-19 上海阵量智能科技有限公司 Neural network reasoning quantification method and device, electronic equipment and storage medium
CN114861886A (en) * 2022-05-30 2022-08-05 阿波罗智能技术(北京)有限公司 Quantification method and device of neural network model
CN114861886B (en) * 2022-05-30 2023-03-10 阿波罗智能技术(北京)有限公司 Quantification method and device of neural network model
CN115019150A (en) * 2022-08-03 2022-09-06 深圳比特微电子科技有限公司 Target detection fixed point model establishing method and device and readable storage medium
CN116187420A (en) * 2023-05-04 2023-05-30 上海齐感电子信息科技有限公司 Training method, system, equipment and medium for lightweight deep neural network

Also Published As

Publication number Publication date
CN112766456B (en) 2023-12-26

Similar Documents

Publication Publication Date Title
CN112766456A (en) Quantification method, device, equipment and storage medium of floating point type deep neural network
Fan et al. Training with quantization noise for extreme model compression
CN107239829B (en) Method for optimizing artificial neural network
CN110969251B (en) Neural network model quantification method and device based on label-free data
CN110555450B (en) Face recognition neural network adjusting method and device
Wu et al. Easyquant: Post-training quantization via scale optimization
EP3788559A1 (en) Quantization for dnn accelerators
US11847569B2 (en) Training and application method of a multi-layer neural network model, apparatus and storage medium
US11488019B2 (en) Lossless model compression by batch normalization layer pruning in deep neural networks
CN110399211B (en) Distribution system, method and device for machine learning and computer equipment
EP4100887A1 (en) Method and system for splitting and bit-width assignment of deep learning models for inference on distributed systems
CN113241064B (en) Speech recognition, model training method and device, electronic equipment and storage medium
CN111178514A (en) Neural network quantification method and system
CN114580636B (en) Neural network lightweight deployment method based on three-target joint optimization
JPWO2019146189A1 (en) Neural network rank optimizer and optimization method
CN108268950B (en) Iterative neural network quantization method and system based on vector quantization
KR20190093932A (en) Arithmetic processing apparatus and method in deep running system
CN111984414B (en) Data processing method, system, equipment and readable storage medium
CN111344719A (en) Data processing method and device based on deep neural network and mobile device
CN111783843A (en) Feature selection method and device and computer system
Peter et al. Resource-efficient dnns for keyword spotting using neural architecture search and quantization
CN111401043A (en) Method, device and equipment for mining similar meaning words and storage medium
CN110276448B (en) Model compression method and device
CN111797984B (en) Quantification and hardware acceleration method and device for multi-task neural network
CN113505804A (en) Image identification method and system based on compressed deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant