CN107292382A - Fixed-point quantization method for neural network acoustic model activation functions - Google Patents

Fixed-point quantization method for neural network acoustic model activation functions Download PDF

Info

Publication number
CN107292382A
CN107292382A
Authority
CN
China
Prior art keywords
layers
activation
fixed point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610191900.8A
Other languages
Chinese (zh)
Inventor
张鹏远
邢安昊
潘接林
颜永红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201610191900.8A priority Critical patent/CN107292382A/en
Publication of CN107292382A publication Critical patent/CN107292382A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention provides a fixed-point quantization method for the activation functions of a neural network acoustic model. The method specifically comprises: Step (1): at layer l of a DNN model, linearly quantize each floating-point activation value of the layer-(l-1) floating-point activation vector x^(l-1) = [x_1, …, x_N]^T to an integer in 0 to 2^K, obtaining the linearly quantized activation vector x*^(l-1) of layer l-1. Step (2): further classify each activation value of the linearly quantized layer-(l-1) activation vector x*^(l-1) obtained in step (1), approximating each activation value by its nearest whole power of 2, finally obtaining the classification-quantized activation vector x**^(l-1) of layer l-1. Step (3): linearly quantize layer l, mapping each weight w_{i,j} of the layer's floating-point weight matrix W^(l) to an integer between -127 and 127. Step (4): perform the feedforward computation of layer l of the DNN, finally obtaining the floating-point activation vector x^(l) of layer l.

Description

Fixed-point quantization method for neural network acoustic model activation functions
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a fixed-point quantization method for the activation functions of a neural network acoustic model.
Background art
In the field of speech recognition, acoustic modeling with deep neural networks (Deep Neural Network, DNN) has achieved good results. The deep structure of a DNN gives the model strong learning ability, but it also requires a large number of floating-point multiply-add operations. The main DNN computation is given by the following equation:
x^(l) = σ(W^(l) · x^(l-1))
where l is the layer index, x^(l) is the activation vector of layer l, W^(l) is the weight matrix of layer l, x^(l-1) is the activation vector of layer l-1, and σ(·) is the activation function, usually the sigmoid function:
σ(x) = 1 / (1 + e^(-x))
where e^(-x) is the exponential operation.
Fixed-point quantization of DNN models is used to reduce the amount of floating-point computation. Its principle is as follows:
Since the sigmoid function satisfies σ(x) ∈ (0, 1), a floating-point activation value can be represented by an unsigned 8-bit fixed-point integer, i.e., the activation value is linearly quantized to an integer in 0 to 256 (for brevity, the layer superscript is omitted in the equations below):
x* = round(x · 256)
where x* is the activation vector after linear fixed-point quantization and round(·) denotes rounding to the nearest integer.
Meanwhile, the weight matrix is represented with signed 8-bit fixed-point integers, i.e., the weights are linearly quantized to integers in -127 to 127 using the following two equations (again omitting the layer superscript for brevity):
w_max = max(|W|)
W* = round(W / w_max · 127)
where W is the weight matrix, max(|W|) extracts the element of largest absolute value in W, w_max is that largest absolute value among all elements of W, and W* is the weight matrix after linear fixed-point quantization.
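As an illustration, the linear fixed-point quantization of activations and weights described above can be sketched with NumPy; the function and variable names are illustrative, and `bits` plays the role of the 8-bit setting (0–256 for activations, -127…127 for weights):

```python
import numpy as np

def quantize_activations(x, bits=8):
    # Sigmoid outputs lie in (0, 1), so x * 2^bits rounds to small
    # unsigned integers in 0 .. 2^bits.
    return np.round(x * (2 ** bits)).astype(np.int32)

def quantize_weights(W):
    # Scale by the largest absolute weight so entries land in -127 .. 127.
    w_max = np.max(np.abs(W))
    W_q = np.round(W / w_max * 127).astype(np.int32)
    return W_q, w_max
```

Note that `w_max` must be kept alongside the integer matrix, since it is needed later for de-quantization.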
After the model is quantized to fixed point, the original floating-point operations become fixed-point operations, saving computational cost. Because of the reduced numerical precision, model performance degrades slightly after the switch from floating point to fixed point. On the other hand, even though fixed-point quantization converts floating-point operations into fixed-point operations, the number of multiplications is not reduced: the DNN model still requires a large number of fixed-point multiply-add operations. Therefore, on devices with limited computing resources, and especially on embedded devices with a limited number of multipliers, acoustic modeling for speech recognition with a DNN is very difficult.
Summary of the invention
The object of the present invention is to overcome the defect that existing fixed-point quantization methods cannot reduce the number of multiplication operations. To this end, the invention provides a fixed-point quantization method for the activation functions of a neural network acoustic model, which specifically comprises:
Step (1): at layer l of the DNN model, linearly quantize each floating-point activation value of the layer-(l-1) floating-point activation vector x^(l-1) = [x_1, …, x_N]^T to an integer in 0 to 2^K, where K is the number of quantization bits, obtaining the linearly quantized activation vector x*^(l-1) of layer l-1:
x_i* = round(x_i · 2^K)
where x_i is the i-th element of the layer-(l-1) floating-point activation vector x^(l-1), and x_i* is the i-th element of the linearly quantized activation vector x*^(l-1) of layer l-1.
Step (2): further classify each activation value of the linearly quantized layer-(l-1) activation vector x*^(l-1) obtained in step (1), approximating each activation value by its nearest whole power of 2, finally obtaining the classification-quantized activation vector x**^(l-1) of layer l-1.
Each element x_i**^(l-1) of the classification-quantized activation vector x**^(l-1) of layer l-1 must satisfy:
x_i**^(l-1) = 2^n, n an integer.
The specific correspondence is as follows:
x** = 2^⌊log2 x*⌋, if x* - 2^⌊log2 x*⌋ ≤ 2^⌈log2 x*⌉ - x*
x** = 2^⌈log2 x*⌉, otherwise
where x*^(l-1) denotes an element of the linearly quantized activation vector of layer l-1, x**^(l-1) the corresponding element of the classification-quantized activation vector, and ⌈·⌉ and ⌊·⌋ denote rounding up and rounding down, respectively.
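A minimal NumPy sketch of this power-of-two classification, assuming ties between the two neighboring powers resolve to the smaller one (the exact tie rule is not recoverable from this text) and that zero activations stay zero:

```python
import numpy as np

def to_power_of_two(xq):
    # Map each positive linearly quantized activation to its nearest power
    # of two; zeros are kept as zero (log2 is undefined there).
    xq = np.asarray(xq, dtype=np.int64)
    out = np.zeros_like(xq)
    pos = xq > 0
    lo = np.floor(np.log2(xq[pos])).astype(np.int64)  # 2^lo <= x
    hi = lo + 1                                       # 2^hi >= x
    pick_hi = (2 ** hi - xq[pos]) < (xq[pos] - 2 ** lo)
    out[pos] = np.where(pick_hi, 2 ** hi, 2 ** lo)
    return out
```

Exact powers of two (where x equals 2^lo) are returned unchanged, since the distance to the lower power is then zero.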
Step (3): linearly quantize layer l according to the following equations, mapping each weight w_{i,j} of the layer's floating-point weight matrix W^(l) to an integer between -127 and 127:
w_max^(l) = max(|W^(l)|)
W*^(l) = round(W^(l) / w_max^(l) · 127)
where W^(l) is the floating-point weight matrix containing the layer-l weights w_{i,j} (with i = 1, …, M and j = 1, …, N), max(|W^(l)|) extracts the element of largest absolute value in W^(l), w_max^(l) is that largest absolute value among all elements of W^(l), W*^(l) is the weight matrix after linear fixed-point quantization, and round(·) denotes rounding to the nearest integer.
Step (4): perform the feedforward computation of layer l of the DNN according to the following equations:
a^(l) = W*^(l) · x**^(l-1)
where x**^(l-1) is the classification-quantized activation vector of layer l-1, W*^(l) is the weight matrix after linear fixed-point quantization, and a^(l) is a temporary variable.
a^(l) is first multiplied by the de-quantization factor w_max^(l) / (127 · 2^K) and then passed through the sigmoid function σ(·), finally giving the floating-point activation vector x^(l) of layer l:
x^(l) = σ(a^(l) · w_max^(l) / (127 · 2^K))
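Putting steps (3) and (4) together, the quantized feedforward of one layer might look like the following sketch; `W_q`, `w_max`, `x_q2` and `K` are illustrative names for the quantized weights, the weight scale, the power-of-two activations and the activation quantization parameter from step (1):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def quantized_forward(W_q, w_max, x_q2, K):
    # Integer matrix-vector product on the quantized weights and the
    # power-of-two activations, then rescaling by the de-quantization
    # factor w_max / (127 * 2^K) before applying the sigmoid.
    a = W_q @ x_q2                       # pure integer accumulation
    return sigmoid(a * (w_max / (127.0 * 2 ** K)))
```

Only the final rescale and sigmoid happen in floating point; the matrix-vector product itself stays in integer arithmetic.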
The advantage of the invention is that the large number of multiplications in the DNN feedforward computation can be replaced with integer shift operations, greatly reducing the demand of DNN computation on computing resources, especially multipliers, and simplifying the multiplication of the weight matrix with the activation vector in the DNN feedforward computation.
Brief description of the drawings
Fig. 1 is the flow chart of the neural network activation function fixed-point quantization method of the present invention.
Fig. 2 is a schematic diagram of the DNN structure used in the embodiment.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Quantization method is pinpointed the invention provides a kind of neutral net acoustic model activation primitive, by DNN activation value 2 whole power is quantified as, in this way, in DNN computings, determining in weight matrix and activation multiplication of vectors Point multiplication computing can be substituted for the shifting function of fixed-point number.
In this embodiment, the experimental setup is as follows. The input feature vector has 572 dimensions: the input features are 52-dimensional perceptual linear prediction (PLP) features with a context extension of 5 frames on each side, giving an input vector of 52 × 11 = 572 dimensions; the weight matrix of the input layer is of size 2048 × 572. In addition, the DNN has 2 hidden layers, each with a weight matrix of size 2048 × 2048; the weight matrix of the output layer is of size 19508 × 2048, and the output vector has 19508 dimensions, corresponding to the tied context-dependent phoneme states. The speech recognition task uses a Chinese test set, with 3000 hours of speech data as the training set and 2 hours of data as the test set.
In this embodiment, as shown in Fig. 2, the DNN model has 4 layers. Taking the 2nd layer as an example, a specific embodiment is now given, as shown in Fig. 1:
Step (1): at the 2nd layer of the DNN model, linearly quantize each floating-point activation value of the 1st-layer floating-point activation vector x^(1) = [x_1, …, x_N]^T to an integer in 0 to 2^K, where K is the number of quantization bits, obtaining the linearly quantized activation vector x*^(1) of the 1st layer:
x_i* = round(x_i · 2^K)    (1)
where x_i is the i-th element of the 1st-layer floating-point activation vector x^(1), and x_i* is the i-th element of the linearly quantized activation vector x*^(1) of the 1st layer.
Step (2): further classify each activation value of the linearly quantized 1st-layer activation vector x*^(1) obtained in step (1), approximating each activation value by its nearest whole power of 2, finally obtaining the classification-quantized activation vector x**^(1) of the 1st layer.
Each element x_i**^(1) of the classification-quantized activation vector x**^(1) of the 1st layer must satisfy x_i**^(1) = 2^n with integer n. The specific correspondence is as follows:
x** = 2^⌊log2 x*⌋, if x* - 2^⌊log2 x*⌋ ≤ 2^⌈log2 x*⌉ - x*
x** = 2^⌈log2 x*⌉, otherwise
where x*^(1) denotes an element of the linearly quantized 1st-layer activation vector, x**^(1) the corresponding element of the classification-quantized activation vector, and ⌈·⌉ and ⌊·⌋ denote rounding up and rounding down, respectively.
Step (3): linearly quantize the 2nd layer according to equations (2) and (3), mapping each weight w_{i,j} of the layer's floating-point weight matrix W^(2) to an integer between -127 and 127:
w_max^(2) = max(|W^(2)|)    (2)
W*^(2) = round(W^(2) / w_max^(2) · 127)    (3)
where W^(2) is the floating-point weight matrix of size M × N containing the 2nd-layer weights w_{i,j} (with i = 1, …, M and j = 1, …, N), max(|W^(2)|) extracts the element of largest absolute value in W^(2), w_max^(2) is that largest absolute value among all elements of W^(2), W*^(2) is the weight matrix after linear fixed-point quantization, and round(·) denotes rounding to the nearest integer.
Step (4): perform the feedforward computation of the 2nd layer of the DNN according to equations (4) and (5):
a^(2) = W*^(2) · x**^(1)    (4)
x^(2) = σ(a^(2) · w_max^(2) / (127 · 2^K))    (5)
where x**^(1) is the classification-quantized activation vector of the 1st layer, W*^(2) is the weight matrix after linear fixed-point quantization, and a^(2) is a temporary variable.
As shown in Fig. 1, if the 2nd layer is determined to be the output layer, then x^(2) is the DNN output; in that case the activation function of the 2nd layer in this embodiment usually uses the softmax function, as shown in equations (6-1) and (6-2):
y = softmax(x)    (6-1)
y_i = e^{x_i} / Σ_{k=1}^{N} e^{x_k}    (6-2)
where e is the natural constant, y_i is the i-th element of the vector y, x_i and x_k are the i-th and k-th elements of the vector x, e^{x_i} and e^{x_k} are exponential operations, and N is the dimension of the vectors x and y.
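For reference, a softmax matching equation (6-2) can be sketched as follows; subtracting the maximum before exponentiating is a standard numerical-stability trick and not part of the patent text:

```python
import numpy as np

def softmax(x):
    # e^{x_i} / sum_k e^{x_k}; subtracting max(x) avoids overflow
    # without changing the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()
```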
In equation (5), a^(2) is first multiplied by the de-quantization factor w_max^(2) / (127 · 2^K) and then passed through the sigmoid function σ(·), finally giving the floating-point activation vector x^(2) of the 2nd layer, i.e., the output vector of the whole DNN.
If the 2nd layer is determined not to be the output layer, the floating-point activation vector x^(2) of the 2nd layer is obtained and used for the computation of the 3rd layer.
Based on the above fixed-point quantization method, the multiplication of the current layer's weight matrix by the previous layer's activation vector in the DNN feedforward computation can be further simplified. Taking the feedforward computation of the 2nd layer of the whole network as an example:
a^(2) = W*^(2) · x**^(1)
where M = N = 2048. The specific computation is:
a_i^(2) = Σ_{j=1}^{N} W*_{ij}^(2) · x_j**^(1)
where a_i^(2) is the i-th element of the temporary vector a^(2), W*_{ij}^(2) is the element in row i, column j of the quantized weight matrix of the second layer, i.e., the weight connecting the i-th neuron of the 2nd layer and the j-th neuron of the 1st layer, x_j**^(1) is the j-th element of the quantized activation vector of the first layer, and N is the dimension of x**^(1), i.e., the number of neurons in the 1st layer.
Assume x_j**^(1) = 2^{n_j}, where n_j is an integer.
In a computer or other embedded system, for integers y and x we have y · 2^x = y << x, where "<<" denotes the left-shift operation. The computation above can therefore be rewritten as:
a_i^(2) = Σ_{j=1}^{N} (W*_{ij}^(2) << n_j)
That is, the multiplications in the original matrix-vector product are all replaced by shift operations, which greatly reduces the demand of DNN computation on computing resources, especially multipliers.
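The shift-based accumulation can be illustrated in plain Python, where `exponents[j]` holds n_j from x_j** = 2^{n_j} (names are illustrative; zero activations, which have no power-of-two exponent, would need to be skipped in a full implementation):

```python
def shift_matvec(W_q, exponents):
    # a_i = sum_j (W_q[i][j] << n_j): every multiply becomes a left shift.
    # Python ints are arbitrary-precision, so no overflow handling is needed
    # here; a fixed-width implementation would have to manage accumulator width.
    return [sum(w << n for w, n in zip(row, exponents)) for row in W_q]
```

In Python, left-shifting a negative integer keeps its sign (e.g. -2 << 3 is -16), which matches the signed quantized weights.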
The performance of this embodiment is analyzed below.
The word error rate (WER) of each model is tested on the test set. The models are: the original floating-point model; the linearly quantized fixed-point model (weights and activation values both linearly quantized); and classification-quantized fixed-point models with different values of K (weights linearly quantized, activation values classification-quantized).
The word error rate is computed as:
WER = (S + D + I) / N × 100%
where S, D and I are the numbers of substituted, deleted and inserted words, and N is the number of words in the reference transcription.
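The WER formula amounts to the following tiny helper (an illustrative sketch, assuming the standard substitution/deletion/insertion counts):

```python
def word_error_rate(subs, dels, ins, n_ref):
    # WER = (S + D + I) / N, expressed as a percentage.
    return 100.0 * (subs + dels + ins) / n_ref
```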
The experimental results are shown in Table 1:
Table 1
The experimental results show that applying the further classification quantization to the fixed-point quantized model causes no obvious performance drop. Notably, when K = 5 the performance of the classification-quantized model is even better than that of the linearly quantized fixed-point model and approaches that of the original floating-point model. The newly introduced classification quantization of activation values replaces most of the multiplications in the DNN feedforward computation with shift operations without incurring significant performance loss, greatly lowering the computational resource requirements, especially the number of multipliers, for applying a DNN in an embedded system.
Finally, it should be noted that the experiments described in the embodiment are only used to illustrate the feasibility of the software algorithm of the technical solution of the present invention and are not limited to this example; the algorithm has been verified with a large amount of experimental data, is reliable, and can realize the technical solution when paired with suitable hardware. Although the present invention has been described in detail with reference to the embodiment, those skilled in the art should understand that modifications or equivalent replacements of the technical solution of the present invention that do not depart from its spirit and scope shall all be covered by the scope of the claims of the present invention.

Claims (3)

1. A fixed-point quantization method for the activation functions of a neural network acoustic model, characterized in that the method specifically comprises:
Step (1): at layer l of the DNN model, linearly quantize each floating-point activation value of the layer-(l-1) floating-point activation vector x^(l-1) = [x_1, …, x_N]^T to an integer in 0 to 2^K, where K is the number of quantization bits, obtaining the linearly quantized activation vector x*^(l-1) of layer l-1:
x_i* = round(x_i · 2^K)
where x_i is the i-th element of the layer-(l-1) floating-point activation vector x^(l-1), and x_i* is the i-th element of the linearly quantized activation vector x*^(l-1) of layer l-1;
Step (2): further classify each activation value of the linearly quantized layer-(l-1) activation vector x*^(l-1) obtained in step (1), approximating each activation value by its nearest whole power of 2, finally obtaining the classification-quantized activation vector x**^(l-1) of layer l-1;
Step (3): linearly quantize layer l, mapping each weight w_{i,j} of the layer's floating-point weight matrix W^(l) to an integer between -127 and 127;
Step (4): perform the feedforward computation of layer l of the DNN, finally obtaining the floating-point activation vector x^(l) of layer l:
a^(l) = W*^(l) · x**^(l-1)
x^(l) = σ(a^(l) · w_max^(l) / (127 · 2^K))
where x**^(l-1) is the classification-quantized activation vector of layer l-1, W*^(l) is the weight matrix after linear fixed-point quantization, w_max^(l) is the largest absolute value among all elements of the layer-l weight matrix W^(l), and a^(l) is a temporary variable.
2. The fixed-point quantization method for the activation functions of a neural network acoustic model according to claim 1, characterized in that in step (2), each element x_i**^(l-1) of the classification-quantized activation vector x**^(l-1) of layer l-1 must satisfy x_i**^(l-1) = 2^n with integer n, with the specific correspondence:
x** = 2^⌊log2 x*⌋, if x* - 2^⌊log2 x*⌋ ≤ 2^⌈log2 x*⌉ - x*
x** = 2^⌈log2 x*⌉, otherwise
where x*^(l-1) denotes an element of the linearly quantized activation vector of layer l-1, x**^(l-1) the corresponding element of the classification-quantized activation vector, and ⌈·⌉ and ⌊·⌋ denote rounding up and rounding down, respectively.
3. The fixed-point quantization method for the activation functions of a neural network acoustic model according to claim 1, characterized in that in step (3), the linear fixed-point quantization of layer l further comprises:
w_max^(l) = max(|W^(l)|)
W*^(l) = round(W^(l) / w_max^(l) · 127)
where W^(l) is the floating-point weight matrix containing the layer-l weights w_{i,j} (with i = 1, …, M and j = 1, …, N), max(|W^(l)|) extracts the element of largest absolute value in W^(l), W*^(l) is the weight matrix after linear fixed-point quantization, and round(·) denotes rounding to the nearest integer.
CN201610191900.8A 2016-03-30 2016-03-30 Fixed-point quantization method for neural network acoustic model activation functions Pending CN107292382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610191900.8A CN107292382A (en) 2016-03-30 2016-03-30 Fixed-point quantization method for neural network acoustic model activation functions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610191900.8A CN107292382A (en) 2016-03-30 2016-03-30 Fixed-point quantization method for neural network acoustic model activation functions

Publications (1)

Publication Number Publication Date
CN107292382A true CN107292382A (en) 2017-10-24

Family

ID=60087576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610191900.8A Pending CN107292382A (en) 2016-03-30 2016-03-30 Fixed-point quantization method for neural network acoustic model activation functions

Country Status (1)

Country Link
CN (1) CN107292382A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909583A (en) * 2017-11-08 2018-04-13 维沃移动通信有限公司 A kind of image processing method, device and terminal
CN109165736A (en) * 2018-08-08 2019-01-08 北京字节跳动网络技术有限公司 Information processing method and device applied to convolutional neural networks
CN109409514A (en) * 2018-11-02 2019-03-01 广州市百果园信息技术有限公司 Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks
CN109754074A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 A kind of neural network quantization method, device and Related product
CN111126557A (en) * 2018-10-31 2020-05-08 阿里巴巴集团控股有限公司 Neural network quantification method, neural network quantification application device and computing equipment
CN111951823A (en) * 2020-08-07 2020-11-17 腾讯科技(深圳)有限公司 Audio processing method, device, equipment and medium
CN112101518A (en) * 2020-08-05 2020-12-18 华南理工大学 Quantum system capable of simulating any nonlinear activation function
CN112149797A (en) * 2020-08-18 2020-12-29 Oppo(重庆)智能科技有限公司 Neural network structure optimization method and device and electronic equipment
WO2021017546A1 (en) * 2019-07-31 2021-02-04 中科寒武纪科技股份有限公司 Neural network quantization method and apparatus, chip, electronic device and board card
CN112702600A (en) * 2020-12-29 2021-04-23 南京大学 Image coding and decoding neural network layered fixed-point method
WO2021179587A1 (en) * 2020-03-10 2021-09-16 北京迈格威科技有限公司 Neural network model quantification method and apparatus, electronic device and computer-readable storage medium
WO2021179281A1 (en) * 2020-03-13 2021-09-16 Intel Corporation Optimizing low precision inference models for deployment of deep neural networks
CN113593538A (en) * 2021-09-02 2021-11-02 北京声智科技有限公司 Voice feature classification method, related device and readable storage medium
US11699073B2 (en) 2018-12-29 2023-07-11 Cambricon Technologies Corporation Limited Network off-line model processing method, artificial intelligence processing device and related products

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909583A (en) * 2017-11-08 2018-04-13 维沃移动通信有限公司 A kind of image processing method, device and terminal
CN109165736A (en) * 2018-08-08 2019-01-08 北京字节跳动网络技术有限公司 Information processing method and device applied to convolutional neural networks
CN109165736B (en) * 2018-08-08 2023-12-12 北京字节跳动网络技术有限公司 Information processing method and device applied to convolutional neural network
CN111126557A (en) * 2018-10-31 2020-05-08 阿里巴巴集团控股有限公司 Neural network quantification method, neural network quantification application device and computing equipment
CN111126557B (en) * 2018-10-31 2024-03-29 阿里巴巴集团控股有限公司 Neural network quantization, application method, device and computing equipment
CN109409514A (en) * 2018-11-02 2019-03-01 广州市百果园信息技术有限公司 Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks
US11669732B2 (en) 2018-12-29 2023-06-06 Cambricon Technologies Corporation Limited Neural network quantization method, device and related products
CN109754074A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 A kind of neural network quantization method, device and Related product
US11699073B2 (en) 2018-12-29 2023-07-11 Cambricon Technologies Corporation Limited Network off-line model processing method, artificial intelligence processing device and related products
WO2021017546A1 (en) * 2019-07-31 2021-02-04 中科寒武纪科技股份有限公司 Neural network quantization method and apparatus, chip, electronic device and board card
WO2021179587A1 (en) * 2020-03-10 2021-09-16 北京迈格威科技有限公司 Neural network model quantification method and apparatus, electronic device and computer-readable storage medium
WO2021179281A1 (en) * 2020-03-13 2021-09-16 Intel Corporation Optimizing low precision inference models for deployment of deep neural networks
CN112101518A (en) * 2020-08-05 2020-12-18 华南理工大学 Quantum system capable of simulating any nonlinear activation function
CN111951823A (en) * 2020-08-07 2020-11-17 腾讯科技(深圳)有限公司 Audio processing method, device, equipment and medium
CN112149797B (en) * 2020-08-18 2023-01-03 Oppo(重庆)智能科技有限公司 Neural network structure optimization method and device and electronic equipment
CN112149797A (en) * 2020-08-18 2020-12-29 Oppo(重庆)智能科技有限公司 Neural network structure optimization method and device and electronic equipment
CN112702600A (en) * 2020-12-29 2021-04-23 南京大学 Image coding and decoding neural network layered fixed-point method
CN113593538A (en) * 2021-09-02 2021-11-02 北京声智科技有限公司 Voice feature classification method, related device and readable storage medium
CN113593538B (en) * 2021-09-02 2024-05-03 北京声智科技有限公司 Voice characteristic classification method, related equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN107292382A (en) Fixed-point quantization method for neural network acoustic model activation functions
CN106650813B (en) A kind of image understanding method based on depth residual error network and LSTM
US20190130273A1 (en) Sequence-to-sequence prediction using a neural network model
CN107688849A (en) A kind of dynamic strategy fixed point training method and device
KR102602195B1 (en) Quantization of trained long-short-term memory neural networks
CN113196304A (en) Scaling learning for training DNN
Huang et al. Sndcnn: Self-normalizing deep cnns with scaled exponential linear units for speech recognition
CN109657226B (en) Multi-linkage attention reading understanding model, system and method
CN108630199A (en) A kind of data processing method of acoustic model
CN107590127A (en) A kind of exam pool knowledge point automatic marking method and system
CN107665248A (en) File classification method and device based on deep learning mixed model
Krause et al. Dynamic evaluation of transformer language models
CN115238893B (en) Neural network model quantification method and device for natural language processing
CN110019822B (en) Few-sample relation classification method and system
CN110427608A (en) A kind of Chinese word vector table dendrography learning method introducing layering ideophone feature
CN113128206B (en) Question generation method based on word importance weighting
CA3232610A1 (en) Convolution attention network for multi-label clinical document classification
CN102778555B (en) Method for predicting concentration of gas dissolved in transformer oil
Dong et al. Heatvit: Hardware-efficient adaptive token pruning for vision transformers
EP3362951B1 (en) Neural random access machine
CN111259147A (en) Sentence-level emotion prediction method and system based on adaptive attention mechanism
Huai et al. Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
CN106503726A (en) A kind of electrical energy power quality disturbance recognition methodss of the sub- dictionary cascade study of tape label information
CN116450813B (en) Text key information extraction method, device, equipment and computer storage medium
CN114692529B (en) CFD high-dimensional response uncertainty quantification method and device, and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171024

RJ01 Rejection of invention patent application after publication