CN106847268B - Neural network acoustic model compression and voice recognition method - Google Patents
- Publication number
- CN106847268B (granted publication of application CN201510881044.4A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- vector
- codebook
- vectors
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
The invention provides a compression method for a neural network acoustic model, which comprises the following steps: splitting the row vectors of the output-layer weight matrix W of the neural network acoustic model into a plurality of sub-vectors of a specified dimension; performing first-stage vector quantization on the sub-vectors to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*; computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*; and finally representing the weight matrix W with the matrices W* and R*. The method reduces the storage space of the neural network acoustic model, greatly reduces the quantization error, and avoids exponential growth of the codebook size.
Description
Technical Field
The invention relates to the field of voice recognition, in particular to a neural network acoustic model compression and voice recognition method.
Background
In the field of speech recognition, acoustic modeling with Deep Neural Networks (DNN) is highly effective. The deep structure of a DNN gives the model strong learning ability, but it also leads to a huge number of model parameters, which makes it difficult to apply DNN acoustic models to speech recognition on mobile devices with limited computing power: the main obstacles are the large storage requirement and the high computational complexity.
Vector-quantization-based methods have been used to compress DNN models, saving both storage space and computation. The principle is as follows:

For a DNN weight matrix $W \in \mathbb{R}^{M \times N}$, each row vector $\mathbf{w}_i$ is split into $J = N/d$ sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \mathbf{w}_{i2}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad i = 1, \ldots, M,$$

where $\mathbf{w}_{ij}$ is the j-th sub-vector of the i-th row of the weight matrix W and the superscript T denotes the transpose. All sub-vectors are then quantized into K codebook vectors using a vector quantization method. Thus the original M × N matrix can be represented by a codebook of K d-dimensional vectors, plus $\log_2(K) \times (M \cdot J)$ bits to record the index of each sub-vector in the codebook. In the forward computation of the DNN, sub-vectors in the same column are multiplied by the same segment of the activation vector; if several sub-vectors in the same column are quantized to the same codebook vector, their products with that activation segment can be shared, reducing the number of multiplications.
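To make this background scheme concrete, the following is a minimal sketch of single-stage sub-vector quantization using k-means clustering; it is illustrative only, and the function name, the use of scikit-learn's KMeans, and the toy matrix sizes are assumptions rather than details taken from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def single_stage_vq(W, d, K):
    """Split each row of W (M x N) into N/d sub-vectors of dimension d and
    quantize all sub-vectors into K codebook vectors with k-means.
    Returns the codebook (K x d), the index matrix (M x J), and the
    reconstructed matrix W_hat."""
    M, N = W.shape
    assert N % d == 0, "the sub-vector dimension d must divide N"
    J = N // d
    # Collect all M*J sub-vectors as rows of one matrix.
    subvectors = W.reshape(M, J, d).reshape(M * J, d)
    kmeans = KMeans(n_clusters=K, n_init=4, random_state=0).fit(subvectors)
    codebook = kmeans.cluster_centers_            # K codebook vectors of dimension d
    ids = kmeans.labels_.reshape(M, J)            # index of each sub-vector in the codebook
    W_hat = codebook[ids].reshape(M, N)           # replace sub-vectors by codebook vectors
    return codebook, ids, W_hat

# Example usage on a small random matrix (illustrative sizes only).
W = np.random.randn(200, 40).astype(np.float32)
codebook, ids, W_hat = single_stage_vq(W, d=4, K=64)
print("quantization MSE:", np.mean((W - W_hat) ** 2))
```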
Compressing a DNN with vector quantization can degrade the DNN's performance, and the degree of degradation depends on the quantization error. Conventional vector quantization uses only a single-stage codebook; when the codebook is small (i.e., it contains few codebook vectors), the quantization error is high. To reduce the quantization error, the codebook size has to grow exponentially, which greatly increases the amount of computation and defeats the purpose of saving space and computation.
Disclosure of Invention
The invention aims to solve the problem of large quantization error in methods that compress a DNN by vector quantization. It provides a method that compresses the DNN with multi-stage vector quantization: a second quantization stage is added to re-quantize the residual of the first-stage quantization, and the original weight matrix is finally replaced by the two-stage codebooks, which greatly reduces the quantization error while avoiding exponential growth of the codebook size.
In order to achieve the above object, the present invention provides a compression method for a neural network acoustic model, the method comprising: splitting the row vectors of the output-layer weight matrix W of the neural network acoustic model into a plurality of sub-vectors of a specified dimension; performing first-stage vector quantization on the sub-vectors to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*; computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*; and finally representing the weight matrix W with the matrices W* and R*.
In the above technical solution, the method specifically includes:
Step S1) split each row vector of the output-layer weight matrix W of the neural network acoustic model into sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad J = N/d, \quad i = 1, \ldots, M,$$

wherein W is an M × N matrix;
Step S2) perform first-stage vector quantization on the sub-vectors obtained in step S1) to obtain a first-stage codebook, and replace the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*;

The first-stage vector quantization of the sub-vectors obtained in step S1) yields a first-stage codebook $C^{(1)} = \{\mathbf{c}^{(1)}_1, \ldots, \mathbf{c}^{(1)}_{K_1}\}$, which contains $K_1$ codebook vectors. Let the index of the first-stage codebook vector corresponding to the j-th sub-vector of the i-th row of the weight matrix W be $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$, with corresponding codebook vector $\mathbf{c}^{(1)}_{id^{(1)}(i,j)}$. Replace each sub-vector $\mathbf{w}_{ij}$ of the matrix W with its codebook vector to obtain the matrix $W^*$:

$$\mathbf{w}^*_{ij} = \mathbf{c}^{(1)}_{id^{(1)}(i,j)}$$
Step S3) compute a residual matrix R from the matrices W and W*, perform second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replace the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*;

Compute the residual matrix R:

$$R = W - W^*$$

Perform second-stage vector quantization on the residual sub-vectors $\mathbf{r}_{ij}$ to obtain a second-stage codebook $C^{(2)} = \{\mathbf{c}^{(2)}_1, \ldots, \mathbf{c}^{(2)}_{K_2}\}$, which contains $K_2$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the matrix R be $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$, with corresponding codebook vector $\mathbf{c}^{(2)}_{id^{(2)}(i,j)}$. Replace each corresponding sub-vector of the matrix R with its codebook vector to obtain the matrix $R^*$:

$$\mathbf{r}^*_{ij} = \mathbf{c}^{(2)}_{id^{(2)}(i,j)}$$
Step S4) represent the weight matrix W with the matrices W* and R*:

$$W \approx W^* + R^*$$

Each sub-vector $\mathbf{w}_{ij}$ of the matrix W is indexed in the two-stage codebooks by $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$; storing W is thus converted into storing the two codebooks together with $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
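A minimal sketch of steps S1)–S4) under the same assumptions, reusing the hypothetical single_stage_vq helper and the toy matrix W from the background sketch above; it illustrates the residual (two-stage) quantization, not the patent's exact implementation.

```python
import numpy as np

def two_stage_vq(W, d, K1, K2):
    """Two-stage (residual) vector quantization of the output-layer weight
    matrix W, following steps S1)-S4): first-stage quantization of W,
    second-stage quantization of the residual R = W - W*."""
    # Steps S1/S2: first-stage quantization of the sub-vectors of W.
    C1, id1, W_star = single_stage_vq(W, d, K1)
    # Step S3: quantize the sub-vectors of the residual matrix R = W - W*.
    R = W - W_star
    C2, id2, R_star = single_stage_vq(R, d, K2)
    # Step S4: W is represented by the two codebooks and the two index maps;
    # the reconstruction is W ~ W* + R*.
    return (C1, id1), (C2, id2), W_star + R_star

# Example usage with the toy matrix W from the previous sketch.
(C1, id1), (C2, id2), W_approx = two_stage_vq(W, d=4, K1=64, K2=64)
print("two-stage quantization MSE:", np.mean((W - W_approx) ** 2))
```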
In the above technical solution, the value of d in step S1) satisfies the following condition: the number of columns N of the matrix W is divisible by d.
Based on the compression method of the neural network acoustic model, the invention also provides a voice recognition method, which comprises the following steps:
Step T1) for an input speech feature vector, after the forward computation of the input layer and the hidden layers, obtain the activation vector $\mathbf{a} = [a_1, \ldots, a_N]^T$, and split it into sub-vectors of dimension d to obtain $\mathbf{a} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_J^T]^T$, where $\mathbf{a}_j \in \mathbb{R}^d$ and $J = N/d$;

the weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, J\}$;

traverse the rows of the output layer: for i = 1, 2, …, M, compute in turn the products $\mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j$ and $\mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$ for j = 1, …, J; if during this process $id^{(k)}(i,j) = id^{(k)}(i',j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row i is used directly when row i′ is processed; then compute:

$$y_i = \sum_{j=1}^{J} \left( \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j \right);$$

obtain the output $\mathbf{y} = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T4) send the output y to a decoder for decoding to obtain a recognition result in text form.
The invention has the advantage that the method can reduce the storage space of the neural network acoustic model, greatly reduce the quantization error, and avoid exponential growth of the codebook size.
Drawings
FIG. 1 is a flow chart of a neural network acoustic model compression method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in FIG. 1, the compression method for a neural network acoustic model comprises:
Step S1) split each row vector of the output-layer weight matrix W of the neural network acoustic model (DNN) into sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad J = N/d, \quad i = 1, \ldots, M,$$

wherein W is an M × N matrix;
In this embodiment, the DNN model has 7 layers: the weight matrices of the 5 hidden layers have size 5000 × 500, the input-layer weight matrix has size 5000 × 360, and the output-layer weight matrix has size 20000 × 500. The input observation vector has 360 dimensions: 13-dimensional Mel-frequency cepstral coefficient (MFCC) features are spliced and processed with Linear Discriminant Analysis (LDA), Maximum Likelihood Linear Transform (MLLT), and feature-space Maximum Likelihood Linear Regression (fMLLR) to obtain 40-dimensional features, which are then expanded with 4 frames of context on each side, giving input features of (4+1+4) × 40 = 360 dimensions. The data set used is the standard English Switchboard corpus, with 286 hours of training data and 3 hours of test data; the output-layer parameters account for about half of the total model parameters.
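For illustration, the context expansion described above could be sketched as follows; the repetition padding at utterance boundaries and the array names are assumptions, since the patent does not specify these details.

```python
import numpy as np

def splice_context(feats, left=4, right=4):
    """Concatenate each 40-dim frame with its 4 left and 4 right neighbours,
    producing (4 + 1 + 4) * 40 = 360-dim input features."""
    T, dim = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0),
                             feats,
                             np.repeat(feats[-1:], right, axis=0)], axis=0)
    return np.stack([padded[t:t + left + 1 + right].reshape(-1)
                     for t in range(T)], axis=0)

feats = np.random.randn(100, 40).astype(np.float32)   # stand-in for the 40-dim fMLLR features
inputs = splice_context(feats)
print(inputs.shape)   # (100, 360)
```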
Step S2) perform first-stage vector quantization on the sub-vectors obtained in step S1) using a codebook of size 1024, obtaining a first-stage codebook $C^{(1)} = \{\mathbf{c}^{(1)}_1, \ldots, \mathbf{c}^{(1)}_{K_1}\}$ with $K_1 = 1024$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the weight matrix W be $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$, with corresponding codebook vector $\mathbf{c}^{(1)}_{id^{(1)}(i,j)}$. Replace each sub-vector of the matrix W with its codebook vector to obtain the matrix $W^*$:

$$\mathbf{w}^*_{ij} = \mathbf{c}^{(1)}_{id^{(1)}(i,j)}$$
Step S3) compute a residual matrix R from the matrices W and W*, perform second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replace the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*;

Compute the residual of the first-stage quantization to obtain the residual matrix R:

$$R = W - W^*$$

Perform second-stage vector quantization on the residual sub-vectors using a codebook of size 1024, obtaining a second-stage codebook $C^{(2)} = \{\mathbf{c}^{(2)}_1, \ldots, \mathbf{c}^{(2)}_{K_2}\}$ with $K_2 = 1024$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the matrix R be $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$, with corresponding codebook vector $\mathbf{c}^{(2)}_{id^{(2)}(i,j)}$. Replace each corresponding sub-vector of the matrix R with its codebook vector to obtain the matrix $R^*$:

$$\mathbf{r}^*_{ij} = \mathbf{c}^{(2)}_{id^{(2)}(i,j)}$$
Step S4) represent the weight matrix W with the matrices W* and R*:

$$W \approx W^* + R^*$$

Each sub-vector $\mathbf{w}_{ij}$ of the matrix W is indexed in the two-stage codebooks by $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$; storing W is thus converted into storing the two codebooks together with $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
The method of the invention retains the computation-saving property of the conventional method. In this method a sub-vector is quantized as the sum of codebook vectors from two different codebook levels, so in the DNN forward computation the product of a single sub-vector and an activation sub-vector can be split into two products and a summation:

$$\mathbf{w}_{ij} \cdot \mathbf{a}_j \approx \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$$

If sub-vectors in the same column share a codebook vector in the first-stage or second-stage quantization, the corresponding product can be computed once and reused, simplifying the computation.
Based on the neural network acoustic model compression method, the invention also provides a voice recognition method; the method comprises the following steps:
Step T1) for an input speech feature vector, after the forward computation of the input layer and the hidden layers, obtain the activation vector $\mathbf{a} = [a_1, \ldots, a_N]^T$, and split it into sub-vectors of dimension d to obtain $\mathbf{a} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_J^T]^T$, where $\mathbf{a}_j \in \mathbb{R}^d$ and $J = N/d$.

In the present embodiment, corresponding to the output-layer weight matrix, M = 20000, N = 500, and d = 4.
Since the weight matrix W can be represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ together with the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, J\}$, the rows of the output layer are traversed: for i = 1, 2, …, M, the products $\mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j$ and $\mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$ are computed in turn for j = 1, …, J; if during this process $id^{(k)}(i,j) = id^{(k)}(i',j)$ with $k \in \{1,2\}$ and $i' > i$, the product already computed for row i is reused directly when row i′ is processed, thereby saving computation;

then compute:

$$y_i = \sum_{j=1}^{J} \left( \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j \right);$$

obtain the output $\mathbf{y} = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T4) send the output y to a decoder for decoding to obtain a recognition result in text form.
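The following is a minimal sketch of this output-layer forward computation with the two codebooks, reusing the hypothetical codebooks (C1, id1, C2, id2) from the compression sketch above; the per-column dictionary cache is one illustrative way to realize the reuse of shared products, not a data structure prescribed by the patent.

```python
import numpy as np

def output_layer_forward(a, C1, id1, C2, id2, d):
    """Compute y_i = sum_j (c1[id1(i,j)] . a_j + c2[id2(i,j)] . a_j) for all rows,
    caching each codebook-vector / activation-sub-vector product so that rows
    sharing an index in the same column reuse the result."""
    M, J = id1.shape
    a_sub = a.reshape(J, d)                    # split the activation vector into J sub-vectors
    y = np.zeros(M, dtype=a.dtype)
    for j in range(J):
        cache1, cache2 = {}, {}                # per-column caches of dot products
        for i in range(M):
            k1, k2 = id1[i, j], id2[i, j]
            if k1 not in cache1:
                cache1[k1] = C1[k1] @ a_sub[j]
            if k2 not in cache2:
                cache2[k2] = C2[k2] @ a_sub[j]
            y[i] += cache1[k1] + cache2[k2]
    return y

# Illustrative usage with the toy codebooks produced by the compression sketch above.
a = np.random.randn(40).astype(np.float32)      # stand-in activation vector from the last hidden layer
y = output_layer_forward(a, C1, id1, C2, id2, d=4)
print(y.shape)   # (200,)
```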
The performance of this example is analyzed below.
The word error rate (WER) of each model is measured on the test set; the models are the uncompressed model, two models compressed with single-stage vector quantization (one with a 1024-entry codebook and one with an 8192-entry codebook), and a model compressed with multi-stage vector quantization (a 1024-entry codebook for the first-stage quantization and a 1024-entry codebook for the second-stage quantization).
The word error rate is calculated as follows:

$$\mathrm{WER} = \frac{S + D + I}{N_{\mathrm{ref}}} \times 100\%,$$

where S, D, and I are the numbers of substituted, deleted, and inserted words in the recognition output and $N_{\mathrm{ref}}$ is the number of words in the reference transcript.
the compression ratio is the ratio of the storage space required after the model is compressed and before the model is compressed, and the calculation formula is as follows:
wherein M and N are rows and columns of the matrix respectively and are respectively equal to 20000 and 500, J is the number of subvectors in each row, and the value is 500/4-125, K1And K2Respectively, the size of the two-stage codebook, sizeof (data) refers to the number of bits required to store a single data, such as 32 bits for floating point type data.
The storage space required by the weight matrix after two-stage vector quantization compression is:

$$\mathrm{sizeof(data)} \times d \times (K_1 + K_2) + \log_2(K_1 \times K_2) \times M \times J \ \text{bits}.$$
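Plugging the embodiment's numbers into these formulas (M = 20000, N = 500, d = 4, J = 125, K1 = K2 = 1024, 32-bit floats) gives a rough estimate of the output-layer compression ratio; this back-of-the-envelope calculation is an illustration, not a figure quoted from Table 1.

```python
import math

M, N, d, J = 20000, 500, 4, 125
K1, K2, bits = 1024, 1024, 32

original = bits * M * N                                          # 320,000,000 bits uncompressed
compressed = bits * d * (K1 + K2) + math.log2(K1 * K2) * M * J   # two codebooks + per-sub-vector indices
print(compressed / original)                                     # ~0.157, i.e. roughly 16% of the original size
```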
the results are shown in Table 1:
TABLE 1
The experimental results show that with single-stage vector quantization the quantization error is large and the performance of the compressed DNN is clearly degraded. After compressing the DNN with multi-stage vector quantization, only two relatively small codebooks are needed, so the quantization error is greatly reduced and the recognition performance of the model is nearly lossless. Comparing the last two rows of the table ("8192" and "1024 + 1024"): the compression ratio of the model after multi-stage vector quantization is higher than that after single-stage vector quantization, because the added second-stage codebook requires extra space to record indices; however, because the total codebook size is smaller, the multi-stage method saves more computation than the single-stage method, achieving performance-lossless compression of the DNN while avoiding exponential growth of the codebook size.
Claims (3)
1. A compression method for a neural network acoustic model, the method comprising: splitting the row vectors of the output-layer weight matrix W of the neural network acoustic model into a plurality of sub-vectors of a specified dimension; performing first-stage vector quantization on the sub-vectors to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*; computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*; and finally representing the weight matrix W with the matrices W* and R*;
the method specifically comprises the following steps:
step S1) splitting each row vector of the output-layer weight matrix W of the neural network acoustic model into sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad J = N/d, \quad i = 1, \ldots, M,$$

wherein W is an M × N matrix;
step S2) performing first-stage vector quantization on the sub-vectors obtained in step S1) to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*;

The first-stage vector quantization of the sub-vectors obtained in step S1) yields a first-stage codebook $C^{(1)} = \{\mathbf{c}^{(1)}_1, \ldots, \mathbf{c}^{(1)}_{K_1}\}$, which contains $K_1$ codebook vectors. Let the index of the first-stage codebook vector corresponding to the j-th sub-vector of the i-th row of the weight matrix W be $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$, with corresponding codebook vector $\mathbf{c}^{(1)}_{id^{(1)}(i,j)}$. Each sub-vector $\mathbf{w}_{ij}$ of the matrix W is replaced with its codebook vector to obtain the matrix $W^*$:

$$\mathbf{w}^*_{ij} = \mathbf{c}^{(1)}_{id^{(1)}(i,j)}$$
step S3) computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*;

The residual matrix R is computed as:

$$R = W - W^*$$

Second-stage vector quantization of the residual sub-vectors $\mathbf{r}_{ij}$ yields a second-stage codebook $C^{(2)} = \{\mathbf{c}^{(2)}_1, \ldots, \mathbf{c}^{(2)}_{K_2}\}$, which contains $K_2$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the matrix R be $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$, with corresponding codebook vector $\mathbf{c}^{(2)}_{id^{(2)}(i,j)}$. Each corresponding sub-vector of the matrix R is replaced with its codebook vector to obtain the matrix $R^*$:

$$\mathbf{r}^*_{ij} = \mathbf{c}^{(2)}_{id^{(2)}(i,j)}$$
step S4) representing the weight matrix W with the matrices W* and R*:

$$W \approx W^* + R^*.$$
2. The compression method of the neural network acoustic model according to claim 1, wherein the value of d in step S1) satisfies the following condition: the number of columns N of the matrix W is divisible by d.
3. A speech recognition method implemented based on the compression method of the neural network acoustic model of claim 2, the method comprising:
step T1) for an input speech feature vector, after the forward computation of the input layer and the hidden layers, obtaining the activation vector $\mathbf{a} = [a_1, \ldots, a_N]^T$, and splitting it into sub-vectors of dimension d to obtain $\mathbf{a} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_J^T]^T$, wherein $\mathbf{a}_j \in \mathbb{R}^d$ and $J = N/d$;

the weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, wherein $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, J\}$;

traversing the rows of the output layer: for i = 1, 2, …, M, computing in turn the products $\mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j$ and $\mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$ for j = 1, …, J; if during this process $id^{(k)}(i,j) = id^{(k)}(i',j)$ with $k \in \{1,2\}$ and $i' > i$, directly using the result already computed for row i when computing the product for row i′; and computing:

$$y_i = \sum_{j=1}^{J} \left( \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j \right);$$

obtaining an output $\mathbf{y} = [y_1, \ldots, y_i, \ldots, y_M]$;
step T4) sending the output y to a decoder for decoding to obtain a recognition result in text form.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510881044.4A CN106847268B (en) | 2015-12-03 | 2015-12-03 | Neural network acoustic model compression and voice recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106847268A CN106847268A (en) | 2017-06-13 |
CN106847268B true CN106847268B (en) | 2020-04-24 |
Family
ID=59149498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510881044.4A Active CN106847268B (en) | 2015-12-03 | 2015-12-03 | Neural network acoustic model compression and voice recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106847268B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109147773B (en) * | 2017-06-16 | 2021-10-26 | 上海寒武纪信息科技有限公司 | Voice recognition device and method |
CN110809771A (en) * | 2017-07-06 | 2020-02-18 | 谷歌有限责任公司 | System and method for compression and distribution of machine learning models |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN106847268A (en) | 2017-06-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |