CN106847268A - Neural network acoustic model compression and speech recognition method - Google Patents

Neural network acoustic model compression and speech recognition method

Info

Publication number
CN106847268A
CN106847268A (application CN201510881044.4A)
Authority
CN
China
Prior art keywords
matrix
vector
subvector
codebook vectors
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510881044.4A
Other languages
Chinese (zh)
Other versions
CN106847268B (en)
Inventor
Zhang Pengyuan (张鹏远)
Xing Anhao (邢安昊)
Pan Jielin (潘接林)
Yan Yonghong (颜永红)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201510881044.4A priority Critical patent/CN106847268B/en
Publication of CN106847268A publication Critical patent/CN106847268A/en
Application granted granted Critical
Publication of CN106847268B publication Critical patent/CN106847268B/en
Legal status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L2015/0631 - Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a compression method for a neural network acoustic model. The method comprises: dividing each row vector of the output-layer weight matrix W of the neural network acoustic model into several subvectors of a specified dimension; performing first-stage vector quantization on the subvectors to obtain a first-stage codebook, and replacing the subvectors of W with first-stage codebook vectors to obtain a matrix W*; computing the residual matrix R from W and W*, and performing second-stage vector quantization on the vectors of R; obtaining a second-stage codebook, and replacing the vectors of R with second-stage codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by W* and R*. The method reduces the storage space of the neural network acoustic model while substantially reducing the quantization error and avoiding exponential growth of the codebook size.

Description

Neural network acoustic model compression and speech recognition method
Technical field
The present invention relates to the field of speech recognition, and in particular to a neural network acoustic model compression and speech recognition method.
Background technology
In the field of speech recognition, acoustic modeling with deep neural networks (Deep Neural Network, DNN) has achieved good results. The deep structure of a DNN gives the model strong learning ability, but also makes its parameter count huge, so DNN-based acoustic modeling for speech recognition on mobile devices with limited computing power is very difficult: the main obstacles are the large storage requirement and the high computational complexity.
Vector-quantization-based methods have been used to compress DNN models, saving both storage and computation. The principle is as follows:
For a DNN weight matrix $W \in \mathbb{R}^{M \times N}$, each row vector is split into $J = N/d$ subvectors of dimension $d$:

$$w_i = [w_{i,1}^T, w_{i,2}^T, \ldots, w_{i,J}^T]^T$$

where $w_{i,j}$ is the $j$-th subvector of the $i$-th row of $W$ and the superscript $T$ denotes transposition. All subvectors are then quantized to $K$ codebook vectors by a vector-quantization method. In this way the original $M \times N$ matrix can be represented by a codebook of $K$ $d$-dimensional vectors, together with $(\log_2 K) \times (M \cdot J)$ bits that record the index of each subvector in the codebook. The method also saves computation: in the forward pass of the DNN, all subvectors in the same column are multiplied by the same activation subvector, so if several subvectors in one column are quantized to the same codebook vector, the product of that codebook vector with the activation subvector can be computed once and shared, reducing the number of multiplications.
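The single-stage scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only: the toy matrix, the hand-picked codebook, and all function names are our assumptions, and a real codebook would be trained (e.g. by k-means) rather than fixed by hand.

```python
import math

def split_rows(W, d):
    """Split each row of an M x N matrix into N/d subvectors of dimension d."""
    return [[row[j * d:(j + 1) * d] for j in range(len(row) // d)]
            for row in W]

def nearest_codebook_index(v, codebook):
    """Index of the codebook vector closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])))

def vq_storage_bits(M, N, d, K, bits_per_value=32):
    """Bits after single-stage VQ: the codebook itself plus a log2(K)-bit
    index for each of the M * (N/d) subvectors."""
    J = N // d
    return bits_per_value * d * K + math.ceil(math.log2(K)) * M * J

# Toy 2 x 4 matrix, d = 2, and a hand-picked codebook of K = 2 vectors.
W = [[1.0, 0.0, 1.1, 0.1],
     [0.0, 1.0, 0.1, 0.9]]
codebook = [[1.0, 0.0], [0.0, 1.0]]
subvecs = split_rows(W, d=2)
ids = [[nearest_codebook_index(v, codebook) for v in row] for row in subvecs]
print(ids)                                    # [[0, 0], [1, 1]]
print(vq_storage_bits(20000, 500, 4, 1024))   # 25131072 bits
```

With the sizes quoted later in the patent (M = 20000, N = 500, d = 4, K = 1024), the index table dominates: 10 bits for each of the 2.5 million subvectors, versus 131072 bits for the codebook itself.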
Compressing a DNN by vector quantization affects its performance, and the degree of degradation depends on the quantization error. Traditional vector quantization uses only a single-stage codebook; when the codebook is small (i.e. it contains few codebook vectors), the quantization error is high. To reduce the quantization error one has to grow the codebook exponentially, which greatly increases the computation and defeats the method's purpose of saving space and computation.
Summary of the invention
The object of the present invention is to overcome the large quantization error of vector-quantization DNN compression. It proposes compressing the DNN with multi-stage vector quantization: a second quantization stage is added that re-quantizes the residual of the first-stage quantization, and the original weight matrix is finally replaced by the two-stage codebooks. This greatly reduces the quantization error while avoiding exponential growth of the codebook size.
To achieve this goal, the invention provides a compression method for a neural network acoustic model. The method comprises: dividing the row vectors of the output-layer weight matrix W of the neural network acoustic model into several subvectors of a specified dimension; performing first-stage vector quantization on the subvectors to obtain a first-stage codebook, and replacing the subvectors of W with first-stage codebook vectors to obtain a matrix W*; computing the residual matrix R from W and W*, and performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, replacing the vectors of R with second-stage codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by W* and R*.
In the above technical solution, the method specifically comprises:
Step S1) Split the row vectors of the output-layer weight matrix W of the neural network acoustic model into subvectors of dimension d:

$$w_i = [w_{i,1}^T, \ldots, w_{i,J}^T]^T, \quad J = N/d,$$

where W is an M × N matrix;
Step S2) Perform first-stage vector quantization on the subvectors obtained in step S1) to obtain a first-stage codebook; replace the subvectors of matrix W with first-stage codebook vectors to obtain matrix $W^*$:

Performing first-stage vector quantization on the subvectors obtained in step S1) yields the first-stage codebook $C^{(1)} = \{c_k^{(1)}\}_{k=1}^{K_1}$, which contains $K_1$ codebook vectors. Let the first-stage codebook vector corresponding to the $j$-th subvector of the $i$-th row of W have index $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$ in $C^{(1)}$; the corresponding codebook vector is $c_{id^{(1)}(i,j)}^{(1)}$. Replacing each subvector $w_{i,j}$ of W with its codebook vector $c_{id^{(1)}(i,j)}^{(1)}$ yields matrix $W^*$;
Step S3) Using matrices W and $W^*$, compute the residual matrix R and perform second-stage vector quantization on its vectors; obtain a second-stage codebook and replace the vectors of R with second-stage codebook vectors to obtain matrix $R^*$:

Compute the residual matrix R, whose subvectors are

$$r_{i,j}^T = w_{i,j}^T - c_{id^{(1)}(i,j)}^{(1)T};$$

Perform second-stage vector quantization on the vectors $r_{i,j}$ to obtain the second-stage codebook $C^{(2)} = \{c_k^{(2)}\}_{k=1}^{K_2}$, which contains $K_2$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of R have index $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$ in $C^{(2)}$; the corresponding codebook vector is $c_{id^{(2)}(i,j)}^{(2)}$. Replacing each subvector $r_{i,j}$ of R with its codebook vector yields matrix $R^*$;
Step S4) Represent the weight matrix W by $W^*$ and $R^*$:

Each subvector $w_{i,j}$ of W is identified by its indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$ in the two-stage codebooks, so storing W is converted into storing $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
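Steps S1) to S4) can be sketched as follows. The codebooks $C^{(1)}$ and $C^{(2)}$ are supplied by hand here for illustration; the patent leaves codebook training to standard vector-quantization methods (e.g. k-means), and all names below are our assumptions.

```python
def quantize_stage(subvecs, codebook):
    """Assign each subvector its nearest codebook vector; return the index
    matrix and the residual subvectors (subvector minus codebook vector)."""
    ids, residuals = [], []
    for row in subvecs:
        row_ids, row_res = [], []
        for v in row:
            k = min(range(len(codebook)),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(v, codebook[c])))
            row_ids.append(k)
            row_res.append([a - b for a, b in zip(v, codebook[k])])
        ids.append(row_ids)
        residuals.append(row_res)
    return ids, residuals

def two_stage_quantize(subvecs, C1, C2):
    """Stage 1 quantizes the subvectors (step S2); stage 2 quantizes the
    stage-1 residual (step S3)."""
    id1, residual = quantize_stage(subvecs, C1)
    id2, _ = quantize_stage(residual, C2)
    return id1, id2

def reconstruct(id1, id2, C1, C2):
    """Step S4: each subvector is approximated by C1[id1] + C2[id2]."""
    return [[[a + b for a, b in zip(C1[k1], C2[k2])]
             for k1, k2 in zip(r1, r2)]
            for r1, r2 in zip(id1, id2)]

# Toy example: one row holding a single 2-dimensional subvector.
subvecs = [[[1.2, 0.1]]]
C1 = [[1.0, 0.0], [0.0, 1.0]]   # first-stage codebook (hand-picked)
C2 = [[0.2, 0.1], [0.0, 0.0]]   # second-stage (residual) codebook
id1, id2 = two_stage_quantize(subvecs, C1, C2)
approx = reconstruct(id1, id2, C1, C2)
print(id1, id2)   # [[0]] [[0]]
```

In this toy case the residual codebook happens to contain the stage-1 residual, so the reconstruction recovers [1.2, 0.1] almost exactly; with stage 1 alone the error would be the full residual.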
In the above technical solution, the value of d in step 1) satisfies: d exactly divides the number of columns N of matrix W.
Based on the above compression method for a neural network acoustic model, the invention also provides a speech recognition method, comprising:
Step T1) For an input speech feature vector, after the forward computation through the input layer and hidden layers, obtain the activation vector $x$; split it into subvectors of dimension d, obtaining $x = [x_1^T, \ldots, x_J^T]^T$, where $J = N/d$;
Step T2) Compute the output-layer activation $y = W x$, specifically:

The weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, \ldots, N/d\}$;

Traverse the rows: for $i = 1, 2, \ldots, M$, successively compute $c_{id^{(1)}(i,j)}^{(1)T} x_j$ and $c_{id^{(2)}(i,j)}^{(2)T} x_j$. If in the course of this process $id^{(k)}(i',j) = id^{(k)}(i,j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row $i$ is reused directly when processing row $i'$. Compute:

$$y_i = \sum_{j=1}^{N/d} \left( c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j \right);$$

Obtain the output $y = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T3) Apply softmax normalization to $y$ to obtain the likelihood vector $p = [p_1, \ldots, p_M]$, where $p_i = e^{y_i} / \sum_{m=1}^{M} e^{y_m}$;
Step T4) Feed $p$ into the decoder for decoding, obtaining the recognition result in text form.
The advantage of the invention is that the method reduces the storage space of the neural network acoustic model while greatly reducing the quantization error and avoiding exponential growth of the codebook size.
Brief description of the drawings
Fig. 1 is a flow chart of the neural network acoustic model compression method of the invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawing and a specific embodiment.
As shown in Fig. 1, a compression method for a neural network acoustic model comprises:
Step S1) Split the row vectors of the output-layer weight matrix W of the neural network acoustic model (a DNN) into subvectors of dimension d:

$$w_i = [w_{i,1}^T, \ldots, w_{i,J}^T]^T, \quad J = N/d,$$

where W is an M × N matrix;
In this embodiment the DNN model has 7 layers: the weight matrices of the 5 hidden layers are all of size 5000 × 500, the input-layer weight matrix is 5000 × 360, and the output-layer weight matrix is 20000 × 500. The input feature vector has dimension 360: 13-dimensional Mel-frequency cepstral coefficient (MFCC) features are processed by splicing, linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and feature-space maximum likelihood linear regression (fMLLR) to obtain 40-dimensional features, which are then spliced with a context of 4 frames on each side, giving an input feature of (4+1+4) × 40 = 360 dimensions. The data set used is the standard English Switchboard corpus, with 286 hours of training data and 3 hours of test data; the output-layer parameters account for half of the total model parameters.
In this embodiment, M = 20000, N = 500, d = 4, and J = N/d = 500/4 = 125.
Step S2) Using a codebook of size 1024, perform first-stage vector quantization on the subvectors obtained in step 1), yielding the first-stage codebook $C^{(1)} = \{c_k^{(1)}\}_{k=1}^{K_1}$ with $K_1 = 1024$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of W have index $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$ in $C^{(1)}$, the corresponding codebook vector being $c_{id^{(1)}(i,j)}^{(1)}$. Replacing each subvector $w_{i,j}$ of W with its codebook vector yields matrix $W^*$;
Step S3) Using matrices W and $W^*$, compute the residual matrix R and perform second-stage vector quantization on its vectors; obtain a second-stage codebook and replace the vectors of R with second-stage codebook vectors to obtain matrix $R^*$:

Compute the residual of the first-stage quantization, giving the residual matrix R with subvectors

$$r_{i,j}^T = w_{i,j}^T - c_{id^{(1)}(i,j)}^{(1)T};$$

Using a codebook of size 1024, perform second-stage vector quantization on the residual vectors, yielding the second-stage codebook $C^{(2)} = \{c_k^{(2)}\}_{k=1}^{K_2}$ with $K_2 = 1024$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of R have index $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$ in $C^{(2)}$, the corresponding codebook vector being $c_{id^{(2)}(i,j)}^{(2)}$. Replacing each subvector $r_{i,j}$ of R with its codebook vector $c_{id^{(2)}(i,j)}^{(2)}$ yields matrix $R^*$;
Step S4) Represent the weight matrix W by $W^*$ and $R^*$:

Each subvector $w_{i,j}$ of W is identified by its indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$ in the two-stage codebooks, so storing W is converted into storing $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$;
The method of the invention inherits the computation-saving property of the conventional approach. Here a subvector is quantized as the sum of two codebook vectors belonging to different stage codebooks, so in the DNN forward pass the product of a single subvector with an activation subvector can likewise be converted into the sum of two separate products:

$$w_{i,j}^T x_j \approx c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j.$$

If subvectors in the same column share a codebook vector in the first-stage or second-stage quantization, the corresponding product can be reused, simplifying the computation.
Based on the above neural network acoustic model compression method, the invention also provides a speech recognition method, comprising:
Step T1) For an input speech feature vector, after the forward computation through the input layer and hidden layers, obtain the activation vector $x$; split it into subvectors of dimension d, obtaining $x = [x_1^T, \ldots, x_J^T]^T$, where $J = N/d$;

In this embodiment $x$ corresponds to the output-layer weight matrix $W \in \mathbb{R}^{M \times N}$, with M = 20000, N = 500 and d = 4.
Step T2) Compute the output-layer activation $y = W x$:

The weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, \ldots, N/d\}$;

Traverse the rows: for $i = 1, 2, \ldots, M$, successively compute $c_{id^{(1)}(i,j)}^{(1)T} x_j$ and $c_{id^{(2)}(i,j)}^{(2)T} x_j$. If in the course of this process $id^{(k)}(i',j) = id^{(k)}(i,j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row $i$ can be reused directly when processing row $i'$, saving computation;

Compute:

$$y_i = \sum_{j=1}^{N/d} \left( c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j \right);$$

Obtain the output $y = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T3) Apply softmax normalization to $y$ to obtain the likelihood vector $p = [p_1, \ldots, p_M]$, where $p_i = e^{y_i} / \sum_{m=1}^{M} e^{y_m}$;
Step T4) Feed $p$ into the decoder for decoding, obtaining the recognition result in text form.
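Steps T2) and T3) above can be sketched as follows, with the index-sharing optimization implemented as a simple cache keyed by (stage, codebook index, column). The toy codebooks, index tables and function names are our illustrative assumptions, not values from the patent.

```python
import math

def output_layer(x_sub, id1, id2, C1, C2):
    """Step T2: y_i = sum_j (c1[id1(i,j)]·x_j + c2[id2(i,j)]·x_j), caching
    each codebook-vector/activation-subvector product so that rows sharing
    an index in the same column reuse the earlier result."""
    cache = {}
    def cdot(stage, codebook, k, j):
        key = (stage, k, j)
        if key not in cache:
            cache[key] = sum(a * b for a, b in zip(codebook[k], x_sub[j]))
        return cache[key]
    return [sum(cdot(1, C1, id1[i][j], j) + cdot(2, C2, id2[i][j], j)
                for j in range(len(x_sub)))
            for i in range(len(id1))]

def softmax(y):
    """Step T3: normalize y into likelihoods (shifted by max for stability)."""
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

# Toy setup: J = 1 activation subvector, M = 2 output rows.
x_sub = [[1.0, 2.0]]
C1 = [[1.0, 0.0], [0.0, 1.0]]
C2 = [[0.5, 0.5], [0.0, 0.0]]
id1, id2 = [[0], [1]], [[0], [1]]
y = output_layer(x_sub, id1, id2, C1, C2)
p = softmax(y)
print(y)   # [2.5, 2.0]
```

With M = 20000 rows and only 1024 codebook vectors per stage, many rows necessarily share indices in each column, so the cache eliminates most of the dot products.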
The performance of this embodiment is analyzed below.
The word error rate (WER) of each model is measured on the test set. The models are: the uncompressed model, models compressed by single-stage vector quantization (with codebooks of size 1024 and of size 8192), and the model compressed by multi-stage vector quantization (a 1024-entry codebook for the first stage and a 1024-entry codebook for the second stage);
The word error rate is computed as:

$$\mathrm{WER} = \frac{S + D + I}{N_{\mathrm{ref}}} \times 100\%,$$

where $S$, $D$ and $I$ are the numbers of substituted, deleted and inserted words and $N_{\mathrm{ref}}$ is the number of words in the reference transcription.
The compression rate is the ratio of the storage required after model compression to that required before compression; it is computed as:

$$\text{compression rate} = \frac{\mathrm{sizeof}(\mathrm{data}) \times d \times (K_1 + K_2) + \log_2(K_1 \times K_2) \times M \times J}{\mathrm{sizeof}(\mathrm{data}) \times M \times N}$$

where M and N are the numbers of rows and columns of the matrix (20000 and 500, respectively), J is the number of subvectors per row (500/4 = 125), $K_1$ and $K_2$ are the sizes of the two-stage codebooks, and sizeof(data) is the number of bits required to store a single value, e.g. 32 bits for single-precision floating-point data.

The storage required for the weight matrix after the two-stage vector-quantization compression of the invention is:

$$\mathrm{sizeof}(\mathrm{data}) \times d \times (K_1 + K_2) + \log_2(K_1 \times K_2) \times M \times J.$$
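Plugging the embodiment's numbers (M = 20000, N = 500, d = 4, K1 = K2 = 1024, 32-bit data) into the storage formula gives the compressed size and the compression rate. This arithmetic check is ours; it is not a figure quoted from the patent's Table 1.

```python
import math

M, N, d, K1, K2 = 20000, 500, 4, 1024, 1024
J = N // d                                   # 125 subvectors per row
bits = 32                                    # single-precision storage

original = bits * M * N                      # uncompressed weight matrix
codebooks = bits * d * (K1 + K2)             # both stage codebooks
indices = int(math.log2(K1 * K2)) * M * J    # 20-bit two-part index per subvector
compressed = codebooks + indices

print(compressed)                            # 50262144 bits
print(round(compressed / original, 4))       # 0.1571
```

The two-part index (20 bits per subvector) dominates the compressed size; the codebooks themselves cost only 262144 bits, which is why doubling the number of stages barely grows the codebook storage.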
The experimental results are shown in Table 1:
Table 1
From the experimental results it can be seen that with single-stage vector quantization the quantization error is large, and the performance of the DNN compressed by single-stage vector quantization degrades noticeably. After compressing the DNN with multi-stage vector quantization, only two small codebooks are needed to substantially reduce the quantization error, leaving the recognition performance of the model nearly lossless. Comparing the last two columns of the table, "8192" and "1024+1024": the compression rate of the multi-stage model is higher than that of the single-stage model, because the newly added second-stage codebook needs extra space to record indices; but thanks to the smaller total codebook size, the multi-stage vector quantization method outperforms the single-stage method in reducing computation while avoiding exponential codebook growth, achieving near-lossless compression of the DNN.

Claims (4)

1. A compression method for a neural network acoustic model, the method comprising: dividing the row vectors of the output-layer weight matrix W of the neural network acoustic model into several subvectors of a specified dimension; performing first-stage vector quantization on the subvectors to obtain a first-stage codebook, and replacing the subvectors of W with first-stage codebook vectors to obtain a matrix W*; computing the residual matrix R from W and W*, and performing second-stage vector quantization on the vectors of R; obtaining a second-stage codebook and replacing the vectors of R with second-stage codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by W* and R*.
2. The compression method for a neural network acoustic model according to claim 1, characterized in that the method specifically comprises:
Step S1) Split the row vectors of the output-layer weight matrix W of the neural network acoustic model into subvectors of dimension d:

$$w_i = [w_{i,1}^T, \ldots, w_{i,J}^T]^T, \quad J = N/d,$$

where W is an M × N matrix;
Step S2) Perform first-stage vector quantization on the subvectors obtained in step S1) to obtain a first-stage codebook; replace the subvectors of matrix W with first-stage codebook vectors to obtain matrix $W^*$:

Performing first-stage vector quantization on the subvectors obtained in step S1) yields the first-stage codebook $C^{(1)} = \{c_k^{(1)}\}_{k=1}^{K_1}$, which contains $K_1$ codebook vectors. Let the first-stage codebook vector corresponding to the $j$-th subvector of the $i$-th row of W have index $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$ in $C^{(1)}$; the corresponding codebook vector is $c_{id^{(1)}(i,j)}^{(1)}$. Replacing each subvector $w_{i,j}$ of W with its codebook vector $c_{id^{(1)}(i,j)}^{(1)}$ yields matrix $W^*$;
Step S3) Using matrices W and $W^*$, compute the residual matrix R and perform second-stage vector quantization on its vectors; obtain a second-stage codebook and replace the vectors of R with second-stage codebook vectors to obtain matrix $R^*$:

Compute the residual matrix R, whose subvectors are

$$r_{i,j}^T = w_{i,j}^T - c_{id^{(1)}(i,j)}^{(1)T};$$

Perform second-stage vector quantization on the vectors $r_{i,j}$ to obtain the second-stage codebook $C^{(2)} = \{c_k^{(2)}\}_{k=1}^{K_2}$, which contains $K_2$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of R have index $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$ in $C^{(2)}$; the corresponding codebook vector is $c_{id^{(2)}(i,j)}^{(2)}$. Replacing each subvector $r_{i,j}$ of R with its codebook vector yields matrix $R^*$;
Step S4) Represent the weight matrix W by $W^*$ and $R^*$:

Each subvector $w_{i,j}$ of W is identified by its indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$ in the two-stage codebooks, so storing W is converted into storing $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
3. The compression method for a neural network acoustic model according to claim 2, characterized in that the value of d in step 1) satisfies: d exactly divides the number of columns N of matrix W.
4. A speech recognition method, implemented on the basis of the compression method for a neural network acoustic model according to claim 3, the method comprising:
Step T1) For an input speech feature vector, after the forward computation through the input layer and hidden layers, obtain the activation vector $x$; split it into subvectors of dimension d, obtaining $x = [x_1^T, \ldots, x_J^T]^T$, where $J = N/d$;
Step T2) Compute the output-layer activation $y = W x$, specifically:

The weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, \ldots, N/d\}$;

Traverse the rows: for $i = 1, 2, \ldots, M$, successively compute $c_{id^{(1)}(i,j)}^{(1)T} x_j$ and $c_{id^{(2)}(i,j)}^{(2)T} x_j$. If in the course of this process $id^{(k)}(i',j) = id^{(k)}(i,j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row $i$ is reused directly when processing row $i'$. Compute:

$$y_i = \sum_{j=1}^{N/d} \left( c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j \right);$$

Obtain the output $y = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T3) Apply softmax normalization to $y$ to obtain the likelihood vector $p = [p_1, \ldots, p_M]$, where $p_i = e^{y_i} / \sum_{m=1}^{M} e^{y_m}$;
Step T4) Feed $p$ into the decoder for decoding, obtaining the recognition result in text form.
CN201510881044.4A 2015-12-03 2015-12-03 Neural network acoustic model compression and voice recognition method Active CN106847268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510881044.4A CN106847268B (en) 2015-12-03 2015-12-03 Neural network acoustic model compression and voice recognition method


Publications (2)

Publication Number Publication Date
CN106847268A true CN106847268A (en) 2017-06-13
CN106847268B CN106847268B (en) 2020-04-24

Family

ID=59149498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510881044.4A Active CN106847268B (en) 2015-12-03 2015-12-03 Neural network acoustic model compression and voice recognition method

Country Status (1)

Country Link
CN (1) CN106847268B (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982803A (en) * 2012-12-11 2013-03-20 South China Normal University Isolated word speech recognition method based on HRSF and improved DTW algorithm


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147773A (en) * 2017-06-16 2019-01-04 Shanghai Cambricon Information Technology Co., Ltd. Speech recognition device and method
CN110809771A (en) * 2017-07-06 2020-02-18 谷歌有限责任公司 System and method for compression and distribution of machine learning models
CN110809771B (en) * 2017-07-06 2024-05-28 谷歌有限责任公司 System and method for compression and distribution of machine learning models



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant