CN106847268B - Neural network acoustic model compression and voice recognition method - Google Patents
- Publication number
- CN106847268B (granted publication of application CN201510881044.4A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- vector
- codebook
- vectors
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
The invention provides a compression method for a neural network acoustic model, which comprises the following steps: splitting the row vectors of the output-layer weight matrix W of the neural network acoustic model into a plurality of sub-vectors of a specified dimension; performing first-stage vector quantization on the sub-vectors to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*; computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*; and finally representing the weight matrix W with the matrices W* and R*. The method reduces the storage space of the neural network acoustic model, greatly reduces the quantization error, and avoids exponential growth of the codebook size.
Description
Technical Field
The invention relates to the field of voice recognition, in particular to a neural network acoustic model compression and voice recognition method.
Background
In the field of speech recognition, acoustic modeling with Deep Neural Networks (DNN) is highly effective. The deep structure of a DNN gives the model strong learning ability, but it also leads to a huge number of model parameters, which makes it difficult to apply DNN acoustic models to speech recognition on mobile devices with limited computing power: the main obstacles are the large storage requirement and the high computational complexity.
Vector-quantization-based methods have been used to compress DNN models, saving both storage space and computation. The principle is as follows:

For a DNN weight matrix $W \in \mathbb{R}^{M \times N}$, each row vector $\mathbf{w}_i$ is split into $J = N/d$ sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \mathbf{w}_{i2}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad i = 1, \ldots, M,$$

where $\mathbf{w}_{ij}$ is the j-th sub-vector of the i-th row of the weight matrix W and the superscript T denotes the transpose. All sub-vectors are then quantized into K codebook vectors using a vector quantization method. Thus the original M × N matrix can be represented by a codebook of K d-dimensional vectors, plus $\log_2(K) \times (M \cdot J)$ bits to record the index of each sub-vector in the codebook. In the forward computation of the DNN, sub-vectors in the same column are multiplied by the same segment of the activation vector; if several sub-vectors in the same column are quantized to the same codebook vector, their products with that activation segment can be shared, reducing the number of multiplications.
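To make this background scheme concrete, the following is a minimal sketch of single-stage sub-vector quantization using k-means clustering; it is illustrative only, and the function name, the use of scikit-learn's KMeans, and the toy matrix sizes are assumptions rather than details taken from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def single_stage_vq(W, d, K):
    """Split each row of W (M x N) into N/d sub-vectors of dimension d and
    quantize all sub-vectors into K codebook vectors with k-means.
    Returns the codebook (K x d), the index matrix (M x J), and the
    reconstructed matrix W_hat."""
    M, N = W.shape
    assert N % d == 0, "the sub-vector dimension d must divide N"
    J = N // d
    # Collect all M*J sub-vectors as rows of one matrix.
    subvectors = W.reshape(M, J, d).reshape(M * J, d)
    kmeans = KMeans(n_clusters=K, n_init=4, random_state=0).fit(subvectors)
    codebook = kmeans.cluster_centers_            # K codebook vectors of dimension d
    ids = kmeans.labels_.reshape(M, J)            # index of each sub-vector in the codebook
    W_hat = codebook[ids].reshape(M, N)           # replace sub-vectors by codebook vectors
    return codebook, ids, W_hat

# Example usage on a small random matrix (illustrative sizes only).
W = np.random.randn(200, 40).astype(np.float32)
codebook, ids, W_hat = single_stage_vq(W, d=4, K=64)
print("quantization MSE:", np.mean((W - W_hat) ** 2))
```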
Compressing a DNN with vector quantization can degrade the DNN's performance, and the degree of degradation depends on the quantization error. Conventional vector quantization uses only a single-stage codebook; when the codebook is small (i.e., it contains few codebook vectors), the quantization error is high. To reduce the quantization error, the codebook size has to grow exponentially, which greatly increases the amount of computation and defeats the purpose of saving space and computation.
Disclosure of Invention
The invention aims to solve the problem of large quantization error in methods that compress a DNN by vector quantization. It provides a method that compresses the DNN with multi-stage vector quantization: a second quantization stage is added to re-quantize the residual of the first-stage quantization, and the original weight matrix is finally replaced by the two-stage codebooks, which greatly reduces the quantization error while avoiding exponential growth of the codebook size.
In order to achieve the above object, the present invention provides a compression method for a neural network acoustic model, the method comprising: splitting the row vectors of the output-layer weight matrix W of the neural network acoustic model into a plurality of sub-vectors of a specified dimension; performing first-stage vector quantization on the sub-vectors to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*; computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*; and finally representing the weight matrix W with the matrices W* and R*.
In the above technical solution, the method specifically includes:
Step S1) split each row vector of the output-layer weight matrix W of the neural network acoustic model into sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad J = N/d, \quad i = 1, \ldots, M,$$

wherein W is an M × N matrix;
Step S2) perform first-stage vector quantization on the sub-vectors obtained in step S1) to obtain a first-stage codebook, and replace the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*;

The first-stage vector quantization of the sub-vectors obtained in step S1) yields a first-stage codebook $C^{(1)} = \{\mathbf{c}^{(1)}_1, \ldots, \mathbf{c}^{(1)}_{K_1}\}$, which contains $K_1$ codebook vectors. Let the index of the first-stage codebook vector corresponding to the j-th sub-vector of the i-th row of the weight matrix W be $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$, with corresponding codebook vector $\mathbf{c}^{(1)}_{id^{(1)}(i,j)}$. Replace each sub-vector $\mathbf{w}_{ij}$ of the matrix W with its codebook vector to obtain the matrix $W^*$:

$$\mathbf{w}^*_{ij} = \mathbf{c}^{(1)}_{id^{(1)}(i,j)}$$
Step S3) compute a residual matrix R from the matrices W and W*, perform second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replace the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*;

Compute the residual matrix R:

$$R = W - W^*$$

Perform second-stage vector quantization on the residual sub-vectors $\mathbf{r}_{ij}$ to obtain a second-stage codebook $C^{(2)} = \{\mathbf{c}^{(2)}_1, \ldots, \mathbf{c}^{(2)}_{K_2}\}$, which contains $K_2$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the matrix R be $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$, with corresponding codebook vector $\mathbf{c}^{(2)}_{id^{(2)}(i,j)}$. Replace each corresponding sub-vector of the matrix R with its codebook vector to obtain the matrix $R^*$:

$$\mathbf{r}^*_{ij} = \mathbf{c}^{(2)}_{id^{(2)}(i,j)}$$
Step S4) represent the weight matrix W with the matrices W* and R*:

$$W \approx W^* + R^*$$

Each sub-vector $\mathbf{w}_{ij}$ of the matrix W is indexed in the two-stage codebooks by $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$; storing W is thus converted into storing the two codebooks together with $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
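A minimal sketch of steps S1)–S4) under the same assumptions, reusing the hypothetical single_stage_vq helper and the toy matrix W from the background sketch above; it illustrates the residual (two-stage) quantization, not the patent's exact implementation.

```python
import numpy as np

def two_stage_vq(W, d, K1, K2):
    """Two-stage (residual) vector quantization of the output-layer weight
    matrix W, following steps S1)-S4): first-stage quantization of W,
    second-stage quantization of the residual R = W - W*."""
    # Steps S1/S2: first-stage quantization of the sub-vectors of W.
    C1, id1, W_star = single_stage_vq(W, d, K1)
    # Step S3: quantize the sub-vectors of the residual matrix R = W - W*.
    R = W - W_star
    C2, id2, R_star = single_stage_vq(R, d, K2)
    # Step S4: W is represented by the two codebooks and the two index maps;
    # the reconstruction is W ~ W* + R*.
    return (C1, id1), (C2, id2), W_star + R_star

# Example usage with the toy matrix W from the previous sketch.
(C1, id1), (C2, id2), W_approx = two_stage_vq(W, d=4, K1=64, K2=64)
print("two-stage quantization MSE:", np.mean((W - W_approx) ** 2))
```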
In the above technical solution, the value of d in step S1) satisfies the following condition: the number of columns N of the matrix W is divisible by d.
Based on the compression method of the neural network acoustic model, the invention also provides a voice recognition method, which comprises the following steps:
Step T1) for an input speech feature vector, after the forward computation of the input layer and the hidden layers, obtain the activation vector $\mathbf{a} = [a_1, \ldots, a_N]^T$, and split it into sub-vectors of dimension d to obtain $\mathbf{a} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_J^T]^T$, where $\mathbf{a}_j \in \mathbb{R}^d$ and $J = N/d$;

the weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, J\}$;

traverse the rows of the output layer: for i = 1, 2, …, M, compute in turn the products $\mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j$ and $\mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$ for j = 1, …, J; if during this process $id^{(k)}(i,j) = id^{(k)}(i',j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row i is used directly when row i′ is processed; then compute:

$$y_i = \sum_{j=1}^{J} \left( \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j \right);$$

obtain the output $\mathbf{y} = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T4) send the output y to a decoder for decoding to obtain a recognition result in text form.
The invention has the advantage that the method can reduce the storage space of the neural network acoustic model, greatly reduce the quantization error, and avoid exponential growth of the codebook size.
Drawings
FIG. 1 is a flow chart of a neural network acoustic model compression method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in FIG. 1, the compression method for a neural network acoustic model comprises:
Step S1) split each row vector of the output-layer weight matrix W of the neural network acoustic model (DNN) into sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad J = N/d, \quad i = 1, \ldots, M,$$

wherein W is an M × N matrix;
In this embodiment, the DNN model has 7 layers: the weight matrices of the 5 hidden layers have size 5000 × 500, the input-layer weight matrix has size 5000 × 360, and the output-layer weight matrix has size 20000 × 500. The input observation vector has 360 dimensions: 13-dimensional Mel-frequency cepstral coefficient (MFCC) features are spliced and processed with Linear Discriminant Analysis (LDA), Maximum Likelihood Linear Transform (MLLT), and feature-space Maximum Likelihood Linear Regression (fMLLR) to obtain 40-dimensional features, which are then expanded with 4 frames of context on each side, giving input features of (4+1+4) × 40 = 360 dimensions. The data set used is the standard English Switchboard corpus, with 286 hours of training data and 3 hours of test data; the output-layer parameters account for about half of the total model parameters.
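For illustration, the context expansion described above could be sketched as follows; the repetition padding at utterance boundaries and the array names are assumptions, since the patent does not specify these details.

```python
import numpy as np

def splice_context(feats, left=4, right=4):
    """Concatenate each 40-dim frame with its 4 left and 4 right neighbours,
    producing (4 + 1 + 4) * 40 = 360-dim input features."""
    T, dim = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0),
                             feats,
                             np.repeat(feats[-1:], right, axis=0)], axis=0)
    return np.stack([padded[t:t + left + 1 + right].reshape(-1)
                     for t in range(T)], axis=0)

feats = np.random.randn(100, 40).astype(np.float32)   # stand-in for the 40-dim fMLLR features
inputs = splice_context(feats)
print(inputs.shape)   # (100, 360)
```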
Step S2) perform first-stage vector quantization on the sub-vectors obtained in step S1) using a codebook of size 1024, obtaining a first-stage codebook $C^{(1)} = \{\mathbf{c}^{(1)}_1, \ldots, \mathbf{c}^{(1)}_{K_1}\}$ with $K_1 = 1024$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the weight matrix W be $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$, with corresponding codebook vector $\mathbf{c}^{(1)}_{id^{(1)}(i,j)}$. Replace each sub-vector of the matrix W with its codebook vector to obtain the matrix $W^*$:

$$\mathbf{w}^*_{ij} = \mathbf{c}^{(1)}_{id^{(1)}(i,j)}$$
Step S3) compute a residual matrix R from the matrices W and W*, perform second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replace the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*;

Compute the residual of the first-stage quantization to obtain the residual matrix R:

$$R = W - W^*$$

Perform second-stage vector quantization on the residual sub-vectors using a codebook of size 1024, obtaining a second-stage codebook $C^{(2)} = \{\mathbf{c}^{(2)}_1, \ldots, \mathbf{c}^{(2)}_{K_2}\}$ with $K_2 = 1024$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the matrix R be $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$, with corresponding codebook vector $\mathbf{c}^{(2)}_{id^{(2)}(i,j)}$. Replace each corresponding sub-vector of the matrix R with its codebook vector to obtain the matrix $R^*$:

$$\mathbf{r}^*_{ij} = \mathbf{c}^{(2)}_{id^{(2)}(i,j)}$$
Step S4) represent the weight matrix W with the matrices W* and R*:

$$W \approx W^* + R^*$$

Each sub-vector $\mathbf{w}_{ij}$ of the matrix W is indexed in the two-stage codebooks by $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$; storing W is thus converted into storing the two codebooks together with $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
The method of the invention retains the computation-saving property of the conventional method. In this method a sub-vector is quantized as the sum of codebook vectors from two different codebook levels, so in the DNN forward computation the product of a single sub-vector and an activation sub-vector can be split into two products and a summation:

$$\mathbf{w}_{ij} \cdot \mathbf{a}_j \approx \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$$

If sub-vectors in the same column share a codebook vector in the first-stage or second-stage quantization, the corresponding product can be computed once and reused, simplifying the computation.
Based on the neural network acoustic model compression method, the invention also provides a voice recognition method; the method comprises the following steps:
Step T1) for an input speech feature vector, after the forward computation of the input layer and the hidden layers, obtain the activation vector $\mathbf{a} = [a_1, \ldots, a_N]^T$, and split it into sub-vectors of dimension d to obtain $\mathbf{a} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_J^T]^T$, where $\mathbf{a}_j \in \mathbb{R}^d$ and $J = N/d$.

In the present embodiment, corresponding to the output-layer weight matrix, M = 20000, N = 500, and d = 4.
Since the weight matrix W can be represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ together with the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, J\}$, the rows of the output layer are traversed: for i = 1, 2, …, M, the products $\mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j$ and $\mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$ are computed in turn for j = 1, …, J; if during this process $id^{(k)}(i,j) = id^{(k)}(i',j)$ with $k \in \{1,2\}$ and $i' > i$, the product already computed for row i is reused directly when row i′ is processed, thereby saving computation;

then compute:

$$y_i = \sum_{j=1}^{J} \left( \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j \right);$$

obtain the output $\mathbf{y} = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T4) send the output y to a decoder for decoding to obtain a recognition result in text form.
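The following is a minimal sketch of this output-layer forward computation with the two codebooks, reusing the hypothetical codebooks (C1, id1, C2, id2) from the compression sketch above; the per-column dictionary cache is one illustrative way to realize the reuse of shared products, not a data structure prescribed by the patent.

```python
import numpy as np

def output_layer_forward(a, C1, id1, C2, id2, d):
    """Compute y_i = sum_j (c1[id1(i,j)] . a_j + c2[id2(i,j)] . a_j) for all rows,
    caching each codebook-vector / activation-sub-vector product so that rows
    sharing an index in the same column reuse the result."""
    M, J = id1.shape
    a_sub = a.reshape(J, d)                    # split the activation vector into J sub-vectors
    y = np.zeros(M, dtype=a.dtype)
    for j in range(J):
        cache1, cache2 = {}, {}                # per-column caches of dot products
        for i in range(M):
            k1, k2 = id1[i, j], id2[i, j]
            if k1 not in cache1:
                cache1[k1] = C1[k1] @ a_sub[j]
            if k2 not in cache2:
                cache2[k2] = C2[k2] @ a_sub[j]
            y[i] += cache1[k1] + cache2[k2]
    return y

# Illustrative usage with the toy codebooks produced by the compression sketch above.
a = np.random.randn(40).astype(np.float32)      # stand-in activation vector from the last hidden layer
y = output_layer_forward(a, C1, id1, C2, id2, d=4)
print(y.shape)   # (200,)
```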
The performance of this example is analyzed below.
The word error rate (WER) of each model is measured on the test set; the models are the uncompressed model, two models compressed with single-stage vector quantization (one with a 1024-entry codebook and one with an 8192-entry codebook), and a model compressed with multi-stage vector quantization (a 1024-entry codebook for the first-stage quantization and a 1024-entry codebook for the second-stage quantization).
The word error rate is calculated as follows:

$$\mathrm{WER} = \frac{S + D + I}{N_{\mathrm{ref}}} \times 100\%,$$

where S, D, and I are the numbers of substituted, deleted, and inserted words in the recognition output and $N_{\mathrm{ref}}$ is the number of words in the reference transcript.
the compression ratio is the ratio of the storage space required after the model is compressed and before the model is compressed, and the calculation formula is as follows:
wherein M and N are rows and columns of the matrix respectively and are respectively equal to 20000 and 500, J is the number of subvectors in each row, and the value is 500/4-125, K1And K2Respectively, the size of the two-stage codebook, sizeof (data) refers to the number of bits required to store a single data, such as 32 bits for floating point type data.
The storage space required by the weight matrix after two-stage vector quantization compression is:

$$\mathrm{sizeof(data)} \times d \times (K_1 + K_2) + \log_2(K_1 \times K_2) \times M \times J \ \text{bits}.$$
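Plugging the embodiment's numbers into these formulas (M = 20000, N = 500, d = 4, J = 125, K1 = K2 = 1024, 32-bit floats) gives a rough estimate of the output-layer compression ratio; this back-of-the-envelope calculation is an illustration, not a figure quoted from Table 1.

```python
import math

M, N, d, J = 20000, 500, 4, 125
K1, K2, bits = 1024, 1024, 32

original = bits * M * N                                          # 320,000,000 bits uncompressed
compressed = bits * d * (K1 + K2) + math.log2(K1 * K2) * M * J   # two codebooks + per-sub-vector indices
print(compressed / original)                                     # ~0.157, i.e. roughly 16% of the original size
```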
the results are shown in Table 1:
TABLE 1
The experimental results show that with single-stage vector quantization the quantization error is large and the performance of the compressed DNN is clearly degraded. After compressing the DNN with multi-stage vector quantization, only two relatively small codebooks are needed, so the quantization error is greatly reduced and the recognition performance of the model is nearly lossless. Comparing the last two rows of the table ("8192" and "1024 + 1024"): the compression ratio of the model after multi-stage vector quantization is higher than that after single-stage vector quantization, because the added second-stage codebook requires extra space to record indices; however, because the total codebook size is smaller, the multi-stage method saves more computation than the single-stage method, achieving performance-lossless compression of the DNN while avoiding exponential growth of the codebook size.
Claims (3)
1. A compression method for a neural network acoustic model, the method comprising: splitting the row vectors of the output-layer weight matrix W of the neural network acoustic model into a plurality of sub-vectors of a specified dimension; performing first-stage vector quantization on the sub-vectors to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*; computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*; and finally representing the weight matrix W with the matrices W* and R*;
the method specifically comprises the following steps:
step S1) splitting each row vector of the output-layer weight matrix W of the neural network acoustic model into sub-vectors of dimension d:

$$\mathbf{w}_i = \left[\mathbf{w}_{i1}^T, \ldots, \mathbf{w}_{iJ}^T\right]^T, \quad J = N/d, \quad i = 1, \ldots, M,$$

wherein W is an M × N matrix;
step S2) performing first-stage vector quantization on the sub-vectors obtained in step S1) to obtain a first-stage codebook, and replacing the sub-vectors of the matrix W with first-stage codebook vectors to obtain the matrix W*;

The first-stage vector quantization of the sub-vectors obtained in step S1) yields a first-stage codebook $C^{(1)} = \{\mathbf{c}^{(1)}_1, \ldots, \mathbf{c}^{(1)}_{K_1}\}$, which contains $K_1$ codebook vectors. Let the index of the first-stage codebook vector corresponding to the j-th sub-vector of the i-th row of the weight matrix W be $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$, with corresponding codebook vector $\mathbf{c}^{(1)}_{id^{(1)}(i,j)}$. Each sub-vector $\mathbf{w}_{ij}$ of the matrix W is replaced with its codebook vector to obtain the matrix $W^*$:

$$\mathbf{w}^*_{ij} = \mathbf{c}^{(1)}_{id^{(1)}(i,j)}$$
step S3) computing a residual matrix R from the matrices W and W*, performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, and replacing the vectors of the matrix R with second-stage codebook vectors to obtain the matrix R*;

The residual matrix R is computed as:

$$R = W - W^*$$

Second-stage vector quantization of the residual sub-vectors $\mathbf{r}_{ij}$ yields a second-stage codebook $C^{(2)} = \{\mathbf{c}^{(2)}_1, \ldots, \mathbf{c}^{(2)}_{K_2}\}$, which contains $K_2$ codebook vectors. Let the index of the codebook vector corresponding to the j-th sub-vector of the i-th row of the matrix R be $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$, with corresponding codebook vector $\mathbf{c}^{(2)}_{id^{(2)}(i,j)}$. Each corresponding sub-vector of the matrix R is replaced with its codebook vector to obtain the matrix $R^*$:

$$\mathbf{r}^*_{ij} = \mathbf{c}^{(2)}_{id^{(2)}(i,j)}$$
step S4) representing the weight matrix W with the matrices W* and R*:

$$W \approx W^* + R^*.$$
2. The compression method of the neural network acoustic model according to claim 1, wherein the value of d in step S1) satisfies the following condition: the number of columns N of the matrix W is divisible by d.
3. A speech recognition method implemented based on the compression method of the neural network acoustic model of claim 2, the method comprising:
step T1) for an input speech feature vector, after the forward computation of the input layer and the hidden layers, obtaining the activation vector $\mathbf{a} = [a_1, \ldots, a_N]^T$, and splitting it into sub-vectors of dimension d to obtain $\mathbf{a} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_J^T]^T$, wherein $\mathbf{a}_j \in \mathbb{R}^d$ and $J = N/d$;

the weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, wherein $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, J\}$;

traversing the rows of the output layer: for i = 1, 2, …, M, computing in turn the products $\mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j$ and $\mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j$ for j = 1, …, J; if during this process $id^{(k)}(i,j) = id^{(k)}(i',j)$ with $k \in \{1,2\}$ and $i' > i$, directly using the result already computed for row i when computing the product for row i′; and computing:

$$y_i = \sum_{j=1}^{J} \left( \mathbf{c}^{(1)}_{id^{(1)}(i,j)} \cdot \mathbf{a}_j + \mathbf{c}^{(2)}_{id^{(2)}(i,j)} \cdot \mathbf{a}_j \right);$$

obtaining an output $\mathbf{y} = [y_1, \ldots, y_i, \ldots, y_M]$;
step T4) sending the output y to a decoder for decoding to obtain a recognition result in text form.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510881044.4A CN106847268B (en) | 2015-12-03 | 2015-12-03 | Neural network acoustic model compression and voice recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106847268A CN106847268A (en) | 2017-06-13 |
CN106847268B true CN106847268B (en) | 2020-04-24 |
Family
ID=59149498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510881044.4A Active CN106847268B (en) | 2015-12-03 | 2015-12-03 | Neural network acoustic model compression and voice recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106847268B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109147773B (en) * | 2017-06-16 | 2021-10-26 | 上海寒武纪信息科技有限公司 | Voice recognition device and method |
CN110809771A (en) * | 2017-07-06 | 2020-02-18 | 谷歌有限责任公司 | System and method for compression and distribution of machine learning models |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN106847268A (en) | 2017-06-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |