CN106847268A - Neural network acoustic model compression and speech recognition method - Google Patents
Neural network acoustic model compression and speech recognition method Download PDF Info
- Publication number
- CN106847268A CN106847268A CN201510881044.4A CN201510881044A CN106847268A CN 106847268 A CN106847268 A CN 106847268A CN 201510881044 A CN201510881044 A CN 201510881044A CN 106847268 A CN106847268 A CN 106847268A
- Authority
- CN
- China
- Prior art keywords
- matrix
- vector
- subvector
- codebook vectors
- levels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
The invention provides a compression method for a neural network acoustic model. The method includes: dividing each row vector of the output-layer weight matrix W of the neural network acoustic model into several subvectors of a specified dimension; performing first-level vector quantization on the subvectors to obtain a first-level codebook, and replacing the subvectors of W with first-level codebook vectors to obtain a matrix W*; computing a residual matrix R using W and W*, and performing second-level vector quantization on the vectors of R; obtaining a second-level codebook, and replacing the vectors of R with second-level codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by W* and R*. The method reduces the storage space of the neural network acoustic model while substantially reducing the quantization error and avoiding exponential growth of the codebook size.
Description
Technical field
The present invention relates to the field of speech recognition, and in particular to a neural network acoustic model compression and speech recognition method.
Background technology
In the field of speech recognition, acoustic modeling with deep neural networks (Deep Neural Network, DNN) has achieved good results. The deep structure of a DNN gives the model strong learning ability, but also results in a huge number of model parameters. This makes DNN-based acoustic modeling for speech recognition very difficult on mobile devices with weak computing power: the main problems are large storage requirements and high computational complexity.
Vector-quantization-based methods have been used to compress DNN models, thereby saving storage space and computation. The principle is as follows:
for the weight matrix W (an M × N matrix) of a DNN, each row vector w_i is split into J subvectors of dimension d:
w_i = [w_{i,1}^T, ..., w_{i,J}^T]^T,
where w_{i,j} is the j-th subvector of the i-th row of the weight matrix W and the superscript T denotes transposition. All subvectors are then quantized into K codebook vectors using a vector quantization method. In this way, the original M × N matrix can be represented by a codebook containing K d-dimensional vectors, plus log2(K) × (M × J) bits to record the index of each subvector in the codebook. The method can also save computation: in the forward computation of the DNN, the subvectors in the same column are all multiplied with the same activation vector, so if several subvectors in the same column are quantized to the same codebook vector, the product of those subvectors with the activation vector can be shared, thereby reducing the number of multiplications.
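To make the single-level scheme described above concrete, the following sketch quantizes the subvectors of a small weight matrix with a plain k-means codebook. It is an illustrative sketch, not code from the patent; the matrix sizes, codebook size K and helper names are my own choices:

```python
import numpy as np

def vq_compress(W, d, K, iters=20, seed=0):
    """Split each row of W into d-dimensional subvectors and quantize
    them with a K-vector codebook trained by plain Lloyd k-means."""
    M, N = W.shape
    J = N // d                       # subvectors per row; assumes d divides N
    subs = W.reshape(M * J, d)       # every subvector as one row
    rng = np.random.default_rng(seed)
    codebook = subs[rng.choice(M * J, K, replace=False)]  # init from data
    for _ in range(iters):
        # assign each subvector to its nearest codebook vector
        dist = ((subs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dist.argmin(1)
        # move each codebook vector to the mean of its cluster
        for k in range(K):
            if (idx == k).any():
                codebook[k] = subs[idx == k].mean(0)
    return codebook, idx.reshape(M, J)

def vq_reconstruct(codebook, idx, d):
    """Rebuild the quantized matrix from codebook vectors and indices."""
    M, J = idx.shape
    return codebook[idx].reshape(M, J * d)

W = np.random.default_rng(1).standard_normal((8, 12))
cb, idx = vq_compress(W, d=4, K=4)
W_star = vq_reconstruct(cb, idx, d=4)
```

Instead of M × N floats, only the K × d codebook plus one small index per subvector then needs to be stored, as described above.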
Compressing a DNN with vector quantization degrades its performance, and the degree of degradation depends on the quantization error of the vector quantization. Traditional vector quantization uses only a single-level codebook; when the codebook is small (i.e., contains few codebook vectors), the quantization error is high. To reduce the quantization error, the codebook size has to be increased exponentially, which greatly increases the amount of computation and makes the method lose its point of saving space and computation.
Summary of the invention
The object of the present invention is to overcome the problem of large quantization error in vector-quantization-based DNN compression. A multi-stage vector quantization method for compressing the DNN is proposed: a second quantization level is added, which quantizes the residual of the first-level quantization again, and the original weight matrix is finally replaced by the two-level codebooks. On the one hand this significantly reduces the quantization error, while also avoiding exponential growth of the codebook size.
To achieve this object, the invention provides a compression method of a neural network acoustic model. The method includes: dividing each row vector of the output-layer weight matrix W of the neural network acoustic model into several subvectors according to a specified dimension; performing first-level vector quantization on the subvectors to obtain a first-level codebook, and replacing the subvectors of the matrix W with first-level codebook vectors to obtain a matrix W*; computing a residual matrix R using the matrices W and W*, and performing second-level vector quantization on the vectors of R; obtaining a second-level codebook, and replacing the vectors of the matrix R with second-level codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by the matrices W* and R*.
In the above technical solution, the method specifically includes:
Step S1) splitting each row vector of the output-layer weight matrix W of the neural network acoustic model into subvectors of dimension d:
w_i = [w_{i,1}^T, ..., w_{i,J}^T]^T, i = 1, ..., M; J = N/d,
where W is an M × N matrix;
Step S2) performing first-level vector quantization on the subvectors obtained in step S1) to obtain a first-level codebook, and replacing the subvectors of W with first-level codebook vectors to obtain a matrix W*;
the first-level codebook C^(1) contains K_1 codebook vectors in total; let the index in C^(1) of the first-level codebook vector corresponding to the j-th subvector of the i-th row of the weight matrix W be id^(1)(i,j) ∈ {1, ..., K_1}, with corresponding codebook vector c^(1)_{id^(1)(i,j)}; replacing each subvector w_{i,j} of W with its codebook vector c^(1)_{id^(1)(i,j)} yields the matrix W*;
Step S3) computing the residual matrix R = W - W* using W and W*, and performing second-level vector quantization on the vectors of R; obtaining a second-level codebook, and replacing the vectors of R with second-level codebook vectors to obtain a matrix R*;
the second-level codebook C^(2) contains K_2 codebook vectors in total; let the index in C^(2) of the codebook vector corresponding to the j-th subvector of the i-th row of the matrix R be id^(2)(i,j) ∈ {1, ..., K_2}, with corresponding codebook vector c^(2)_{id^(2)(i,j)}; replacing each subvector r_{i,j} of R with its codebook vector yields the matrix R*;
Step S4) representing the weight matrix W by W* and R*, i.e. W ≈ W* + R*:
the subvector w_{i,j} of W has indices id^(1)(i,j) and id^(2)(i,j) in the two-level codebooks; storing W is thus converted into storing id^(1)(i,j) and id^(2)(i,j).
In the above technical solution, the value of d in step S1) satisfies: the number of columns N of the matrix W is exactly divisible by d.
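A minimal sketch of steps S1) to S4), using a small k-means routine for both quantization levels. The sizes and the `kmeans_codebook`/`two_level_vq` helpers are illustrative assumptions, not code from the patent:

```python
import numpy as np

def kmeans_codebook(vecs, K, iters=20, seed=0):
    """Tiny k-means: returns (codebook, index per vector)."""
    rng = np.random.default_rng(seed)
    cb = vecs[rng.choice(len(vecs), K, replace=False)].copy()
    for _ in range(iters):
        dist = ((vecs[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = dist.argmin(1)
        for k in range(K):
            if (idx == k).any():
                cb[k] = vecs[idx == k].mean(0)
    return cb, idx

def two_level_vq(W, d, K1, K2):
    """Steps S1-S4: quantize W, then quantize the residual R = W - W*."""
    M, N = W.shape
    subs = W.reshape(-1, d)                  # step S1: d-dim subvectors
    C1, id1 = kmeans_codebook(subs, K1)      # step S2: first-level codebook
    W_star = C1[id1].reshape(M, N)
    R = W - W_star                           # step S3: residual matrix
    C2, id2 = kmeans_codebook(R.reshape(-1, d), K2)
    R_star = C2[id2].reshape(M, N)
    return (C1, id1), (C2, id2), W_star + R_star  # step S4: W ~= W* + R*

W = np.random.default_rng(2).standard_normal((16, 8))
(C1, id1), (C2, id2), W_hat = two_level_vq(W, d=4, K1=8, K2=8)
err1 = np.linalg.norm(W - C1[id1].reshape(16, 8))  # first level only
err2 = np.linalg.norm(W - W_hat)                   # both levels
```

Because the second level quantizes the residual of the first, the two-level reconstruction error does not exceed the single-level error, which is the point of the method.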
Based on the above compression method for a neural network acoustic model, the present invention also provides a speech recognition method. The method includes:
Step T1) for an input speech feature vector, obtaining a vector x after the forward computation through the input layer and the hidden layers, and splitting x into subvectors of dimension d: x = [x_1^T, ..., x_J^T]^T, where J = N/d;
Step T2) computing the output layer y = W x, which specifically includes:
the weight matrix W is represented by the two codebooks C^(1) and C^(2) and the corresponding indices id^(1)(i,j) and id^(2)(i,j), where i ∈ {1, 2, ..., M}, j ∈ {1, ..., J};
traversing i = 1, 2, ..., M and computing in turn the partial products c^(1)_{id^(1)(i,j)} · x_j and c^(2)_{id^(2)(i,j)} · x_j; if during this process id^(k)(i,j) = id^(k)(i',j) with k ∈ {1, 2} and i' > i, the result already obtained for row i is reused directly when computing the product for row i'; computing y_i as the sum of the partial products over j;
obtaining the output y = [y_1, ..., y_i, ..., y_M];
Step T3) applying softmax normalization to y to obtain the likelihood values p_i = exp(y_i) / Σ_m exp(y_m);
Step T4) feeding the likelihood values into a decoder for decoding, obtaining the recognition result in text form.
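The cached forward computation of step T2) can be sketched as follows. The shapes, random data and function name are illustrative assumptions: repeated (column, index) pairs hit a cache instead of recomputing the dot product, which is exactly the sharing described above.

```python
import numpy as np

def forward_output_layer(C1, id1, C2, id2, x, d):
    """Compute y = W x where W is stored as two codebooks plus
    per-subvector index tables id1, id2 (each of shape M x J)."""
    M, J = id1.shape
    xs = x.reshape(J, d)                  # split x into d-dim subvectors
    cache1, cache2 = {}, {}               # (column j, index) -> dot product
    y = np.zeros(M)
    for i in range(M):
        for j in range(J):
            k1, k2 = id1[i, j], id2[i, j]
            if (j, k1) not in cache1:     # reuse shared first-level products
                cache1[(j, k1)] = C1[k1] @ xs[j]
            if (j, k2) not in cache2:     # reuse shared second-level products
                cache2[(j, k2)] = C2[k2] @ xs[j]
            y[i] += cache1[(j, k1)] + cache2[(j, k2)]
    return y

# Build a toy quantized weight matrix and check against the dense product.
rng = np.random.default_rng(3)
M, J, d, K = 6, 3, 4, 5
C1 = rng.standard_normal((K, d))
C2 = rng.standard_normal((K, d))
id1 = rng.integers(0, K, (M, J))
id2 = rng.integers(0, K, (M, J))
W = (C1[id1] + C2[id2]).reshape(M, J * d)   # W = W* + R* by construction
x = rng.standard_normal(J * d)
y = forward_output_layer(C1, id1, C2, id2, x, d)
```

With at most K distinct indices per column and level, each column contributes at most 2K distinct dot products regardless of M.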
The advantage of the invention is that the method of the present invention reduces the storage space of the neural network acoustic model while greatly reducing the quantization error and avoiding exponential growth of the codebook size.
Brief description of the drawings
Fig. 1 is a flow chart of the neural network acoustic model compression method of the present invention.
Specific embodiment
The present invention will be described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, a compression method of a neural network acoustic model includes:
Step S1) splitting each row vector of the output-layer weight matrix W of the neural network acoustic model (DNN) into subvectors of dimension d,
where W is an M × N matrix;
In this embodiment, the DNN model has 7 layers: the weight matrices of the 5 hidden layers are all of size 5000 × 500, the weight matrix of the input layer is of size 5000 × 360, and the weight matrix of the output layer is of size 20000 × 500. The input feature vector has dimension 360: 13-dimensional Mel-frequency cepstral coefficient (MFCC) features are extended and processed by linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and feature-space maximum likelihood linear regression (fMLLR) to obtain 40-dimensional features, which are then spliced with a context of 4 frames on each side, giving (4 + 1 + 4) × 40 = 360-dimensional input features. The data set used is the standard English data set Switchboard, with 286 hours of training data and 3 hours of test data; the output-layer parameters account for half of the total model parameters.
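The feature-dimension arithmetic above can be checked in a couple of lines; the helper name is my own:

```python
def spliced_dim(base_dim, left_frames, right_frames):
    """Feature dimension after splicing left/right context frames."""
    return (left_frames + 1 + right_frames) * base_dim

# 40-dimensional transformed features with a context of 4 frames on each side
dim = spliced_dim(40, 4, 4)  # (4 + 1 + 4) * 40 = 360
```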
In this embodiment, W is the 20000 × 500 output-layer weight matrix, d = 4, and each row is split into J = 500/4 = 125 subvectors.
Step S2) performing first-level vector quantization on the subvectors obtained in step S1) using a codebook of size 1024, obtaining the first-level codebook C^(1), which contains K_1 = 1024 codebook vectors in total; let the index in C^(1) of the codebook vector corresponding to the j-th subvector of the i-th row of the weight matrix W be id^(1)(i,j) ∈ {1, ..., K_1}, with corresponding codebook vector c^(1)_{id^(1)(i,j)}; replacing each subvector of W with its codebook vector yields the matrix W*;
Step S3) computing the residual matrix R = W - W* using W and W*, and performing second-level vector quantization on the vectors of R;
the residual of the first-level quantization is computed to obtain the residual matrix R; second-level vector quantization is then performed on the residual vectors using a codebook of size 1024, obtaining the second-level codebook C^(2), which contains K_2 = 1024 codebook vectors in total; let the index in C^(2) of the codebook vector corresponding to the j-th subvector of the i-th row of R be id^(2)(i,j) ∈ {1, ..., K_2}, with corresponding codebook vector c^(2)_{id^(2)(i,j)}; replacing each subvector of R with its codebook vector yields the matrix R*;
Step S4) representing the weight matrix W by W* and R*:
the subvector w_{i,j} of W has indices id^(1)(i,j) and id^(2)(i,j) in the two-level codebooks; storing W is thus converted into storing id^(1)(i,j) and id^(2)(i,j);
The method of the present invention inherits the property of the conventional method that computation can be saved. In this method, a subvector is quantized as the sum of two codebook vectors belonging to codebooks of different levels; therefore, during the DNN forward computation, the product of a single subvector with the activation vector can also be converted into two separate products that are then summed: w_{i,j}^T x_j ≈ (c^(1)_{id^(1)(i,j)})^T x_j + (c^(2)_{id^(2)(i,j)})^T x_j. If subvectors in the same column share codebook vectors at the first or the second quantization level, the computation can be simplified accordingly.
Based on the above neural network acoustic model compression method, the present invention also provides a speech recognition method. The method includes:
Step T1) for an input speech feature vector, obtaining a vector x after the forward computation through the input layer and the hidden layers, and splitting x into subvectors of dimension d: x = [x_1^T, ..., x_J^T]^T, where J = N/d;
In this embodiment, x corresponds to the output-layer weight matrix W (an M × N matrix), with M = 20000, N = 500 and d = 4.
Step T2) computing the output layer y = W x;
since the weight matrix W can be represented by the two codebooks C^(1) and C^(2) and the corresponding indices id^(1)(i,j) and id^(2)(i,j), where i ∈ {1, 2, ..., M}, j ∈ {1, ..., J},
traversing i = 1, 2, ..., M and computing in turn the partial products c^(1)_{id^(1)(i,j)} · x_j and c^(2)_{id^(2)(i,j)} · x_j; if during this process id^(k)(i,j) = id^(k)(i',j) with k ∈ {1, 2} and i' > i, the result already obtained for row i can be reused directly for row i', thereby saving computation;
computing y_i as the sum of the partial products over j and obtaining the output y = [y_1, ..., y_i, ..., y_M];
Step T3) applying softmax normalization to y to obtain the likelihood values p_i = exp(y_i) / Σ_m exp(y_m);
Step T4) feeding the likelihood values into a decoder for decoding, obtaining the recognition result in text form.
The performance of this embodiment is analyzed below.
The word error rate (WER) of each model is measured on the test set. The models are: the uncompressed model, the models compressed with single-level vector quantization (a codebook of size 1024 and a codebook of size 8192), and the model compressed with multi-level vector quantization (a size-1024 codebook for the first-level quantization and a size-1024 codebook for the second-level quantization).
The word error rate is computed in the standard way as WER = (S + D + I) / N_ref, where S, D and I are the numbers of substituted, deleted and inserted words and N_ref is the number of words in the reference transcription.
The compression ratio is the ratio of the storage required after model compression to that required before compression. Before compression, storing the weight matrix requires sizeof(data) × M × N bits, where M and N are the numbers of rows and columns of the matrix (here 20000 and 500 respectively) and sizeof(data) is the number of bits required to store a single value, e.g. 32 bits for floating-point data. After the two-level vector quantization compression of the invention, the storage required for the weight matrix is:
sizeof(data) × d × (K_1 + K_2) + log2(K_1 × K_2) × M × J bits,
where J is the number of subvectors per row (here 500/4 = 125) and K_1 and K_2 are the sizes of the two-level codebooks; the compression ratio is the quotient of these two quantities.
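The storage formula above can be checked with a few lines of arithmetic using the embodiment's numbers (M = 20000, N = 500, d = 4, J = 125, K_1 = K_2 = 1024, 32-bit floats). The helper below is an illustrative sketch, not code from the patent:

```python
import math

def compressed_bits(d, K1, K2, M, J, data_bits=32):
    # two codebooks of d-dim vectors + log2(K1*K2) index bits per subvector
    return data_bits * d * (K1 + K2) + math.log2(K1 * K2) * M * J

def compression_ratio(M, N, d, K1, K2, data_bits=32):
    before = data_bits * M * N                      # dense float weight matrix
    after = compressed_bits(d, K1, K2, M, N // d, data_bits)
    return after / before

r = compression_ratio(M=20000, N=500, d=4, K1=1024, K2=1024)
print(round(r, 3))  # about 0.157: the compressed matrix needs ~16% of the original bits
```

Note that the codebook term (262,144 bits here) is negligible next to the index term, which is why doubling the number of levels barely affects storage while sharply cutting quantization error.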
The experimental results are shown in Table 1.
Table 1
From the experimental results it can be seen that with single-level vector quantization the quantization error is large, and the performance degradation of the DNN after single-level vector quantization compression is obvious. After compressing the DNN with multi-level vector quantization, only two small codebooks are needed to substantially reduce the quantization error, so that the recognition performance of the model is nearly lossless. Comparing the last two columns of the table, "8192" and "1024+1024": the compression ratio of the model after multi-level vector quantization is higher than that of the model after single-level vector quantization, because the newly added second-level codebook requires additional space to record its indices; but thanks to the reduction of the total codebook size, the multi-level vector quantization method outperforms the single-level method in terms of computation reduction, while avoiding exponential growth of the codebook size and achieving nearly lossless compression of the DNN.
Claims (4)
1. a kind of compression method of neutral net acoustic model, methods described includes:By neutral net acoustic model
The row vector of output layer weight matrix W is divided into several subvectors according to specified dimension;To several subvectors
One-level vector quantization is carried out, one-level code book is obtained, the subvector of matrix W is replaced with one-level codebook vectors, obtain square
Battle array W*;Using matrix W and W*, residual matrix R is calculated, and vector to R carries out two grades of vector quantizations;Obtain
Two grades of code books, the vector of matrix R is replaced with two grades of codebook vectors, obtains matrix R*;Finally use matrix W*And R*Table
Show weight matrix W.
2. The compression method of a neural network acoustic model according to claim 1, characterized in that the method specifically includes:
Step S1) splitting each row vector of the output-layer weight matrix W of the neural network acoustic model into subvectors of dimension d:
w_i = [w_{i,1}^T, ..., w_{i,J}^T]^T, i = 1, ..., M; J = N/d,
where W is an M × N matrix;
Step S2) performing first-level vector quantization on the subvectors obtained in step S1) to obtain a first-level codebook, and replacing the subvectors of the matrix W with first-level codebook vectors to obtain a matrix W*;
the first-level codebook C^(1) contains K_1 codebook vectors in total; let the index in C^(1) of the first-level codebook vector corresponding to the j-th subvector of the i-th row of the weight matrix W be id^(1)(i,j) ∈ {1, ..., K_1}, with corresponding codebook vector c^(1)_{id^(1)(i,j)}; replacing each subvector w_{i,j} of the matrix W with its codebook vector c^(1)_{id^(1)(i,j)} yields the matrix W*;
Step S3) computing the residual matrix R = W - W* using the matrices W and W*, and performing second-level vector quantization on the vectors of R; obtaining a second-level codebook, and replacing the vectors of the matrix R with second-level codebook vectors to obtain a matrix R*;
the second-level codebook C^(2) contains K_2 codebook vectors in total; let the index in C^(2) of the codebook vector corresponding to the j-th subvector of the i-th row of the matrix R be id^(2)(i,j) ∈ {1, ..., K_2}, with corresponding codebook vector c^(2)_{id^(2)(i,j)}; replacing each subvector r_{i,j} of the matrix R with its codebook vector yields the matrix R*;
Step S4) representing the weight matrix W by the matrices W* and R*, i.e. W ≈ W* + R*:
the subvector w_{i,j} of the matrix W has indices id^(1)(i,j) and id^(2)(i,j) in the two-level codebooks; storing W is thus converted into storing id^(1)(i,j) and id^(2)(i,j).
3. The compression method of a neural network acoustic model according to claim 2, characterized in that the value of d in step S1) satisfies: the number of columns N of the matrix W is exactly divisible by d.
4. A speech recognition method implemented based on the compression method of a neural network acoustic model according to claim 3, the method including:
Step T1) for an input speech feature vector, obtaining a vector x after the forward computation through the input layer and the hidden layers, and splitting x into subvectors of dimension d: x = [x_1^T, ..., x_J^T]^T, where J = N/d;
Step T2) computing the output layer y = W x, which specifically includes:
the weight matrix W is represented by the two codebooks C^(1) and C^(2) and the corresponding indices id^(1)(i,j) and id^(2)(i,j), where i ∈ {1, 2, ..., M}, j ∈ {1, ..., J};
traversing i = 1, 2, ..., M and computing in turn the partial products c^(1)_{id^(1)(i,j)} · x_j and c^(2)_{id^(2)(i,j)} · x_j; if during this process id^(k)(i,j) = id^(k)(i',j) with k ∈ {1, 2} and i' > i, the result already obtained for row i is reused directly when computing the product for row i'; computing y_i as the sum of the partial products over j;
obtaining the output y = [y_1, ..., y_i, ..., y_M];
Step T3) applying softmax normalization to y to obtain the likelihood values p_i = exp(y_i) / Σ_m exp(y_m);
Step T4) feeding the likelihood values into a decoder for decoding, obtaining a recognition result in text form.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510881044.4A CN106847268B (en) | 2015-12-03 | 2015-12-03 | Neural network acoustic model compression and voice recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510881044.4A CN106847268B (en) | 2015-12-03 | 2015-12-03 | Neural network acoustic model compression and voice recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106847268A true CN106847268A (en) | 2017-06-13 |
CN106847268B CN106847268B (en) | 2020-04-24 |
Family
ID=59149498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510881044.4A Active CN106847268B (en) | 2015-12-03 | 2015-12-03 | Neural network acoustic model compression and voice recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106847268B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109147773A (en) * | 2017-06-16 | 2019-01-04 | 上海寒武纪信息科技有限公司 | A kind of speech recognition equipment and method |
CN110809771A (en) * | 2017-07-06 | 2020-02-18 | 谷歌有限责任公司 | System and method for compression and distribution of machine learning models |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
- 2015-12-03: application CN201510881044.4A filed; patent CN106847268B granted (active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109147773A (en) * | 2017-06-16 | 2019-01-04 | 上海寒武纪信息科技有限公司 | A kind of speech recognition equipment and method |
CN110809771A (en) * | 2017-07-06 | 2020-02-18 | 谷歌有限责任公司 | System and method for compression and distribution of machine learning models |
CN110809771B (en) * | 2017-07-06 | 2024-05-28 | 谷歌有限责任公司 | System and method for compression and distribution of machine learning models |
Also Published As
Publication number | Publication date |
---|---|
CN106847268B (en) | 2020-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10614798B2 (en) | Memory compression in a deep neural network | |
US5323486A (en) | Speech coding system having codebook storing differential vectors between each two adjoining code vectors | |
US10115393B1 (en) | Reduced size computerized speech model speaker adaptation | |
Deng et al. | Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration | |
CN110600047A (en) | Perceptual STARGAN-based many-to-many speaker conversion method | |
EP2207167B1 (en) | Multistage quantizing method | |
CN106203624A (en) | Vector Quantization based on deep neural network and method | |
CN104538028A (en) | Continuous voice recognition method based on deep long and short term memory recurrent neural network | |
CN111816156A (en) | Many-to-many voice conversion method and system based on speaker style feature modeling | |
CN111326168A (en) | Voice separation method and device, electronic equipment and storage medium | |
CN109902164B (en) | Method for solving question-answering of open long format video by using convolution bidirectional self-attention network | |
Yang et al. | Steganalysis of VoIP streams with CNN-LSTM network | |
Jiang et al. | An improved vector quantization method using deep neural network | |
CN102881293A (en) | Over-complete dictionary constructing method applicable to voice compression sensing | |
CN111814448B (en) | Pre-training language model quantization method and device | |
CN102708871A (en) | Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model | |
US20040199382A1 (en) | Method and apparatus for formant tracking using a residual model | |
CN115101085A (en) | Multi-speaker time-domain voice separation method for enhancing external attention through convolution | |
CN102436815B (en) | Voice identifying device applied to on-line test system of spoken English | |
CN106847268A (en) | Neural network acoustic model compression and speech recognition method | |
CN113806543B (en) | Text classification method of gate control circulation unit based on residual jump connection | |
Ju et al. | Tea-pse 3.0: Tencent-ethereal-audio-lab personalized speech enhancement system for icassp 2023 dns-challenge | |
JP6820764B2 (en) | Acoustic model learning device and acoustic model learning program | |
US20220092382A1 (en) | Quantization for neural network computation | |
Li et al. | A fast convolutional self-attention based speech dereverberation method for robust speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||