CN106847268A - Neural network acoustic model compression and speech recognition method - Google Patents

Neural network acoustic model compression and speech recognition method

Info

Publication number
CN106847268A
CN106847268A (application CN201510881044.4A)
Authority
CN
China
Prior art keywords
matrix
vector
subvector
codebook vectors
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510881044.4A
Other languages
Chinese (zh)
Other versions
CN106847268B (en)
Inventor
Zhang Pengyuan (张鹏远)
Xing Anhao (邢安昊)
Pan Jielin (潘接林)
Yan Yonghong (颜永红)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201510881044.4A priority Critical patent/CN106847268B/en
Publication of CN106847268A publication Critical patent/CN106847268A/en
Application granted granted Critical
Publication of CN106847268B publication Critical patent/CN106847268B/en
Legal status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L2015/0631 - Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a compression method for a neural network acoustic model. The method comprises: dividing each row vector of the output-layer weight matrix W of the neural network acoustic model into several subvectors of a specified dimension; performing first-stage vector quantization on the subvectors to obtain a first-stage codebook, and replacing the subvectors of W with first-stage codebook vectors to obtain a matrix W*; computing the residual matrix R from W and W*, and performing second-stage vector quantization on the vectors of R; obtaining a second-stage codebook, and replacing the vectors of R with second-stage codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by W* and R*. The method reduces the storage space of the neural network acoustic model while substantially reducing the quantization error and avoiding exponential growth of the codebook size.

Description

Neural network acoustic model compression and speech recognition method
Technical field
The present invention relates to the field of speech recognition, and in particular to a neural network acoustic model compression and speech recognition method.
Background technology
In the field of speech recognition, acoustic modeling with deep neural networks (Deep Neural Network, DNN) has achieved good results. The deep structure of a DNN gives the model strong learning ability, but also makes its parameter count huge, so DNN-based acoustic modeling for speech recognition on mobile devices with limited computing power is very difficult: the main obstacles are the large storage requirement and the high computational complexity.
Vector-quantization-based methods have been used to compress DNN models, saving both storage and computation. The principle is as follows:
For a DNN weight matrix $W \in \mathbb{R}^{M \times N}$, each row vector is split into $J = N/d$ subvectors of dimension $d$:

$$w_i = [w_{i,1}^T, w_{i,2}^T, \ldots, w_{i,J}^T]^T$$

where $w_{i,j}$ is the $j$-th subvector of the $i$-th row of $W$ and the superscript $T$ denotes transposition. All subvectors are then quantized to $K$ codebook vectors by a vector-quantization method. In this way the original $M \times N$ matrix can be represented by a codebook of $K$ $d$-dimensional vectors, together with $(\log_2 K) \times (M \cdot J)$ bits that record the index of each subvector in the codebook. The method also saves computation: in the forward pass of the DNN, all subvectors in the same column are multiplied by the same activation subvector, so if several subvectors in one column are quantized to the same codebook vector, the product of that codebook vector with the activation subvector can be computed once and shared, reducing the number of multiplications.
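The single-stage scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only: the toy matrix, the hand-picked codebook, and all function names are our assumptions, and a real codebook would be trained (e.g. by k-means) rather than fixed by hand.

```python
import math

def split_rows(W, d):
    """Split each row of an M x N matrix into N/d subvectors of dimension d."""
    return [[row[j * d:(j + 1) * d] for j in range(len(row) // d)]
            for row in W]

def nearest_codebook_index(v, codebook):
    """Index of the codebook vector closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])))

def vq_storage_bits(M, N, d, K, bits_per_value=32):
    """Bits after single-stage VQ: the codebook itself plus a log2(K)-bit
    index for each of the M * (N/d) subvectors."""
    J = N // d
    return bits_per_value * d * K + math.ceil(math.log2(K)) * M * J

# Toy 2 x 4 matrix, d = 2, and a hand-picked codebook of K = 2 vectors.
W = [[1.0, 0.0, 1.1, 0.1],
     [0.0, 1.0, 0.1, 0.9]]
codebook = [[1.0, 0.0], [0.0, 1.0]]
subvecs = split_rows(W, d=2)
ids = [[nearest_codebook_index(v, codebook) for v in row] for row in subvecs]
print(ids)                                    # [[0, 0], [1, 1]]
print(vq_storage_bits(20000, 500, 4, 1024))   # 25131072 bits
```

With the sizes quoted later in the patent (M = 20000, N = 500, d = 4, K = 1024), the index table dominates: 10 bits for each of the 2.5 million subvectors, versus 131072 bits for the codebook itself.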
Compressing a DNN by vector quantization affects its performance, and the degree of degradation depends on the quantization error. Traditional vector quantization uses only a single-stage codebook; when the codebook is small (i.e. it contains few codebook vectors), the quantization error is high. To reduce the quantization error one has to grow the codebook exponentially, which greatly increases the computation and defeats the method's purpose of saving space and computation.
Summary of the invention
The object of the present invention is to overcome the large quantization error of vector-quantization DNN compression. It proposes compressing the DNN with multi-stage vector quantization: a second quantization stage is added that re-quantizes the residual of the first-stage quantization, and the original weight matrix is finally replaced by the two-stage codebooks. This greatly reduces the quantization error while avoiding exponential growth of the codebook size.
To achieve this goal, the invention provides a compression method for a neural network acoustic model. The method comprises: dividing the row vectors of the output-layer weight matrix W of the neural network acoustic model into several subvectors of a specified dimension; performing first-stage vector quantization on the subvectors to obtain a first-stage codebook, and replacing the subvectors of W with first-stage codebook vectors to obtain a matrix W*; computing the residual matrix R from W and W*, and performing second-stage vector quantization on the vectors of R to obtain a second-stage codebook, replacing the vectors of R with second-stage codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by W* and R*.
In the above technical solution, the method specifically comprises:
Step S1) Split the row vectors of the output-layer weight matrix W of the neural network acoustic model into subvectors of dimension d:

$$w_i = [w_{i,1}^T, \ldots, w_{i,J}^T]^T, \quad J = N/d,$$

where W is an M × N matrix;
Step S2) Perform first-stage vector quantization on the subvectors obtained in step S1) to obtain a first-stage codebook; replace the subvectors of matrix W with first-stage codebook vectors to obtain matrix $W^*$:

Performing first-stage vector quantization on the subvectors obtained in step S1) yields the first-stage codebook $C^{(1)} = \{c_k^{(1)}\}_{k=1}^{K_1}$, which contains $K_1$ codebook vectors. Let the first-stage codebook vector corresponding to the $j$-th subvector of the $i$-th row of W have index $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$ in $C^{(1)}$; the corresponding codebook vector is $c_{id^{(1)}(i,j)}^{(1)}$. Replacing each subvector $w_{i,j}$ of W with its codebook vector $c_{id^{(1)}(i,j)}^{(1)}$ yields matrix $W^*$;
Step S3) Using matrices W and $W^*$, compute the residual matrix R and perform second-stage vector quantization on its vectors; obtain a second-stage codebook and replace the vectors of R with second-stage codebook vectors to obtain matrix $R^*$:

Compute the residual matrix R, whose subvectors are

$$r_{i,j}^T = w_{i,j}^T - c_{id^{(1)}(i,j)}^{(1)T};$$

Perform second-stage vector quantization on the vectors $r_{i,j}$ to obtain the second-stage codebook $C^{(2)} = \{c_k^{(2)}\}_{k=1}^{K_2}$, which contains $K_2$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of R have index $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$ in $C^{(2)}$; the corresponding codebook vector is $c_{id^{(2)}(i,j)}^{(2)}$. Replacing each subvector $r_{i,j}$ of R with its codebook vector yields matrix $R^*$;
Step S4) Represent the weight matrix W by $W^*$ and $R^*$:

Each subvector $w_{i,j}$ of W is identified by its indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$ in the two-stage codebooks, so storing W is converted into storing $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
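Steps S1) to S4) can be sketched as follows. The codebooks $C^{(1)}$ and $C^{(2)}$ are supplied by hand here for illustration; the patent leaves codebook training to standard vector-quantization methods (e.g. k-means), and all names below are our assumptions.

```python
def quantize_stage(subvecs, codebook):
    """Assign each subvector its nearest codebook vector; return the index
    matrix and the residual subvectors (subvector minus codebook vector)."""
    ids, residuals = [], []
    for row in subvecs:
        row_ids, row_res = [], []
        for v in row:
            k = min(range(len(codebook)),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(v, codebook[c])))
            row_ids.append(k)
            row_res.append([a - b for a, b in zip(v, codebook[k])])
        ids.append(row_ids)
        residuals.append(row_res)
    return ids, residuals

def two_stage_quantize(subvecs, C1, C2):
    """Stage 1 quantizes the subvectors (step S2); stage 2 quantizes the
    stage-1 residual (step S3)."""
    id1, residual = quantize_stage(subvecs, C1)
    id2, _ = quantize_stage(residual, C2)
    return id1, id2

def reconstruct(id1, id2, C1, C2):
    """Step S4: each subvector is approximated by C1[id1] + C2[id2]."""
    return [[[a + b for a, b in zip(C1[k1], C2[k2])]
             for k1, k2 in zip(r1, r2)]
            for r1, r2 in zip(id1, id2)]

# Toy example: one row holding a single 2-dimensional subvector.
subvecs = [[[1.2, 0.1]]]
C1 = [[1.0, 0.0], [0.0, 1.0]]   # first-stage codebook (hand-picked)
C2 = [[0.2, 0.1], [0.0, 0.0]]   # second-stage (residual) codebook
id1, id2 = two_stage_quantize(subvecs, C1, C2)
approx = reconstruct(id1, id2, C1, C2)
print(id1, id2)   # [[0]] [[0]]
```

In this toy case the residual codebook happens to contain the stage-1 residual, so the reconstruction recovers [1.2, 0.1] almost exactly; with stage 1 alone the error would be the full residual.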
In the above technical solution, the value of d in step 1) satisfies: d exactly divides the number of columns N of matrix W.
Based on the above compression method for a neural network acoustic model, the invention also provides a speech recognition method, comprising:
Step T1) For an input speech feature vector, after the forward computation through the input layer and hidden layers, obtain the activation vector $x$; split it into subvectors of dimension d, obtaining $x = [x_1^T, \ldots, x_J^T]^T$, where $J = N/d$;
Step T2) Compute the output-layer activation $y = W x$, specifically:

The weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, \ldots, N/d\}$;

Traverse the rows: for $i = 1, 2, \ldots, M$, successively compute $c_{id^{(1)}(i,j)}^{(1)T} x_j$ and $c_{id^{(2)}(i,j)}^{(2)T} x_j$. If in the course of this process $id^{(k)}(i',j) = id^{(k)}(i,j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row $i$ is reused directly when processing row $i'$. Compute:

$$y_i = \sum_{j=1}^{N/d} \left( c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j \right);$$

Obtain the output $y = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T3) Apply softmax normalization to $y$ to obtain the likelihood vector $p = [p_1, \ldots, p_M]$, where $p_i = e^{y_i} / \sum_{m=1}^{M} e^{y_m}$;
Step T4) Feed $p$ into the decoder for decoding, obtaining the recognition result in text form.
The advantage of the invention is that the method reduces the storage space of the neural network acoustic model while greatly reducing the quantization error and avoiding exponential growth of the codebook size.
Brief description of the drawings
Fig. 1 is a flow chart of the neural network acoustic model compression method of the invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawing and a specific embodiment.
As shown in Fig. 1, a compression method for a neural network acoustic model comprises:
Step S1) Split the row vectors of the output-layer weight matrix W of the neural network acoustic model (a DNN) into subvectors of dimension d:

$$w_i = [w_{i,1}^T, \ldots, w_{i,J}^T]^T, \quad J = N/d,$$

where W is an M × N matrix;
In this embodiment the DNN model has 7 layers: the weight matrices of the 5 hidden layers are all of size 5000 × 500, the input-layer weight matrix is 5000 × 360, and the output-layer weight matrix is 20000 × 500. The input feature vector has dimension 360: 13-dimensional Mel-frequency cepstral coefficient (MFCC) features are processed by splicing, linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and feature-space maximum likelihood linear regression (fMLLR) to obtain 40-dimensional features, which are then spliced with a context of 4 frames on each side, giving an input feature of (4+1+4) × 40 = 360 dimensions. The data set used is the standard English Switchboard corpus, with 286 hours of training data and 3 hours of test data; the output-layer parameters account for half of the total model parameters.
In this embodiment, M = 20000, N = 500, d = 4, and J = N/d = 500/4 = 125.
Step S2) Using a codebook of size 1024, perform first-stage vector quantization on the subvectors obtained in step 1), yielding the first-stage codebook $C^{(1)} = \{c_k^{(1)}\}_{k=1}^{K_1}$ with $K_1 = 1024$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of W have index $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$ in $C^{(1)}$, the corresponding codebook vector being $c_{id^{(1)}(i,j)}^{(1)}$. Replacing each subvector $w_{i,j}$ of W with its codebook vector yields matrix $W^*$;
Step S3) Using matrices W and $W^*$, compute the residual matrix R and perform second-stage vector quantization on its vectors; obtain a second-stage codebook and replace the vectors of R with second-stage codebook vectors to obtain matrix $R^*$:

Compute the residual of the first-stage quantization, giving the residual matrix R with subvectors

$$r_{i,j}^T = w_{i,j}^T - c_{id^{(1)}(i,j)}^{(1)T};$$

Using a codebook of size 1024, perform second-stage vector quantization on the residual vectors, yielding the second-stage codebook $C^{(2)} = \{c_k^{(2)}\}_{k=1}^{K_2}$ with $K_2 = 1024$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of R have index $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$ in $C^{(2)}$, the corresponding codebook vector being $c_{id^{(2)}(i,j)}^{(2)}$. Replacing each subvector $r_{i,j}$ of R with its codebook vector $c_{id^{(2)}(i,j)}^{(2)}$ yields matrix $R^*$;
Step S4) Represent the weight matrix W by $W^*$ and $R^*$:

Each subvector $w_{i,j}$ of W is identified by its indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$ in the two-stage codebooks, so storing W is converted into storing $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$;
The method of the invention inherits the computation-saving property of the conventional approach. Here a subvector is quantized as the sum of two codebook vectors belonging to different stage codebooks, so in the DNN forward pass the product of a single subvector with an activation subvector can likewise be converted into the sum of two separate products:

$$w_{i,j}^T x_j \approx c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j.$$

If subvectors in the same column share a codebook vector in the first-stage or second-stage quantization, the corresponding product can be reused, simplifying the computation.
Based on the above neural network acoustic model compression method, the invention also provides a speech recognition method, comprising:
Step T1) For an input speech feature vector, after the forward computation through the input layer and hidden layers, obtain the activation vector $x$; split it into subvectors of dimension d, obtaining $x = [x_1^T, \ldots, x_J^T]^T$, where $J = N/d$;

In this embodiment $x$ corresponds to the output-layer weight matrix $W \in \mathbb{R}^{M \times N}$, with M = 20000, N = 500 and d = 4.
Step T2) Compute the output-layer activation $y = W x$:

The weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, \ldots, N/d\}$;

Traverse the rows: for $i = 1, 2, \ldots, M$, successively compute $c_{id^{(1)}(i,j)}^{(1)T} x_j$ and $c_{id^{(2)}(i,j)}^{(2)T} x_j$. If in the course of this process $id^{(k)}(i',j) = id^{(k)}(i,j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row $i$ can be reused directly when processing row $i'$, saving computation;

Compute:

$$y_i = \sum_{j=1}^{N/d} \left( c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j \right);$$

Obtain the output $y = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T3) Apply softmax normalization to $y$ to obtain the likelihood vector $p = [p_1, \ldots, p_M]$, where $p_i = e^{y_i} / \sum_{m=1}^{M} e^{y_m}$;
Step T4) Feed $p$ into the decoder for decoding, obtaining the recognition result in text form.
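Steps T2) and T3) above can be sketched as follows, with the index-sharing optimization implemented as a simple cache keyed by (stage, codebook index, column). The toy codebooks, index tables and function names are our illustrative assumptions, not values from the patent.

```python
import math

def output_layer(x_sub, id1, id2, C1, C2):
    """Step T2: y_i = sum_j (c1[id1(i,j)]·x_j + c2[id2(i,j)]·x_j), caching
    each codebook-vector/activation-subvector product so that rows sharing
    an index in the same column reuse the earlier result."""
    cache = {}
    def cdot(stage, codebook, k, j):
        key = (stage, k, j)
        if key not in cache:
            cache[key] = sum(a * b for a, b in zip(codebook[k], x_sub[j]))
        return cache[key]
    return [sum(cdot(1, C1, id1[i][j], j) + cdot(2, C2, id2[i][j], j)
                for j in range(len(x_sub)))
            for i in range(len(id1))]

def softmax(y):
    """Step T3: normalize y into likelihoods (shifted by max for stability)."""
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

# Toy setup: J = 1 activation subvector, M = 2 output rows.
x_sub = [[1.0, 2.0]]
C1 = [[1.0, 0.0], [0.0, 1.0]]
C2 = [[0.5, 0.5], [0.0, 0.0]]
id1, id2 = [[0], [1]], [[0], [1]]
y = output_layer(x_sub, id1, id2, C1, C2)
p = softmax(y)
print(y)   # [2.5, 2.0]
```

With M = 20000 rows and only 1024 codebook vectors per stage, many rows necessarily share indices in each column, so the cache eliminates most of the dot products.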
The performance of this embodiment is analyzed below.
The word error rate (WER) of each model is measured on the test set. The models are: the uncompressed model, models compressed by single-stage vector quantization (with codebooks of size 1024 and of size 8192), and the model compressed by multi-stage vector quantization (a 1024-entry codebook for the first stage and a 1024-entry codebook for the second stage);
The word error rate is computed as:

$$\mathrm{WER} = \frac{S + D + I}{N_{\mathrm{ref}}} \times 100\%,$$

where $S$, $D$ and $I$ are the numbers of substituted, deleted and inserted words and $N_{\mathrm{ref}}$ is the number of words in the reference transcription.
The compression rate is the ratio of the storage required after model compression to that required before compression; it is computed as:

$$\text{compression rate} = \frac{\mathrm{sizeof}(\mathrm{data}) \times d \times (K_1 + K_2) + \log_2(K_1 \times K_2) \times M \times J}{\mathrm{sizeof}(\mathrm{data}) \times M \times N}$$

where M and N are the numbers of rows and columns of the matrix (20000 and 500, respectively), J is the number of subvectors per row (500/4 = 125), $K_1$ and $K_2$ are the sizes of the two-stage codebooks, and sizeof(data) is the number of bits required to store a single value, e.g. 32 bits for single-precision floating-point data.

The storage required for the weight matrix after the two-stage vector-quantization compression of the invention is:

$$\mathrm{sizeof}(\mathrm{data}) \times d \times (K_1 + K_2) + \log_2(K_1 \times K_2) \times M \times J.$$
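Plugging the embodiment's numbers (M = 20000, N = 500, d = 4, K1 = K2 = 1024, 32-bit data) into the storage formula gives the compressed size and the compression rate. This arithmetic check is ours; it is not a figure quoted from the patent's Table 1.

```python
import math

M, N, d, K1, K2 = 20000, 500, 4, 1024, 1024
J = N // d                                   # 125 subvectors per row
bits = 32                                    # single-precision storage

original = bits * M * N                      # uncompressed weight matrix
codebooks = bits * d * (K1 + K2)             # both stage codebooks
indices = int(math.log2(K1 * K2)) * M * J    # 20-bit two-part index per subvector
compressed = codebooks + indices

print(compressed)                            # 50262144 bits
print(round(compressed / original, 4))       # 0.1571
```

The two-part index (20 bits per subvector) dominates the compressed size; the codebooks themselves cost only 262144 bits, which is why doubling the number of stages barely grows the codebook storage.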
The experimental results are shown in Table 1:
Table 1
From the experimental results it can be seen that with single-stage vector quantization the quantization error is large, and the performance of the DNN compressed by single-stage vector quantization degrades noticeably. After compressing the DNN with multi-stage vector quantization, only two small codebooks are needed to substantially reduce the quantization error, leaving the recognition performance of the model nearly lossless. Comparing the last two columns of the table, "8192" and "1024+1024": the compression rate of the multi-stage model is higher than that of the single-stage model, because the newly added second-stage codebook needs extra space to record indices; but thanks to the smaller total codebook size, the multi-stage vector quantization method outperforms the single-stage method in reducing computation while avoiding exponential codebook growth, achieving near-lossless compression of the DNN.

Claims (4)

1. A compression method for a neural network acoustic model, the method comprising: dividing the row vectors of the output-layer weight matrix W of the neural network acoustic model into several subvectors of a specified dimension; performing first-stage vector quantization on the subvectors to obtain a first-stage codebook, and replacing the subvectors of W with first-stage codebook vectors to obtain a matrix W*; computing the residual matrix R from W and W*, and performing second-stage vector quantization on the vectors of R; obtaining a second-stage codebook and replacing the vectors of R with second-stage codebook vectors to obtain a matrix R*; and finally representing the weight matrix W by W* and R*.
2. The compression method for a neural network acoustic model according to claim 1, characterized in that the method specifically comprises:
Step S1) Split the row vectors of the output-layer weight matrix W of the neural network acoustic model into subvectors of dimension d:

$$w_i = [w_{i,1}^T, \ldots, w_{i,J}^T]^T, \quad J = N/d,$$

where W is an M × N matrix;
Step S2) Perform first-stage vector quantization on the subvectors obtained in step S1) to obtain a first-stage codebook; replace the subvectors of matrix W with first-stage codebook vectors to obtain matrix $W^*$:

Performing first-stage vector quantization on the subvectors obtained in step S1) yields the first-stage codebook $C^{(1)} = \{c_k^{(1)}\}_{k=1}^{K_1}$, which contains $K_1$ codebook vectors. Let the first-stage codebook vector corresponding to the $j$-th subvector of the $i$-th row of W have index $id^{(1)}(i,j) \in \{1, \ldots, K_1\}$ in $C^{(1)}$; the corresponding codebook vector is $c_{id^{(1)}(i,j)}^{(1)}$. Replacing each subvector $w_{i,j}$ of W with its codebook vector $c_{id^{(1)}(i,j)}^{(1)}$ yields matrix $W^*$;
Step S3) Using matrices W and $W^*$, compute the residual matrix R and perform second-stage vector quantization on its vectors; obtain a second-stage codebook and replace the vectors of R with second-stage codebook vectors to obtain matrix $R^*$:

Compute the residual matrix R, whose subvectors are

$$r_{i,j}^T = w_{i,j}^T - c_{id^{(1)}(i,j)}^{(1)T};$$

Perform second-stage vector quantization on the vectors $r_{i,j}$ to obtain the second-stage codebook $C^{(2)} = \{c_k^{(2)}\}_{k=1}^{K_2}$, which contains $K_2$ codebook vectors. Let the codebook vector corresponding to the $j$-th subvector of the $i$-th row of R have index $id^{(2)}(i,j) \in \{1, \ldots, K_2\}$ in $C^{(2)}$; the corresponding codebook vector is $c_{id^{(2)}(i,j)}^{(2)}$. Replacing each subvector $r_{i,j}$ of R with its codebook vector yields matrix $R^*$;
Step S4) Represent the weight matrix W by $W^*$ and $R^*$:

Each subvector $w_{i,j}$ of W is identified by its indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$ in the two-stage codebooks, so storing W is converted into storing $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$.
3. The compression method for a neural network acoustic model according to claim 2, characterized in that the value of d in step 1) satisfies: d exactly divides the number of columns N of matrix W.
4. A speech recognition method, implemented on the basis of the compression method for a neural network acoustic model according to claim 3, the method comprising:
Step T1) For an input speech feature vector, after the forward computation through the input layer and hidden layers, obtain the activation vector $x$; split it into subvectors of dimension d, obtaining $x = [x_1^T, \ldots, x_J^T]^T$, where $J = N/d$;
Step T2) Compute the output-layer activation $y = W x$, specifically:

The weight matrix W is represented by the two codebooks $C^{(1)}$ and $C^{(2)}$ and the corresponding indices $id^{(1)}(i,j)$ and $id^{(2)}(i,j)$, where $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, \ldots, N/d\}$;

Traverse the rows: for $i = 1, 2, \ldots, M$, successively compute $c_{id^{(1)}(i,j)}^{(1)T} x_j$ and $c_{id^{(2)}(i,j)}^{(2)T} x_j$. If in the course of this process $id^{(k)}(i',j) = id^{(k)}(i,j)$ with $k \in \{1,2\}$ and $i' > i$, the result already computed for row $i$ is reused directly when processing row $i'$. Compute:

$$y_i = \sum_{j=1}^{N/d} \left( c_{id^{(1)}(i,j)}^{(1)T} x_j + c_{id^{(2)}(i,j)}^{(2)T} x_j \right);$$

Obtain the output $y = [y_1, \ldots, y_i, \ldots, y_M]$;
Step T3) Apply softmax normalization to $y$ to obtain the likelihood vector $p = [p_1, \ldots, p_M]$, where $p_i = e^{y_i} / \sum_{m=1}^{M} e^{y_m}$;
Step T4) Feed $p$ into the decoder for decoding, obtaining the recognition result in text form.
CN201510881044.4A 2015-12-03 2015-12-03 Neural network acoustic model compression and voice recognition method Active CN106847268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510881044.4A CN106847268B (en) 2015-12-03 2015-12-03 Neural network acoustic model compression and voice recognition method


Publications (2)

Publication Number Publication Date
CN106847268A true CN106847268A (en) 2017-06-13
CN106847268B CN106847268B (en) 2020-04-24

Family

ID=59149498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510881044.4A Active CN106847268B (en) 2015-12-03 2015-12-03 Neural network acoustic model compression and voice recognition method

Country Status (1)

Country Link
CN (1) CN106847268B (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982803A (en) * 2012-12-11 2013-03-20 South China Normal University Isolated word speech recognition method based on HRSF and improved DTW algorithm


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147773A (en) * 2017-06-16 2019-01-04 Shanghai Cambricon Information Technology Co., Ltd. Speech recognition device and method
CN110809771A (en) * 2017-07-06 2020-02-18 谷歌有限责任公司 System and method for compression and distribution of machine learning models
CN110809771B (en) * 2017-07-06 2024-05-28 谷歌有限责任公司 System and method for compression and distribution of machine learning models



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant