CN110858480B - Speech recognition method based on N-gram neural network language model - Google Patents

Speech recognition method based on N-gram neural network language model

Info

Publication number
CN110858480B
Authority
CN
China
Prior art keywords
language model
neural network
model
candidate
network language
Prior art date
Legal status
Active
Application number
CN201810928881.1A
Other languages
Chinese (zh)
Other versions
CN110858480A (en)
Inventor
张鹏远
张一珂
潘接林
颜永红
Current Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201810928881.1A
Publication of CN110858480A
Application granted
Publication of CN110858480B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a speech recognition method based on an N-gram neural network language model, which comprises the following steps: step 1) establishing and training an n-th-order n-gram neural network language model; step 2) for each test utterance u, selecting the K highest-scoring candidate results with a recognizer; recomputing the language-model scores of the K candidate results based on the trained n-th-order n-gram neural network language model; then recomputing the scores of the K candidate results and selecting the highest-scoring candidate as the final recognition result of the test utterance u. Both the performance and the computational efficiency of the speech recognition method surpass those of speech recognition methods based on an RNN language model.

Description

Speech recognition method based on N-gram neural network language model
Technical Field
The invention relates to the fields of speech recognition and natural language processing, and in particular to a speech recognition method based on an N-gram neural network language model.
Background
A language model (LM) is a mathematical model that describes the probability distribution of word sequences, and it plays an important role in applications related to natural language processing. With the development of deep learning, language-model modeling based on deep neural networks (DNNs) has shown great potential in a series of tasks such as speech recognition, machine translation and text generation.
Relevant studies have shown that the performance of a neural network language model depends heavily on the specific model structure. The currently mainstream neural network structures include the standard DNN, the convolutional neural network (CNN) and the recurrent neural network (RNN). The DNN model is generally used for simple classification tasks. The CNN model can model high-dimensional data and capture correlations among the data dimensions, and is mainly used for tasks such as image processing. The RNN model, by contrast, can efficiently compress historical information through its recurrent connections and is therefore well suited to modeling sequential data. Because natural sentences inherently have a strong temporal structure and are typical sequential data, the RNN model is widely applied to natural-language tasks.
At present, RNN-based language models achieve markedly better performance than statistical language models and DNN/CNN language models on a series of tasks such as speech recognition and machine translation. In the speech recognition task, however, the RNN model still has the following problems: 1) when unrolled in time, its recurrent structure resembles a deep DNN model, and deep networks suffer from vanishing and exploding gradients during training, which makes the RNN model difficult to train and limits its performance; 2) compared with forward network structures such as the DNN and CNN, the RNN structure has a low degree of parallelism and cannot be parallelized along the time axis, so the time complexity of RNN computation is high, making it difficult to apply in speech recognition systems with strict real-time requirements, such as voice input methods and smart speakers.
Disclosure of Invention
The invention aims to overcome the above technical defects by providing a speech recognition method based on an N-gram neural network language model that simplifies the structure of the neural network language model, reduces its computational complexity and increases its degree of parallelism without degrading the performance of the speech recognition system.
In order to achieve the above object, the present invention provides a speech recognition method based on an N-gram neural network language model, the method comprising:
step 1) establishing and training an n-th-order n-gram neural network language model;
step 2) for each test utterance u, selecting the K highest-scoring candidate results with a recognizer; recomputing the language-model scores of the K candidate results based on the trained n-th-order n-gram neural network language model; then recomputing the scores of the K candidate results and selecting the highest-scoring candidate as the final recognition result of the test utterance u.
As an improvement of the above method, the step 1) specifically includes:
step 1-1) given a sentence l = w_1,…,w_M containing M words in the training set S, the input to the n-th-order n-gram neural network language model when predicting the word w_i, 1 ≤ i ≤ M, is the n preceding words w_{i−n},…,w_{i−1}; their one-hot encodings are mapped through the matrix C ∈ R^{V×h} by a table-lookup operation to the low-dimensional representation vectors e_{i−k} ∈ R^h, k = 1,…,n, wherein V denotes the size of the vocabulary and is the number of rows of the matrix C, and h is the number of columns of the matrix C;
step 1-2) the final feature representation of the word w_{i−k} is obtained by an affine transformation: f_{i−k} = σ(F_k·e_{i−k}), wherein σ denotes a nonlinear function and the matrix F_k ∈ R^{h×h};
step 1-3) the history information vector is computed as h_i = Σ_{k=1}^{n} f_{i−k};
step 1-4) an implicit representation vector of the history information is obtained by an affine transformation of the history information vector h_i: H_i = σ(H·h_i), wherein the matrix H ∈ R^{h×h};
step 1-5) H_i is probability-normalized through an affine transformation and the softmax function to obtain the probability distribution of the word w_i to be predicted: y_i = g(W^T·H_i), wherein g(·) denotes the softmax function and the matrix W ∈ R^{h×V};
step 1-6) the cross entropy of y_i = (y_{i1},…,y_{iV}) and w_i = (w_{i1},…,w_{iV}) is computed as the loss function
L_i = −Σ_{d=1}^{V} w_{id}·log y_{id}
wherein y_{id} is the d-th component of y_i, w_{id} is the d-th component of w_i, and 1 ≤ d ≤ V;
step 1-7) for the n-th-order n-gram neural network language model, the partial history sums h_{i−k} (accumulating only the features of the input words preceding w_{i−k}) are substituted for h_i, and the auxiliary loss functions L_{i−k}, k = 1,…,n−1, are computed according to steps 1-4) to 1-6);
step 1-8) steps 1-6) to 1-7) yield n loss functions L_{i−n+1},…,L_i; the final optimization objective L̂_i of the word w_i to be predicted is
L̂_i = α·L_i + (1−α)·Σ_{k=1}^{n−1} L_{i−k}
wherein 0 ≤ α ≤ 1 is the weight;
step 1-9) the model parameters are updated on the training set by stochastic gradient descent according to
θ ← θ − λ·∂L̂_i/∂θ
wherein λ is the learning rate and θ denotes the model parameters, comprising the matrices C, F_k (k = 1,2,…,n), H and W; when the model parameters θ converge, the training of the n-th-order n-gram neural network language model is complete.
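The forward computation of steps 1-1) to 1-6) can be sketched in a few lines of NumPy. This is an illustrative toy, not the patented implementation: the sizes V, h and n, the sigmoid nonlinearity, the random weights and the word indices are all stand-in assumptions for the demonstration.

```python
import numpy as np

# Toy forward pass of an n-th-order feedforward n-gram NNLM (steps 1-1 to 1-6).
rng = np.random.default_rng(0)
V, h, n = 50, 8, 4                           # assumed vocabulary size, hidden size, order

sigma = lambda x: 1.0 / (1.0 + np.exp(-x))   # nonlinear function sigma (assumed sigmoid)

C = rng.normal(scale=0.1, size=(V, h))       # embedding (lookup) matrix C
F = rng.normal(scale=0.1, size=(n, h, h))    # per-position matrices F_k
H = rng.normal(scale=0.1, size=(h, h))       # history transform H
W = rng.normal(scale=0.1, size=(h, V))       # output matrix W

context = [3, 17, 42, 9]                     # assumed indices of w_{i-n},...,w_{i-1}
target = 11                                  # assumed index of the word w_i to predict

# step 1-1) one-hot times C is just row selection, i.e. a table lookup
e = C[context]                               # shape (n, h)
# step 1-2) per-word feature f_{i-k} = sigma(F_k e_{i-k})
f = np.stack([sigma(F[k] @ e[k]) for k in range(n)])
# step 1-3) history vector h_i = sum over the n features
h_i = f.sum(axis=0)
# step 1-4) implicit history representation H_i = sigma(H h_i)
H_i = sigma(H @ h_i)
# step 1-5) softmax over the vocabulary
logits = W.T @ H_i
y = np.exp(logits - logits.max())
y /= y.sum()
# step 1-6) cross-entropy loss against the one-hot target
L = -np.log(y[target])
```

Because multiplying a one-hot vector by C just selects a row of C, the embedding step reduces to table lookup, and the n per-position transforms F_k·e_{i−k} are independent of each other, which is what makes this forward structure easy to parallelize compared with an RNN.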
As an improvement of the above method, the step 2) specifically includes:
step 2-1) for each test utterance u, a number of candidate recognition results are obtained with a recognizer, the recognizer assigning the p-th candidate result u_p of the test utterance u the score S(u_p):
S(u_p) = a(u_p) + μ·l(u_p)
wherein a(u_p) is the acoustic-model score of the candidate result u_p, l(u_p) is the language-model score of the candidate result u_p, and μ is a language-model score coefficient; the K highest-scoring candidate results u_p, 1 ≤ p ≤ K, are selected;
step 2-2) the language-model score l̂(u_p) of each candidate result u_p, 1 ≤ p ≤ K, is recomputed with the trained n-gram neural network language model;
step 2-3) the score of each candidate result u_p, 1 ≤ p ≤ K, is recomputed as
Ŝ(u_p) = a(u_p) + μ·l̂(u_p);
step 2-4) the candidate result with the highest score Ŝ(u_p) is selected as the final recognition result of the test utterance u.
As an improvement of the above method, the step 2-2) is specifically:
for a candidate result u_p = w_{p1},…,w_{pM} containing M words, it is input into the trained n-th-order n-gram neural network language model to obtain the probability y_{pm} of each word w_{pm}; the language-model score of the candidate result u_p, 1 ≤ p ≤ K, is then
l̂(u_p) = Σ_{m=1}^{M} log y_{pm}
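The re-scoring of steps 2-1) to 2-4) is a linear re-combination of the acoustic score with the new language-model score, followed by an argmax. A minimal sketch, in which all scores and the coefficient μ are made-up illustrative numbers, not values from the patent:

```python
# Hypothetical scores for K = 3 candidates of one test utterance.
a     = [-120.0, -118.5, -121.0]   # acoustic-model scores a(u_p)
l_old = [ -35.0,  -39.0,  -33.0]   # first-pass language-model scores l(u_p)
l_new = [ -30.0,  -38.0,  -36.0]   # NNLM re-scored language-model scores
mu    = 1.0                        # language-model score coefficient (assumed)

first_pass = [ap + mu * lp for ap, lp in zip(a, l_old)]   # S(u_p)
rescored   = [ap + mu * lp for ap, lp in zip(a, l_new)]   # re-scored S(u_p)

best_first = first_pass.index(max(first_pass))
best_resc  = rescored.index(max(rescored))
print(best_first, best_resc)  # prints: 2 0
```

With these assumed numbers the first pass prefers candidate index 2, while the NNLM re-score promotes candidate index 0: the second pass can change the winner, which is the purpose of re-scoring.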
the invention has the following advantages:
1. the n-gram neural network language model adopts only a forward network structure, so the problems of vanishing and exploding gradients are effectively avoided;
2. compared with an RNN language model, the n-gram neural network language model reduces the computational complexity of the model and improves its parallel efficiency;
3. both the performance and the computational efficiency of the speech recognition method surpass those of speech recognition methods based on an RNN language model.
Drawings
FIG. 1 is a schematic structural diagram of an N-gram neural network language model according to the present invention;
FIG. 2 is a flowchart of a speech recognition method based on an N-gram neural network language model according to the present invention.
Detailed Description
The method of the present invention is described in detail below with reference to the accompanying drawings and examples.
As shown in fig. 1 and fig. 2, the present invention provides a speech recognition method based on an N-gram neural network language model, including:
For an n-th-order n-gram neural network language model:
1) Given a sentence l = w_1,…,w_M containing M words in the training set S, when predicting the word w_i (i = 1,2,…,M) the input to the model is the n words w_{i−n},…,w_{i−1} preceding the word w_i to be predicted; their one-hot encodings are mapped through the matrix C ∈ R^{V×h} by a table-lookup operation to the low-dimensional representation vectors e_{i−k} ∈ R^h, where V denotes the size of the vocabulary and typically h ≪ V.
2) The final feature representation of the word w_{i−k} is obtained by an affine transformation: f_{i−k} = σ(F_k·e_{i−k}), where σ denotes a nonlinear function and the matrix F_k ∈ R^{h×h}.
3) The history information vector is computed as h_i = Σ_{k=1}^{n} f_{i−k}.
4) An implicit representation vector of the history information is obtained by an affine transformation of h_i: H_i = σ(H·h_i), where σ denotes a nonlinear function and the matrix H ∈ R^{h×h}.
5) H_i is probability-normalized through an affine transformation and the softmax function to obtain the probability distribution of the word w_i to be predicted: y_i = g(W^T·H_i), where g(·) denotes the softmax function and the matrix W ∈ R^{h×V}.
6) The cross entropy of y_i and w_i is then computed as the loss function at time i:
L_i = −Σ_{d=1}^{V} w_{id}·log y_{id}.
7) For the n-th-order model, the partial history sums (accumulating only the features of the input words preceding w_{i−k}) are substituted for h_i in step 3), and the auxiliary loss functions L_{i−k} are then computed according to steps 4) to 6).
8) Steps 6) to 7) finally yield n loss functions L_{i−n+1},…,L_i. The invention takes L_i as the main optimization objective at time i and all the other loss functions L_{i−n+1},…,L_{i−1} as auxiliary optimization objectives. The final optimization objective takes the form
L̂_i = α·L_i + (1−α)·Σ_{k=1}^{n−1} L_{i−k}
where α (0 ≤ α ≤ 1) is the weight between the main and auxiliary optimization objectives.
9) The model is trained by stochastic gradient descent (SGD), i.e. the model parameters are updated according to
θ ← θ − λ·∂L̂_i/∂θ
where θ denotes the model parameters, i.e. the matrices C, F_k (k = 1,2,…,n), H and W described in the steps above, and λ is the learning rate, which controls the step size of the parameter update.
10) After model training is complete, in the test stage, for a sentence s = w_1,…,w_M containing M words, the probability y_m of each word w_m (m = 1,2,…,M) is computed in turn according to steps 1) to 6) above (at time m there is no need to compute the inputs of the auxiliary objective functions of step 7)). The probability of the sentence s is then obtained as
P(s) = Π_{m=1}^{M} y_m.
11) Re-scoring the recognition results: in a first decoding pass, for each test utterance u a number of candidate recognition results are obtained with a recognizer, the recognizer assigning the k-th candidate result u_k of the test utterance u the score S(u_k):
S(u_k) = a(u_k) + μ·l(u_k)
where a(u_k) is the acoustic-model score of the candidate result u_k, l(u_k) is the language-model score of the candidate result u_k, and μ is a language-model score coefficient; the K highest-scoring candidate results u_k, 1 ≤ k ≤ K, are selected.
In the re-scoring pass, the n-gram neural network language model is used to recompute the language-model score l̂(u_k) of u_k according to step 10), and the score of the candidate result u_k is recomputed as
Ŝ(u_k) = a(u_k) + μ·l̂(u_k).
The candidate result with the highest score Ŝ(u_k) is then selected as the final recognition result of the test utterance u.
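The sentence probability of step 10) is the product of the per-word probabilities, which in practice is accumulated in the log domain to avoid numerical underflow on long sentences. A minimal sketch with assumed per-word probability values (the numbers are illustrative only):

```python
import math

# Assumed per-word probabilities y_m for a 4-word sentence.
y = [0.12, 0.4, 0.05, 0.2]

log_p = sum(math.log(p) for p in y)   # log P(s) = sum_m log y_m
p_s = math.exp(log_p)                 # P(s) = product of the y_m
print(round(p_s, 6))                  # prints: 0.00048
```

The same log-domain sum is what serves as the language-model score during re-scoring, since the score enters the combination S = a + μ·l additively.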
Example:
This example illustrates an implementation of the invention on the Switchboard dataset and a performance comparison with an RNN language model.
The example selects a model order n = 20 and a vocabulary size V = 25,000. When predicting the word w_i, the inputs of the model are w_{i−20},…,w_{i−1}; first, each word w_{i−k} (k = 1,2,…,20) is expressed as a 25,000-dimensional one-hot vector. The hidden vector dimension is chosen as h = 300, so the dimension of the matrix C is 25,000 × 300, the dimensions of the matrices F_k and H are all 300 × 300, and the dimension of the matrix W is 300 × 25,000. The sigmoid function is chosen as the nonlinear function σ.
The table-lookup operation on the matrix C yields the low-dimensional word vector e_{i−k} of w_{i−k}, and the affine transformation f_{i−k} = σ(F_k·e_{i−k}) then yields the corresponding implicit feature representation of w_{i−k}. For any time i, the implicit features f_{i−20}, f_{i−19},…,f_{i−1} of the words preceding time i are accumulated to obtain the history information vector h_i of time i. The affine transformation H_i = σ(H·h_i) then yields the implicit feature representation H_i corresponding to h_i, and the affine transformation and softmax y_i = g(W^T·H_i) yield the probability distribution of the word w_i, wherein g(·) denotes the softmax function. The final optimization objective is then computed according to the formula of step 8) above, and the model parameters are updated according to the formula of step 9). The example takes a weight α = 0.5 and a learning rate λ = 1.0, and the training process comprises 50 iterations (epochs) in total.
After model training is complete, for a given sentence s = w_1,…,w_M the sentence probability P(s) is computed according to the formula of step 10). Then the top 100 candidate results of each test utterance are re-scored according to the formula of step 11), and the candidate with the highest re-scored score Ŝ is selected as the final recognition result of the test utterance u.
On the Switchboard dataset, the performance comparison between the n-gram neural network language model (NNLM) of the invention and an RNN language model is shown in Table 1. The number of parameters reflects the computational complexity of the model: in general, the more parameters, the higher the complexity. The real-time factor is normalized to the running time of the RNN model; this index reflects the computational efficiency of the model and, when the parameter counts of the models are comparable, can be regarded as parallelization efficiency. The lower the real-time factor, the higher the computational efficiency and degree of parallelism of the model. The results show that, compared with the current mainstream speech recognition method based on an RNN language model, the invention reduces the computational complexity of the neural network language model and improves its parallelization efficiency without any loss of recognition accuracy.
Table 1: performance comparison of different neural network language models on the Switchboard test set
Identifying error rates Amount of ginseng Real time rate
RNN 19.08% 10.5M 1
NNLM 18.97% 10M 0.73
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (2)

1. A speech recognition method based on an N-gram neural network language model, the method comprising:
step 1) establishing and training an n-th-order n-gram neural network language model;
step 2) for each test utterance u, selecting the K highest-scoring candidate results with a recognizer; recomputing the language-model scores of the K candidate results based on the trained n-th-order n-gram neural network language model; then recomputing the scores of the K candidate results and selecting the highest-scoring candidate as the final recognition result of the test utterance u;
the step 2) specifically comprises the following steps:
step 2-1) for each test utterance u, obtaining a number of candidate recognition results with a recognizer, the recognizer assigning the p-th candidate result u_p of the test utterance u the score S(u_p):
S(u_p) = a(u_p) + μ·l(u_p)
wherein a(u_p) is the acoustic-model score of the candidate result u_p, l(u_p) is the language-model score of the candidate result u_p, and μ is a language-model score coefficient; selecting the K highest-scoring candidate results u_p, 1 ≤ p ≤ K;
step 2-2) recomputing the language-model score l̂(u_p) of each candidate result u_p, 1 ≤ p ≤ K, with the trained n-gram neural network language model;
step 2-3) recomputing the score of each candidate result u_p, 1 ≤ p ≤ K, as
Ŝ(u_p) = a(u_p) + μ·l̂(u_p);
step 2-4) selecting the candidate result with the highest score Ŝ(u_p) as the final recognition result of the test utterance u;
the step 2-2) is specifically:
for a candidate result u_p = w_{p1},…,w_{pM} containing M words, inputting it into the trained n-th-order n-gram neural network language model to obtain the probability y_{pm} of each word w_{pm}; the language-model score of the candidate result u_p, 1 ≤ p ≤ K, is then
l̂(u_p) = Σ_{m=1}^{M} log y_{pm}
2. The speech recognition method based on the N-gram neural network language model according to claim 1, wherein the step 1) specifically comprises:
step 1-1) given a sentence l = w_1,…,w_M containing M words in the training set S, the input to the n-th-order n-gram neural network language model when predicting the word w_i, 1 ≤ i ≤ M, is the n preceding words w_{i−n},…,w_{i−1}; their one-hot encodings are mapped through the matrix C ∈ R^{V×h} by a table-lookup operation to the low-dimensional representation vectors e_{i−k} ∈ R^h, k = 1,…,n, wherein V denotes the size of the vocabulary and is the number of rows of the matrix C, and h is the number of columns of the matrix C;
step 1-2) the final feature representation of the word w_{i−k} is obtained by an affine transformation: f_{i−k} = σ(F_k·e_{i−k}), wherein σ denotes a nonlinear function and the matrix F_k ∈ R^{h×h};
step 1-3) the history information vector is computed as h_i = Σ_{k=1}^{n} f_{i−k};
step 1-4) an implicit representation vector of the history information is obtained by an affine transformation of the history information vector h_i: H_i = σ(H·h_i), wherein the matrix H ∈ R^{h×h};
step 1-5) H_i is probability-normalized through an affine transformation and the softmax function to obtain the probability distribution of the word w_i to be predicted: y_i = g(W^T·H_i), wherein g(·) denotes the softmax function and the matrix W ∈ R^{h×V};
step 1-6) the cross entropy of y_i = (y_{i1},…,y_{iV}) and w_i = (w_{i1},…,w_{iV}) is computed as the loss function
L_i = −Σ_{d=1}^{V} w_{id}·log y_{id}
wherein y_{id} is the d-th component of y_i, w_{id} is the d-th component of w_i, and 1 ≤ d ≤ V;
step 1-7) the partial history sums h_{i−k} (accumulating only the features of the input words preceding w_{i−k}) are substituted for h_i, and the n−1 auxiliary loss functions L_{i−k}, k = 1,…,n−1, are computed according to steps 1-4) to 1-6);
step 1-8) the final optimization objective L̂_i of the word w_i to be predicted is computed as
L̂_i = α·L_i + (1−α)·Σ_{k=1}^{n−1} L_{i−k}
wherein 0 ≤ α ≤ 1 is the weight;
step 1-9) the model parameters are updated on the training set by stochastic gradient descent according to
θ ← θ − λ·∂L̂_i/∂θ
wherein λ is the learning rate and θ denotes the model parameters, comprising the matrices C, F_k (k = 1,2,…,n), H and W; when the model parameters θ converge, the training of the n-th-order n-gram neural network language model is complete.
CN201810928881.1A 2018-08-15 2018-08-15 Speech recognition method based on N-gram neural network language model Active CN110858480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810928881.1A CN110858480B (en) 2018-08-15 2018-08-15 Speech recognition method based on N-gram neural network language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810928881.1A CN110858480B (en) 2018-08-15 2018-08-15 Speech recognition method based on N-gram neural network language model

Publications (2)

Publication Number Publication Date
CN110858480A CN110858480A (en) 2020-03-03
CN110858480B 2022-05-17

Family

ID=69635973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810928881.1A Active CN110858480B (en) 2018-08-15 2018-08-15 Speech recognition method based on N-gram neural network language model

Country Status (1)

Country Link
CN (1) CN110858480B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111554276B (en) * 2020-05-15 2023-11-03 深圳前海微众银行股份有限公司 Speech recognition method, device, equipment and computer readable storage medium
CN112037773B (en) * 2020-11-05 2021-01-29 北京淇瑀信息科技有限公司 N-optimal spoken language semantic recognition method and device and electronic equipment
CN113380228A (en) * 2021-06-08 2021-09-10 北京它思智能科技有限公司 Online voice recognition method and system based on recurrent neural network language model
CN117316143A (en) * 2023-11-30 2023-12-29 深圳市金大智能创新科技有限公司 Method for human-computer interaction based on virtual person

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106486115A (en) * 2015-08-28 2017-03-08 Kabushiki Kaisha Toshiba Method and apparatus for improving a neural network language model, and speech recognition method and apparatus
CN106803422A (en) * 2015-11-26 2017-06-06 Institute of Acoustics CAS Language model re-scoring method based on a long short-term memory network
CN108062954A (en) * 2016-11-08 2018-05-22 iFLYTEK Co., Ltd. Speech recognition method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20140067394A1 (en) * 2012-08-28 2014-03-06 King Abdulaziz City For Science And Technology System and method for decoding speech
US10176799B2 (en) * 2016-02-02 2019-01-08 Mitsubishi Electric Research Laboratories, Inc. Method and system for training language models to reduce recognition errors


Non-Patent Citations (1)

Title
A Neural Probabilistic Language Model; Yoshua Bengio et al.; Journal of Machine Learning Research 3; 2003-02-03; pp. 1137-1155 *

Also Published As

Publication number Publication date
CN110858480A (en) 2020-03-03

Similar Documents

Publication Publication Date Title
CN110858480B (en) Speech recognition method based on N-gram neural network language model
Audhkhasi et al. Direct acoustics-to-word models for english conversational speech recognition
US10929744B2 (en) Fixed-point training method for deep neural networks based on dynamic fixed-point conversion scheme
CN111145728B (en) Speech recognition model training method, system, mobile terminal and storage medium
CN108804611B (en) Dialog reply generation method and system based on self-critical sequence learning
WO2017135334A1 (en) Method and system for training language models to reduce recognition errors
CN111429889A (en) Method, apparatus, device and computer readable storage medium for real-time speech recognition based on truncated attention
CN108831445A (en) Sichuan dialect recognition method, acoustic model training method, device and equipment
CN111210807B (en) Speech recognition model training method, system, mobile terminal and storage medium
Huang et al. SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition
US20180068652A1 (en) Apparatus and method for training a neural network language model, speech recognition apparatus and method
CN104538028A (en) Continuous speech recognition method based on a deep long short-term memory recurrent neural network
CN111145729A (en) Speech recognition model training method, system, mobile terminal and storage medium
Bai et al. A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.
CN110085215A (en) Language model data augmentation method based on a generative adversarial network
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN108549703A (en) Training method for a Mongolian language model based on a recurrent neural network
CN114925195A (en) Standard content text abstract generation method integrating vocabulary coding and structure coding
CN116578699A (en) Sequence classification prediction method and system based on Transformer
CN113806543B (en) Text classification method using a gated recurrent unit with residual skip connections
US20180061395A1 (en) Apparatus and method for training a neural network auxiliary model, speech recognition apparatus and method
CN109670171B (en) Word vector representation learning method based on word pair asymmetric co-occurrence
Madhavaraj et al. Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages
CN115376547B (en) Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium
CN111210815A (en) Deep neural network construction method for voice command word recognition, and recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant