CN110858480B - Speech recognition method based on N-element grammar neural network language model - Google Patents
Speech recognition method based on N-element grammar neural network language model
- Publication number
- Publication number: CN110858480B; Application number: CN201810928881.1A
- Authority
- CN
- China
- Prior art keywords
- language model
- neural network
- model
- candidate
- network language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 37
- 238000000034 method Methods 0.000 title claims abstract description 28
- 238000012360 testing method Methods 0.000 claims abstract description 19
- 238000012549 training Methods 0.000 claims abstract description 14
- 238000004364 calculation method Methods 0.000 claims abstract description 9
- 230000006870 function Effects 0.000 claims description 23
- 239000011159 matrix material Substances 0.000 claims description 20
- 230000009466 transformation Effects 0.000 claims description 11
- 239000013598 vector Substances 0.000 claims description 11
- 239000013604 expression vector Substances 0.000 claims description 5
- 238000012886 linear function Methods 0.000 claims description 4
- 238000013527 convolutional neural network Methods 0.000 description 5
- 238000005457 optimization Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000006872 improvement Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 2
- 238000004880 explosion Methods 0.000 description 2
- 238000012821 model calculation Methods 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 238000013519 translation Methods 0.000 description 2
- 230000008033 biological extinction Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a speech recognition method based on an N-gram neural network language model, comprising the following steps: step 1) building and training an N-order n-gram neural network language model; step 2) for each test speech u, selecting the K highest-scoring candidate results with a recognizer, recalculating the language model scores of the K candidates with the trained N-order n-gram neural network language model, then recalculating the overall scores of the K candidates and selecting the highest-scoring candidate as the final recognition result of the test speech u. Both the performance and the computational efficiency of the method are superior to those of speech recognition based on an RNN language model.
Description
Technical Field
The invention relates to the fields of speech recognition and natural language processing, and in particular to a method based on an N-gram neural network language model.
Background
A language model (LM) is a mathematical model describing the probability distribution of word sequences, and it plays an important role in natural language processing applications. With the development of deep learning, language modeling based on deep neural networks (DNNs) has shown great potential in a range of tasks such as speech recognition, machine translation and text generation.
Research has shown that the performance of a neural network language model depends heavily on the specific model structure. The currently mainstream neural network structures are the standard DNN, the convolutional neural network (CNN) and the recurrent neural network (RNN). The DNN is generally used for simple classification tasks. The CNN can model high-dimensional data and capture correlations among its dimensions, and is mainly used for tasks such as image processing. The RNN can efficiently compress history information through its recurrent connections and is therefore suited to modeling sequential data. Because natural sentences inherently have a strong temporal structure and are typical sequential data, the RNN is widely applied to natural language tasks.
At present, RNN-based language models achieve clearly better performance than statistical language models and DNN/CNN language models on tasks such as speech recognition and machine translation. In the speech recognition task, however, the RNN model has the following problems: 1) unrolled along time, its recurrent structure resembles a very deep DNN, and such deep networks suffer from vanishing and exploding gradients during training, which makes the RNN hard to train and limits its performance; 2) compared with feed-forward structures such as the DNN and CNN, the RNN has a low degree of parallelization and cannot be parallelized along the time axis, so the time complexity of RNN computation is high, making it difficult to apply in speech recognition systems with strict real-time requirements, such as voice input methods and smart speakers.
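The parallelization gap described above can be illustrated with a toy sketch (all dimensions, weights and names here are illustrative, not from the patent): a feed-forward model scores every position of a sequence with one batched matrix product, while an RNN must step through time sequentially because each state depends on the previous one.

```python
import numpy as np

rng = np.random.default_rng(0)
T, h = 6, 4                      # sequence length, hidden size (toy values)
X = rng.standard_normal((T, h))  # per-position input features
W = rng.standard_normal((h, h))

# Feed-forward (n-gram style): all time steps in one batched matmul.
feedforward_out = np.tanh(X @ W)

# RNN style: the recurrence forces a sequential loop over time.
U = rng.standard_normal((h, h))
state = np.zeros(h)
rnn_out = []
for t in range(T):
    state = np.tanh(X[t] @ W + state @ U)  # state at t depends on state at t-1
    rnn_out.append(state)
rnn_out = np.array(rnn_out)

print(feedforward_out.shape, rnn_out.shape)  # both (6, 4)
```

The feed-forward branch contains no data dependency between time steps, which is exactly what permits the parallelization the patent claims over the RNN.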
Disclosure of Invention
The invention aims to overcome these technical shortcomings by providing a speech recognition method based on an N-gram neural network language model that simplifies the structure of the neural network language model, reduces its computational complexity and increases its degree of parallelization, without losing speech recognition performance.
In order to achieve the above object, the present invention provides a speech recognition method based on an N-gram neural network language model, the method comprising:
step 1) establishing and training an N-order n-gram neural network language model;
step 2) for each test speech u, selecting the K highest-scoring candidate results with a recognizer; recalculating the language model scores of the K candidates based on the trained N-order n-gram neural network language model; then recalculating the overall scores of the K candidates and selecting the highest-scoring candidate as the final recognition result of the test speech u.
As an improvement of the above method, the step 1) specifically includes:
step 1-1) given a sentence l = {w_1, …, w_M} containing M words in the training set S, input it to the N-order n-gram neural network language model; for the word to be predicted w_i, 1 ≤ i ≤ M, the n preceding words w_{i-n}, …, w_{i-1} are one-hot encoded and looked up in a matrix C to obtain the low-dimensional representation vectors e_{i-n}, …, e_{i-1} ∈ R^h, wherein V, the size of the vocabulary, is the number of rows of the matrix C and h is the number of columns of the matrix C;
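The lookup in step 1-1) amounts to an embedding table read: multiplying a one-hot vector by C simply selects a row of C. A minimal sketch (the sizes and the word id are illustrative toy values, not from the patent):

```python
import numpy as np

V, h = 10, 3                                      # toy vocabulary size and embedding dim
C = np.arange(V * h, dtype=float).reshape(V, h)   # embedding matrix, one row per word

word_id = 4
one_hot = np.zeros(V)
one_hot[word_id] = 1.0

# Multiplying by the one-hot vector and direct row indexing are equivalent.
e_via_matmul = one_hot @ C
e_via_lookup = C[word_id]
assert np.allclose(e_via_matmul, e_via_lookup)
print(e_via_lookup)  # row 4 of C: [12. 13. 14.]
```

This equivalence is why implementations never materialize the one-hot vector and instead index C directly.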
step 1-2) obtain each word w_{i-k}'s final feature representation through an affine transformation: f_{i-k} = σ(F_k e_{i-k}), wherein σ denotes a nonlinear function and F_k ∈ R^{h×h} is a matrix;
step 1-3) accumulate the feature representations of the n preceding words to obtain the history information vector h_i = Σ_{k=1}^{n} f_{i-k};
step 1-4) apply an affine transformation to the history information vector h_i to obtain the implicit representation vector of the history information: H_i = σ(H h_i), wherein H ∈ R^{h×h} is a matrix;
step 1-5) probability-normalize H_i through an affine transformation and the softmax function to obtain the probability distribution of the word to be predicted w_i: y_i = g(W H_i), wherein g denotes the softmax function and W is a matrix;
Step 1-6) calculating yi=(yi1,…,yiV) And wi=(wi1,…,wiV) As a function of the loss yidIs yiComponent of (a), widIs wiD is more than or equal to 1 and less than or equal to V;
step 1-7) for the N-order n-gram neural network language model, replace the history information vector h_i with its partial accumulations and calculate the auxiliary loss functions L_{i-k} according to steps 1-4) to 1-6);
step 1-8) steps 1-6) to 1-7) yield n loss functions L_{i-n+1}, …, L_i; the final optimization objective function L̂_i for the word to be predicted w_i is:
L̂_i = α·L_i + (1-α)·Σ_{k=1}^{n-1} L_{i-k}
wherein 0 ≤ α ≤ 1 is the weight;
step 1-9) update the model parameters on the training set using stochastic gradient descent according to the formula
θ ← θ − λ·∂L̂_i/∂θ
wherein λ is the learning rate and θ denotes the model parameters, comprising the matrices C, F_k (k = 1, 2, …, n), H and W; when the model parameters θ converge, training of the N-order n-gram neural network language model is complete.
As an improvement of the above method, the step 2) specifically includes:
step 2-1) for each test speech u, obtain a number of recognition candidates with the recognizer; the recognizer gives the p-th candidate u_p of the test speech u the score S(u_p):
S(u_p) = a(u_p) + μ·l(u_p)
wherein a(u_p) is the acoustic model score of candidate u_p, l(u_p) is its language model score, and μ is the language model score coefficient; select the K highest-scoring candidates u_p, 1 ≤ p ≤ K;
step 2-2) recalculate the language model score l̂(u_p) of each candidate u_p, 1 ≤ p ≤ K, with the trained n-gram neural network language model;
step 2-3) recalculate each candidate's score Ŝ(u_p) = a(u_p) + μ·l̂(u_p);
step 2-4) select the candidate with the highest score Ŝ(u_p) as the final recognition result of the test speech u.
As an improvement of the above method, the step 2-2) is specifically:
for a candidate u_p = {w_{p1}, …, w_{pM}} containing M words, input it into the trained N-order n-gram neural network language model to obtain the probability y_{pm} of each word w_{pm}; the language model score l̂(u_p) of candidate u_p, 1 ≤ p ≤ K, is then:
l̂(u_p) = Σ_{m=1}^{M} log y_{pm}
the invention has the advantages that:
1. the n-gram neural network language model uses only a feed-forward network structure, which effectively avoids the vanishing and exploding gradient problems;
2. compared with an RNN language model, the n-gram neural network language model reduces the computational complexity of the model and improves its parallel efficiency;
3. both the performance and the computational efficiency of the speech recognition method are superior to those of speech recognition based on an RNN language model.
Drawings
FIG. 1 is a schematic structural diagram of an N-gram neural network language model according to the present invention;
FIG. 2 is a flowchart of a speech recognition method based on an N-gram neural network language model according to the present invention.
Detailed Description
The method of the present invention is described in detail below with reference to the accompanying drawings and examples.
As shown in fig. 1 and fig. 2, the present invention provides a speech recognition method based on an N-gram neural network language model, including:
for an N-order N-gram neural network language model,
1) Given a sentence l = {w_1, …, w_M} containing M words in the training set S, when predicting the word w_i (i = 1, 2, …, M) the model input is the n words preceding w_i, namely w_{i-n}, …, w_{i-1}. Their one-hot encodings are looked up in the matrix C to obtain the low-dimensional representation vectors e_{i-n}, …, e_{i-1} ∈ R^h, where V is the vocabulary size; typically h ≪ V.
2) Compute each word w_{i-k}'s final feature representation through an affine transformation: f_{i-k} = σ(F_k e_{i-k}), where σ denotes a nonlinear function and F_k ∈ R^{h×h}.
3) Accumulate the features of the n preceding words to obtain the history information vector h_i = Σ_{k=1}^{n} f_{i-k}.
4) Apply an affine transformation to h_i to obtain the implicit representation vector of the history information: H_i = σ(H h_i), where σ denotes a nonlinear function and H ∈ R^{h×h}.
5) Probability-normalize H_i through an affine transformation and the softmax function to obtain the probability distribution of the word to be predicted: y_i = g(W H_i), where g denotes the softmax function and W is the output matrix.
6) Compute the cross entropy of y_i and the one-hot label w_i as the loss function L_i = -Σ_{d=1}^{V} w_{id} log y_{id}.
7) For the n-order model, replace h_i in step 3) with its partial accumulations, then calculate the auxiliary loss functions L_{i-k} according to steps 4) to 6).
8) Steps 6) to 7) finally yield n loss functions L_{i-n+1}, …, L_i. The invention uses L_i as the main optimization target at time i and all other loss functions L_{i-n+1}, …, L_{i-1} as auxiliary optimization targets. The final optimization objective takes the form
L̂_i = α·L_i + (1-α)·Σ_{k=1}^{n-1} L_{i-k}
where α (0 ≤ α ≤ 1) weights the main optimization target against the auxiliary ones.
9) The model is trained by stochastic gradient descent (SGD); that is, the model parameters are updated as
θ ← θ − λ·∂L̂_i/∂θ
where θ denotes the model parameters, i.e. the matrices C, F_k (k = 1, 2, …, n), H and W described in the steps above, and λ is the learning rate, which controls the step size of the parameter update.
10) After model training is complete, in the test stage, for a sentence s = {w_1, …, w_M} containing M words, the probability y_m of each word w_m (m = 1, 2, …, M) is computed in turn according to steps 1) to 6) above (at time m the inputs of the auxiliary objective functions in step 7), y_{m-n+1}, …, y_{m-1}, need not be computed). The probability of the sentence s is then obtained as P(s) = Π_{m=1}^{M} y_m.
11) Rescoring of recognition results: in the first recognition-decoding pass, for each test speech u a number of recognition candidates are obtained with the recognizer, which gives the k-th candidate u_k of the test speech u the score S(u_k):
S(u_k) = a(u_k) + μ·l(u_k)
where a(u_k) is the acoustic model score of candidate u_k, l(u_k) is its language model score, and μ is the language model score coefficient; the K highest-scoring candidates u_k, 1 ≤ k ≤ K, are selected.
In the rescoring pass, the language model score l̂(u_k) of each candidate u_k is recomputed with the n-gram neural network language model according to step 10), and the candidate score is recomputed as Ŝ(u_k) = a(u_k) + μ·l̂(u_k). The candidate with the highest score Ŝ(u_k) is then selected as the final recognition result of the test speech u.
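The rescoring loop of step 11) replaces the first-pass language model score with the NNLM score and re-ranks the surviving candidates. A compact sketch (the hypothesis texts, scores and μ are hypothetical illustration values):

```python
def rescore(candidates, nnlm_score, mu):
    """Recompute S_hat(u_k) = a(u_k) + mu * l_hat(u_k); return the best hypothesis.

    `candidates` holds (text, acoustic_score) pairs; `nnlm_score` maps a
    hypothesis text to its recomputed language model score. All names and
    numbers used below are illustrative, not from the patent.
    """
    best, best_score = None, float("-inf")
    for text, acoustic in candidates:
        s = acoustic + mu * nnlm_score(text)
        if s > best_score:
            best, best_score = text, s
    return best

lm = {"how are you": -3.0, "how our you": -9.0}
winner = rescore([("how are you", -50.0), ("how our you", -49.0)],
                 lm.__getitem__, mu=1.0)
print(winner)  # "how are you": -53.0 beats -58.0
```

Note how the NNLM score overturns the first-pass ranking: the acoustically better hypothesis loses once the language model is consulted.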
Example:
this example illustrates the implementation of the present invention on the Switchboard dataset and the performance comparison with the RNN language model.
The example uses model order n = 20 and vocabulary size V = 25,000. When predicting the word w_i, the model inputs are w_{i-20}, …, w_{i-1}; each word w_{i-k} (k = 1, 2, …, 20) is first represented as a 25,000-dimensional one-hot vector. With hidden vector dimension h = 300, the matrix C has dimensions 25,000 × 300, the matrices F_k and H are all 300 × 300, and the matrix W has dimensions 300 × 25,000. The sigmoid function is chosen as the nonlinear function σ.
The affine transformation f_{i-k} = σ(F_k e_{i-k}) then yields the implicit feature representation f_{i-k} ∈ R^300 corresponding to w_{i-k}. For any time i, the implicit features of the preceding words, f_{i-20}, f_{i-19}, …, f_{i-1}, are accumulated to obtain the history information vector of time i, h_i = Σ_{k=1}^{20} f_{i-k}. The affine transformation H_i = σ(H h_i) then yields the implicit feature representation H_i corresponding to h_i.
A final affine transformation with the matrix W followed by the softmax function g yields the probability distribution y_i of the word w_i. The final optimization objective L̂_i is then computed according to the formula in step 8) of the invention, and the model parameters are updated according to the formula in step 9). The example uses weight α = 0.5 and learning rate λ = 1.0, and training comprises 50 iterations (epochs) in total.
After model training is complete, for a given sentence s = {w_1, …, w_M}, the sentence probability P(s) is calculated according to the formula in step 10). The 100 highest-scoring candidates of each test speech are then rescored according to the formula in step 11) of the invention, and the candidate with the highest rescored score Ŝ is selected as the final recognition result of the test speech u.
On the Switchboard data set, the performance comparison between the n-gram neural network language model (NNLM) of the invention and an RNN language model is shown in Table 1. The parameter count reflects the computational complexity of a model: in general, more parameters mean higher complexity. The real-time rate is normalized to the running time of the RNN model and indicates computational efficiency; when the models' parameter counts are comparable, it can also be read as parallelization efficiency. The lower the real-time rate, the higher the model's computational efficiency and degree of parallelization. The results show that, compared with the currently mainstream speech recognition approach based on an RNN language model, the method can reduce the computational complexity of the neural network language model and improve its parallelization efficiency without losing recognition accuracy.
Table 1: performance comparison of different neural network language models on the Switchboard test set
Model | Recognition error rate | Parameter count | Real-time rate
---|---|---|---
RNN | 19.08% | 10.5M | 1
NNLM | 18.97% | 10M | 0.73
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (2)
1. A method of speech recognition based on an N-gram neural network language model, the method comprising:
step 1) establishing and training an N-order n-gram neural network language model;
step 2) for each test speech u, selecting the K highest-scoring candidate results with a recognizer; recalculating the language model scores of the K candidates based on the trained N-order n-gram neural network language model; then recalculating the overall scores of the K candidates and selecting the highest-scoring candidate as the final recognition result of the test speech u;
the step 2) specifically comprises the following steps:
step 2-1) for each test speech u, obtain a number of recognition candidates with the recognizer; the recognizer gives the p-th candidate u_p of the test speech u the score S(u_p):
S(u_p) = a(u_p) + μ·l(u_p)
wherein a(u_p) is the acoustic model score of candidate u_p, l(u_p) is its language model score, and μ is the language model score coefficient; select the K highest-scoring candidates u_p, 1 ≤ p ≤ K;
step 2-2) recalculate the language model score l̂(u_p) of each candidate u_p, 1 ≤ p ≤ K, with the trained n-gram neural network language model;
step 2-3) recalculate each candidate's score Ŝ(u_p) = a(u_p) + μ·l̂(u_p);
step 2-4) select the candidate with the highest score Ŝ(u_p) as the final recognition result of the test speech u;
the step 2-2) is specifically as follows:
for a candidate u_p = {w_{p1}, …, w_{pM}} containing M words, input it into the trained N-order n-gram neural network language model to obtain the probability y_{pm} of each word w_{pm}; the language model score l̂(u_p) of candidate u_p, 1 ≤ p ≤ K, is then:
l̂(u_p) = Σ_{m=1}^{M} log y_{pm}
2. the speech recognition method based on the N-gram neural network language model according to claim 1, wherein the step 1) specifically comprises:
step 1-1) given a sentence l = {w_1, …, w_M} containing M words in the training set S, input it to the N-order n-gram neural network language model; for the word to be predicted w_i, 1 ≤ i ≤ M, the n preceding words w_{i-n}, …, w_{i-1} are one-hot encoded and looked up in a matrix C to obtain the low-dimensional representation vectors e_{i-n}, …, e_{i-1} ∈ R^h, wherein V, the size of the vocabulary, is the number of rows of the matrix C and h is the number of columns of the matrix C;
step 1-2) obtain each word w_{i-k}'s final feature representation through an affine transformation: f_{i-k} = σ(F_k e_{i-k}), wherein σ denotes a nonlinear function and F_k ∈ R^{h×h} is a matrix;
step 1-3) accumulate the feature representations of the n preceding words to obtain the history information vector h_i = Σ_{k=1}^{n} f_{i-k};
step 1-4) apply an affine transformation to the history information vector h_i to obtain the implicit representation vector of the history information: H_i = σ(H h_i), wherein H ∈ R^{h×h} is a matrix;
step 1-5) probability-normalize H_i through an affine transformation and the softmax function to obtain the probability distribution of the word to be predicted w_i: y_i = g(W H_i), wherein g denotes the softmax function and W is a matrix;
step 1-6) calculate the cross entropy of y_i = (y_{i1}, …, y_{iV}) and w_i = (w_{i1}, …, w_{iV}) as the loss function L_i = -Σ_{d=1}^{V} w_{id} log y_{id}, wherein y_{id} is the d-th component of y_i, w_{id} is the d-th component of w_i, and 1 ≤ d ≤ V;
step 1-8) calculate the final optimization objective function L̂_i of the word to be predicted w_i as:
L̂_i = α·L_i + (1-α)·Σ_{k=1}^{n-1} L_{i-k}
wherein 0 ≤ α ≤ 1 is the weight and L_{i-k}, k = 1, …, n-1, are auxiliary loss functions computed according to steps 1-4) to 1-6);
step 1-9) update the model parameters on the training set using stochastic gradient descent according to the formula
θ ← θ − λ·∂L̂_i/∂θ
wherein λ is the learning rate and θ denotes the model parameters, comprising the matrices C, F_k (k = 1, 2, …, n), H and W.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810928881.1A CN110858480B (en) | 2018-08-15 | 2018-08-15 | Speech recognition method based on N-element grammar neural network language model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810928881.1A CN110858480B (en) | 2018-08-15 | 2018-08-15 | Speech recognition method based on N-element grammar neural network language model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110858480A CN110858480A (en) | 2020-03-03 |
CN110858480B true CN110858480B (en) | 2022-05-17 |
Family
ID=69635973
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810928881.1A Active CN110858480B (en) | 2018-08-15 | 2018-08-15 | Speech recognition method based on N-element grammar neural network language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110858480B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111554276B (en) * | 2020-05-15 | 2023-11-03 | 深圳前海微众银行股份有限公司 | Speech recognition method, device, equipment and computer readable storage medium |
CN112037773B (en) * | 2020-11-05 | 2021-01-29 | 北京淇瑀信息科技有限公司 | N-optimal spoken language semantic recognition method and device and electronic equipment |
CN113380228A (en) * | 2021-06-08 | 2021-09-10 | 北京它思智能科技有限公司 | Online voice recognition method and system based on recurrent neural network language model |
CN117316143A (en) * | 2023-11-30 | 2023-12-29 | 深圳市金大智能创新科技有限公司 | Method for human-computer interaction based on virtual person |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106486115A (en) * | 2015-08-28 | 2017-03-08 | 株式会社东芝 | Improve method and apparatus and audio recognition method and the device of neutral net language model |
CN106803422A (en) * | 2015-11-26 | 2017-06-06 | 中国科学院声学研究所 | A kind of language model re-evaluation method based on memory network in short-term long |
CN108062954A (en) * | 2016-11-08 | 2018-05-22 | 科大讯飞股份有限公司 | Audio recognition method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140067394A1 (en) * | 2012-08-28 | 2014-03-06 | King Abdulaziz City For Science And Technology | System and method for decoding speech |
US10176799B2 (en) * | 2016-02-02 | 2019-01-08 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for training language models to reduce recognition errors |
-
2018
- 2018-08-15 CN CN201810928881.1A patent/CN110858480B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106486115A (en) * | 2015-08-28 | 2017-03-08 | 株式会社东芝 | Improve method and apparatus and audio recognition method and the device of neutral net language model |
CN106803422A (en) * | 2015-11-26 | 2017-06-06 | 中国科学院声学研究所 | A kind of language model re-evaluation method based on memory network in short-term long |
CN108062954A (en) * | 2016-11-08 | 2018-05-22 | 科大讯飞股份有限公司 | Audio recognition method and device |
Non-Patent Citations (1)
Title |
---|
A Neural Probabilistic Language Model; Yoshua Bengio et al.; Journal of Machine Learning Research 3; 2003-02-03; pp. 1137–1155 *
Also Published As
Publication number | Publication date |
---|---|
CN110858480A (en) | 2020-03-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110858480B (en) | Speech recognition method based on N-element grammar neural network language model | |
Audhkhasi et al. | Direct acoustics-to-word models for english conversational speech recognition | |
US10929744B2 (en) | Fixed-point training method for deep neural networks based on dynamic fixed-point conversion scheme | |
CN111145728B (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
CN108804611B (en) | Dialog reply generation method and system based on self comment sequence learning | |
WO2017135334A1 (en) | Method and system for training language models to reduce recognition errors | |
CN111429889A (en) | Method, apparatus, device and computer readable storage medium for real-time speech recognition based on truncated attention | |
CN108831445A (en) | Sichuan dialect recognition methods, acoustic training model method, device and equipment | |
CN111210807B (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
Huang et al. | SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition | |
US20180068652A1 (en) | Apparatus and method for training a neural network language model, speech recognition apparatus and method | |
CN104538028A (en) | Continuous voice recognition method based on deep long and short term memory recurrent neural network | |
CN111145729A (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
Bai et al. | A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting. | |
CN110085215A (en) | A kind of language model data Enhancement Method based on generation confrontation network | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN108549703A (en) | A kind of training method of the Mongol language model based on Recognition with Recurrent Neural Network | |
CN114925195A (en) | Standard content text abstract generation method integrating vocabulary coding and structure coding | |
CN116578699A (en) | Sequence classification prediction method and system based on Transformer | |
CN113806543B (en) | Text classification method of gate control circulation unit based on residual jump connection | |
US20180061395A1 (en) | Apparatus and method for training a neural network auxiliary model, speech recognition apparatus and method | |
CN109670171B (en) | Word vector representation learning method based on word pair asymmetric co-occurrence | |
Madhavaraj et al. | Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages | |
CN115376547B (en) | Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium | |
CN111210815A (en) | Deep neural network construction method for voice command word recognition, and recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |