CN111951785A - Voice recognition method and device and terminal equipment - Google Patents
- Publication number
- CN111951785A (application number CN201910407618.2A)
- Authority
- CN
- China
- Prior art keywords
- conditional probability
- voice recognition
- loss function
- recognition model
- adjusting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING › G10L15/00—Speech recognition › G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice › G10L15/063—Training
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING › G10L15/00—Speech recognition › G10L15/26—Speech to text systems
Abstract
The invention is applicable to the technical field of speech recognition and provides a speech recognition method, a speech recognition apparatus, and a terminal device. The method comprises the following steps: calculating a first conditional probability of a sentence according to a pre-trained language model; adjusting a first loss function of the speech recognition model according to the first conditional probability to obtain a second loss function; and training the speech recognition model with the second loss function and performing speech recognition with the trained speech recognition model. The invention can improve the accuracy of speech recognition.
Description
Technical Field
The invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method, a speech recognition apparatus, and a terminal device.
Background
Speech recognition technology aims to recognize an input speech signal and output computer-readable text, and can be applied to smart homes, smart vehicles, intelligent customer-service robots, and the like. With the development of deep learning, speech recognition has shifted from the traditional machine-learning approach of the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) to techniques based on Deep Neural Networks (DNN). DNN-based speech recognition falls into two categories: one replaces the original GMM component with a DNN, yielding the Deep Neural Network-Hidden Markov Model (DNN-HMM) hybrid; the other is end-to-end speech recognition based on deep neural networks.
Because end-to-end automatic speech recognition based on deep neural networks maps input speech directly to decoded text, it requires neither complex alignment work nor the construction of a pronunciation dictionary, which saves a great deal of preparation time, and it is therefore widely used. However, existing end-to-end techniques (such as Connectionist Temporal Classification (CTC), the Deep Feedforward Sequential Memory Network (DFSMN), and attention-based sequence-to-sequence networks (Seq2Seq-Attention)) cannot learn a complex language model: they typically recognize input speech from the acoustic waveform alone, so the logic of the recognized text is poor. Consequently, when the trained speech recognition model encounters more complex speech, its recognition accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a speech recognition method, a speech recognition apparatus, and a terminal device, so as to solve the prior-art problem that a trained speech recognition model has low recognition accuracy when encountering complex speech.
A first aspect of an embodiment of the present invention provides a speech recognition method, including:
calculating a first conditional probability of a sentence according to a pre-trained language model;
adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and training the voice recognition model by using the second loss function, and performing voice recognition by using the trained voice recognition model.
A second aspect of an embodiment of the present invention provides a speech recognition apparatus, including:
the first conditional probability calculating module is used for calculating the first conditional probability of the sentence according to the pre-trained language model;
the adjusting module is used for adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and the voice recognition module is used for training the voice recognition model by utilizing the second loss function and performing voice recognition by utilizing the trained voice recognition model.
A third aspect of embodiments of the present invention provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the steps of the method according to the first aspect.
In the embodiment of the invention, the first conditional probability of a sentence is calculated with a pre-trained language model and used to correct the original first loss function of a speech recognition model, yielding a second loss function with which the speech recognition model is then trained. This optimizes the loss function of the speech recognition model and introduces the characteristics of the pre-trained language model. Because the first loss function is optimized using the first conditional probability of the pre-trained language model, that language model is effectively embedded into the speech recognition model, and the trained speech recognition model achieves higher recognition accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a speech recognition method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a specific implementation process of adjusting the first loss function according to the second conditional probability and the influence coefficient according to the embodiment of the present invention;
fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart of a speech recognition method according to an embodiment of the present invention, which is detailed as follows:
s101: a first conditional probability of the sentence is calculated from the pre-trained language model.
It should be noted that a language model can summarize the internal relations between words from a large amount of text, reduce the error rate of recognized words, and make the recognition result more logical. Commonly used language models are n-gram language models and neural-network-based language models.
The pre-trained language model in the embodiment of the invention can be trained with the language-model training tool SRILM as an n-gram language model, where the parameter n indicates that the probability of the current word depends on the preceding n-1 words. In the embodiment of the present invention, a trigram language model, that is, a language model with n = 3, is trained, so the probability of the current word depends on the 2 preceding words. A sentence here refers to a sentence that the speech recognition model predicts from an input sample (speech data).
Further, the calculating of the first conditional probability of the sentence according to the pre-trained language model includes:
for each sentence, its first conditional probability is calculated according to:

P(S) = ∏_i [ C(w_{i-(n-1)}, …, w_{i-1}, w_i) / C(w_{i-(n-1)}, …, w_{i-1}) ]    (1)

In the above formula (1), P(S) represents the first conditional probability of sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) denotes the number of times word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) denotes the number of times word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples; n is a positive integer greater than 1; and i indexes the ith word.
Since an n-gram language model means that the probability of the current word depends on the preceding n-1 words, the first conditional probability P(S) of sentence S can be expressed as:

P(S) = ∏_i P(w_i | w_{i-(n-1)}, …, w_{i-1})    (2)

In the above formula (2), P(w_i | w_{i-(n-1)}, …, w_{i-1}) denotes the probability that word w_i occurs given that the words w_{i-(n-1)}, …, w_{i-1} have occurred; it can be estimated by the maximum likelihood method, which yields formula (1) above.
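For illustration only (this sketch is not part of the claimed invention), the maximum-likelihood estimate combining formulas (1) and (2) can be computed from raw n-gram counts as follows; the corpus format, the `<s>` padding tokens, and the zero-probability handling are assumptions of this example rather than details taken from the patent:

```python
from collections import Counter

def first_conditional_probability(sentence, corpus, n=3):
    """Maximum-likelihood n-gram probability P(S) of a sentence.

    `sentence` and each corpus entry are lists of words. Counts are taken
    over the training corpus; each sentence is padded with <s> tokens so
    the first words also have a full (n-1)-word context.
    """
    ngram_counts = Counter()
    context_counts = Counter()
    for sent in corpus:
        padded = ["<s>"] * (n - 1) + sent
        for i in range(n - 1, len(padded)):
            ngram_counts[tuple(padded[i - (n - 1): i + 1])] += 1
            context_counts[tuple(padded[i - (n - 1): i])] += 1

    p = 1.0
    padded = ["<s>"] * (n - 1) + sentence
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - (n - 1): i])
        ngram = tuple(padded[i - (n - 1): i + 1])
        if context_counts[context] == 0:
            return 0.0  # unseen context; real systems apply smoothing here
        p *= ngram_counts[ngram] / context_counts[context]
    return p
```

A real system such as SRILM would additionally apply smoothing and backoff rather than returning zero for unseen n-grams.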
S102: and adjusting the first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function.
The loss function measures the difference between the predicted value and the true value and reflects their degree of deviation; the lower the deviation, the more accurate the prediction. Therefore, the smaller the loss function, the better the finally trained model, that is, the higher the accuracy of speech recognition.
The first loss function is the original loss function of the speech recognition model. Adjusting this original loss function with the first conditional probability introduces the characteristics of the pre-trained language model and can improve the accuracy of the trained speech recognition model.
Specifically, the adjusting a first loss function of the speech recognition model according to the first conditional probability includes:
calculating a second conditional probability using the first conditional probability;
and adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model.
After the first conditional probability P(S) is calculated, it is transformed to obtain a second conditional probability T, and the first loss function is adjusted using T together with the influence coefficient r of the pre-trained language model.
Further, the calculating a second conditional probability using the first conditional probability includes:
calculating by using the first conditional probability according to the following formula:
In the above formula (3), T represents the calculated second conditional probability, P(S) represents the first conditional probability, and length represents the length of sentence S, that is, the number of words S contains.
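Formula (3) itself appears only as an image in the original, so the transformation is sketched here under an assumed form: the per-word geometric mean P(S)^(1/length), a common length normalization that keeps longer sentences from scoring lower merely because they contain more probability factors. The function name is hypothetical:

```python
def second_conditional_probability(p_s, length):
    """Length-normalize the sentence probability P(S).

    Assumed form for illustration: the per-word geometric mean
    T = P(S) ** (1 / length). The patent's exact equation (3) is not
    reproduced in the text.
    """
    if length <= 0:
        raise ValueError("sentence length must be positive")
    return p_s ** (1.0 / length)
```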
As shown in fig. 2, fig. 2 is a flowchart illustrating a specific implementation process of adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model, and includes the following steps S201 to S203:
s201: obtaining a plurality of sentences obtained through prediction, and calculating a second conditional probability of each sentence;
Multiple predicted sentences are obtained from the speech recognition model; assume k predicted sentences y_pred_1, y_pred_2, …, y_pred_k are obtained. The T value of each sentence is calculated using formulas (1) and (3) above, yielding T_1, T_2, …, T_k.
S202: calculating average conditional probability according to the second conditional probabilities of all sentences and the influence coefficients;
calculating an average conditional probability T̄ according to the T values obtained in step S201, the influence coefficient r of the pre-trained language model, and the following formula:

T̄ = (r / k) · Σ_{j=1}^{k} T_j    (4)

In the above formula (4), T̄ represents the calculated average conditional probability, r represents the influence coefficient, k represents the number of sentences, j indexes the jth sentence, and T_j represents the second conditional probability of the jth sentence.
S203: adjusting the first loss function using the average conditional probability.
The method of adjusting the first loss function using the average conditional probability is as follows: the average conditional probability is added to the original loss function, and the result is the second loss function, that is, the adjusted loss function.
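Steps S201 to S203 can be sketched as follows. Because the exact equation (4) is not reproduced in the text, this example assumes the average takes the form r · mean(T_j) and, per the description above, that the result is added to the first loss; both the function name and this exact combination are illustrative assumptions:

```python
def adjusted_loss(first_loss, second_probs, r):
    """Second loss function per steps S201-S203 (illustrative sketch).

    first_loss:   value of the model's original loss function
    second_probs: second conditional probabilities T_1..T_k of the k
                  predicted sentences
    r:            influence coefficient of the pre-trained language model
    """
    k = len(second_probs)
    avg = r * sum(second_probs) / k  # assumed form of equation (4)
    return first_loss + avg          # added to the original loss per S203
```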
It should be noted that, since different influence coefficients r influence the recognition accuracy of the finally trained speech recognition model, different values of r will be adopted for different sample data.
In a preferred implementation manner of the embodiment of the present invention, the influence coefficient is an optimal influence coefficient, and the method for obtaining the optimal influence coefficient includes:
respectively training the voice recognition model by adopting a plurality of influence coefficients, and determining the influence coefficient which enables the recognition precision of the voice recognition model to be highest according to a training result, namely the optimal influence coefficient;
the adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model includes:
and adjusting the first loss function according to the second conditional probability and the optimal influence coefficient.
In general, the influence coefficient r may take values between 0 and 1. In the embodiment of the invention, actual training leads to the following conclusion: when r lies in the range [0.1, 0.5], the converged speech recognition model has better recognition accuracy. However, different influence coefficients r should be selected for speech data of different sizes and different domains; that is, the choice of r depends on the size and domain of the input speech data, and in practice an optimal influence coefficient can be selected as required.
Optionally, the training the speech recognition model with a plurality of influence coefficients respectively includes:
and presetting a value interval for the influence coefficients, adjusting the values of the influence coefficients according to a preset step length, and training the voice recognition model by using each influence coefficient.
In the training process of the speech recognition model, a value interval can be preset for r. Assuming that the interval is [0.1, 0.5], the value of r is automatically adjusted in steps of 0.1, the speech recognition model is trained with each value, and the r that gives the converged speech recognition model the highest recognition accuracy, namely the optimal influence coefficient, is determined from the training results.
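The sweep over the preset value interval described above can be sketched as follows; `train_fn` is a hypothetical stand-in for training the speech recognition model with a given r and returning its recognition accuracy on held-out data:

```python
def select_influence_coefficient(train_fn, low=0.1, high=0.5, step=0.1):
    """Sweep r over [low, high] in fixed steps and keep the value whose
    trained model achieves the highest recognition accuracy.

    train_fn(r) -> accuracy is supplied by the caller (hypothetical here).
    """
    best_r, best_acc = None, float("-inf")
    r = low
    while r <= high + 1e-9:  # tolerate floating-point drift at the boundary
        r_clean = round(r, 10)
        acc = train_fn(r_clean)
        if acc > best_acc:
            best_r, best_acc = r_clean, acc
        r += step
    return best_r, best_acc
```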
After the optimal influence coefficient is determined, the first loss function is adjusted according to it and the second conditional probability, and the speech recognition model is trained with the adjusted loss function, that is, the second loss function.
S103: and training the voice recognition model by using the second loss function, and performing voice recognition by using the trained voice recognition model.
It should be noted that the training process of the speech recognition model is as follows: labeled sample data, namely speech data and the corresponding text, are input to the speech recognition model; features are extracted from the sample data to obtain a feature sequence, which is encoded and then decoded to obtain a predicted value; the difference between the predicted value and the true value gives the loss function; and model training proceeds according to the loss function until the model converges, yielding the trained speech recognition model.
The parameters of the speech recognition model are then adjusted using the value of the second loss function, finally yielding the speech recognition model with optimal parameters, that is, the trained speech recognition model.
When the trained voice recognition model is used for voice recognition, the audio data to be recognized is input into the trained voice recognition model, and the trained voice recognition model outputs the text corresponding to the audio to be recognized, so that the voice recognition can be realized.
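The overall train-until-convergence loop of step S103 can be sketched generically as follows; `compute_second_loss` and `grad_fn` are hypothetical stand-ins for the adjusted loss and its gradient with respect to the model parameters (a real system would use an automatic-differentiation framework rather than hand-written gradients):

```python
def train_until_convergence(model_params, compute_second_loss, grad_fn,
                            lr=0.01, tol=1e-6, max_steps=1000):
    """Schematic training loop: gradient updates on the second loss
    function until the loss stops improving (convergence) or the step
    budget is exhausted. All callables here are illustrative stand-ins.
    """
    prev = float("inf")
    loss = prev
    for _ in range(max_steps):
        loss = compute_second_loss(model_params)
        if abs(prev - loss) < tol:
            break  # converged: loss no longer improving
        model_params = [p - lr * g
                        for p, g in zip(model_params, grad_fn(model_params))]
        prev = loss
    return model_params, loss
```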
In the embodiment of the invention, the first conditional probability of a sentence is calculated with a pre-trained language model and used to correct the original first loss function of a speech recognition model, yielding a second loss function with which the speech recognition model is then trained. This optimizes the loss function of the speech recognition model and introduces the characteristics of the pre-trained language model. Because the first loss function is optimized using the first conditional probability of the pre-trained language model, that language model is effectively embedded into the speech recognition model, and the trained speech recognition model achieves higher recognition accuracy.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention, where the apparatus includes: a first conditional probability calculation module 31, an adjustment module 32 and a speech recognition module 33. Wherein:
and a first conditional probability calculating module 31, configured to calculate a first conditional probability of the sentence according to the pre-trained language model.
Further, the first conditional probability calculating module 31 is specifically configured to: for each sentence, its first conditional probability is calculated according to:
P(S) = ∏_i [ C(w_{i-(n-1)}, …, w_{i-1}, w_i) / C(w_{i-(n-1)}, …, w_{i-1}) ]

In the above formula, P(S) represents the first conditional probability of sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) denotes the number of times word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) denotes the number of times word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples, n is a positive integer greater than 1, and i indexes the ith word.
And the adjusting module 32 is configured to adjust the first loss function of the speech recognition model according to the first conditional probability to obtain a second loss function.
Further, the adjusting module 32 includes: a second conditional probability calculating unit 321 and an adjusting unit 322, wherein:
the second conditional probability calculating unit 321 is configured to calculate a second conditional probability by using the first conditional probability.
Further, the second conditional probability calculating unit 321 is specifically configured to:
calculating by using the first conditional probability according to the following formula:
In the above formula (3), T represents the calculated second conditional probability, P(S) represents the first conditional probability, and length represents the length of sentence S.
The adjusting unit 322 is configured to adjust the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model.
Further, the adjusting unit 322 includes:
a first calculating subunit 3221 configured to obtain a plurality of predicted sentences, and calculate a second conditional probability of each sentence;
a second calculating subunit 3222, configured to calculate an average conditional probability according to the second conditional probabilities of all the sentences and the influence coefficients;
an adjusting subunit 3223 is configured to adjust the first loss function by using the average conditional probability.
And a speech recognition module 33, configured to train the speech recognition model by using the second loss function, and perform speech recognition by using the trained speech recognition model.
Preferably, the influence coefficient is an optimal influence coefficient, and the apparatus further includes an optimal influence coefficient obtaining module 34, configured to use a plurality of influence coefficients to respectively train the speech recognition model, and determine, according to a training result, an influence coefficient that maximizes the recognition accuracy of the speech recognition model, that is, the optimal influence coefficient;
preferably, the adjusting unit 322 is configured to adjust the first loss function according to the second conditional probability and the optimal influence coefficient.
Further, the optimal influence coefficient obtaining module 34 is specifically configured to: and presetting a value interval for the influence coefficients, adjusting the values of the influence coefficients according to a preset step length, and training the voice recognition model by using each influence coefficient.
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42, such as a speech recognition program, stored in said memory 41 and operable on said processor 40. The processor 40, when executing the computer program 42, implements the steps in the various speech recognition method embodiments described above, such as the steps S101 to S103 shown in fig. 1. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the modules 31 to 33 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into a first conditional probability calculation module, an adjustment module, and a speech recognition module, each of which functions as follows:
the first conditional probability calculating module is used for calculating the first conditional probability of the sentence according to the pre-trained language model;
the adjusting module is used for adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and the voice recognition module is used for training the voice recognition model by utilizing the second loss function and performing voice recognition by utilizing the trained voice recognition model.
The terminal device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 4 and does not constitute a limitation of terminal device 4 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.
Claims (10)
1. A speech recognition method, comprising:
calculating a first conditional probability of a sentence according to a pre-trained language model;
adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and training the voice recognition model by using the second loss function, and performing voice recognition by using the trained voice recognition model.
2. The method of claim 1, wherein said calculating a first conditional probability for a sentence according to a pre-trained language model comprises:
for each sentence, its first conditional probability is calculated according to:

P(S) = ∏_i [ C(w_{i-(n-1)}, …, w_{i-1}, w_i) / C(w_{i-(n-1)}, …, w_{i-1}) ]

In the above formula, P(S) represents the first conditional probability of sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) denotes the number of times word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) denotes the number of times word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples, n is a positive integer greater than 1, and i indexes the ith word.
3. The method of claim 1, wherein said adjusting a first loss function of a speech recognition model based on said first conditional probability comprises:
calculating a second conditional probability using the first conditional probability;
and adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model.
4. The method of claim 3, wherein said calculating a second conditional probability using said first conditional probability comprises:
calculating by using the first conditional probability according to the following formula:
In the above formula, T represents the calculated second conditional probability, P(S) represents the first conditional probability, and length represents the length of sentence S.
5. The method of claim 4, wherein said adjusting said first penalty function based on said second conditional probability and an influence coefficient of said pre-trained language model comprises:
obtaining a plurality of predicted sentences, and calculating the second conditional probability of each sentence;
calculating an average conditional probability according to the second conditional probabilities of all the sentences and the influence coefficient;
and adjusting the first loss function using the average conditional probability.
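The averaging and loss-adjustment steps of claim 5 can be sketched as below. How the average term combines with the first loss is not specified in this text, so the subtraction (rewarding sentences the language model finds probable) is an assumption, as are the function and parameter names:

```python
def adjust_loss(base_loss, second_probs, alpha):
    """Scale the mean of the second conditional probabilities by the
    influence coefficient alpha and fold it into the first loss.
    The subtractive combination is an assumed form, not the patent's."""
    avg = alpha * sum(second_probs) / len(second_probs)
    return base_loss - avg
```

In training, `base_loss` would be the speech recognition model's original loss (e.g. a CTC or cross-entropy loss) evaluated on the same batch of predicted sentences.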
6. The method of claim 5, wherein the influence coefficient is an optimal influence coefficient, and the optimal influence coefficient is obtained by:
training the voice recognition model with each of a plurality of influence coefficients, and determining, from the training results, the influence coefficient that yields the highest recognition accuracy of the voice recognition model as the optimal influence coefficient;
the adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model includes:
and adjusting the first loss function according to the second conditional probability and the optimal influence coefficient.
7. The method of claim 6, wherein the training the speech recognition model with the plurality of influence coefficients comprises:
and presetting a value interval for the influence coefficient, adjusting its value within the interval according to a preset step size, and training the voice recognition model with each resulting value.
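The sweep in claim 7 amounts to a grid search over the influence coefficient. A sketch, with hypothetical names; `train_and_score(alpha)` stands in for a full training run that returns the model's recognition accuracy:

```python
def search_influence_coefficient(train_and_score, low, high, step):
    """Sweep the influence coefficient over [low, high] with a fixed step;
    train_and_score(alpha) trains the model with that coefficient and
    returns recognition accuracy. Keeps the coefficient scoring highest."""
    best_alpha, best_acc = None, float("-inf")
    alpha = low
    while alpha <= high + 1e-12:
        acc = train_and_score(alpha)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
        alpha = round(alpha + step, 10)  # avoid float drift in the sweep
    return best_alpha, best_acc
```

Each candidate coefficient costs one full training run, so in practice the interval and step size trade search resolution against compute.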
8. A voice recognition apparatus, comprising:
a first conditional probability calculation module, configured to calculate the first conditional probability of a sentence according to a pre-trained language model;
an adjusting module, configured to adjust a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and a voice recognition module, configured to train the voice recognition model using the second loss function and perform voice recognition using the trained voice recognition model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910407618.2A CN111951785B (en) | 2019-05-16 | 2019-05-16 | Voice recognition method and device and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111951785A true CN111951785A (en) | 2020-11-17 |
CN111951785B CN111951785B (en) | 2024-03-15 |
Family
ID=73336907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910407618.2A Active CN111951785B (en) | 2019-05-16 | 2019-05-16 | Voice recognition method and device and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111951785B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5839106A (en) * | 1996-12-17 | 1998-11-17 | Apple Computer, Inc. | Large-vocabulary speech recognition using an integrated syntactic and semantic statistical language model |
KR20050011441A (en) * | 2003-07-23 | 2005-01-29 | 주식회사 팬택 | Method for modificating hmm |
JP2010078877A (en) * | 2008-09-25 | 2010-04-08 | Pioneer Electronic Corp | Speech recognition device, speech recognition method, and speech recognition program |
CN102999533A (en) * | 2011-09-19 | 2013-03-27 | 腾讯科技(深圳)有限公司 | Textspeak identification method and system |
US20170206890A1 (en) * | 2016-01-16 | 2017-07-20 | Genesys Telecommunications Laboratories, Inc. | Language model customization in speech recognition for speech analytics |
US20170221474A1 (en) * | 2016-02-02 | 2017-08-03 | Mitsubishi Electric Research Laboratories, Inc. | Method and System for Training Language Models to Reduce Recognition Errors |
CN107154260A (en) * | 2017-04-11 | 2017-09-12 | 北京智能管家科技有限公司 | A kind of domain-adaptive audio recognition method and device |
CN107480144A (en) * | 2017-08-03 | 2017-12-15 | 中国人民大学 | Possess the image natural language description generation method and device across language learning ability |
CN108962223A (en) * | 2018-06-25 | 2018-12-07 | 厦门快商通信息技术有限公司 | A kind of voice gender identification method, equipment and medium based on deep learning |
CN109272990A (en) * | 2018-09-25 | 2019-01-25 | 江南大学 | Audio recognition method based on convolutional neural networks |
CN109410914A (en) * | 2018-08-28 | 2019-03-01 | 江西师范大学 | A kind of Jiangxi dialect phonetic and dialect point recognition methods |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113223504A (en) * | 2021-04-30 | 2021-08-06 | 平安科技(深圳)有限公司 | Acoustic model training method, device, equipment and storage medium |
CN113223504B (en) * | 2021-04-30 | 2023-12-26 | 平安科技(深圳)有限公司 | Training method, device, equipment and storage medium of acoustic model |
CN113327581A (en) * | 2021-05-04 | 2021-08-31 | 西安博达软件股份有限公司 | Recognition model optimization method and system for improving speech recognition accuracy |
CN113327581B (en) * | 2021-05-04 | 2022-05-24 | 西安博达软件股份有限公司 | Recognition model optimization method and system for improving speech recognition accuracy |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11741355B2 (en) | Training of student neural network with teacher neural networks | |
JP5901001B1 (en) | Method and device for acoustic language model training | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
US7813926B2 (en) | Training system for a speech recognition application | |
CN111402861B (en) | Voice recognition method, device, equipment and storage medium | |
US20200034702A1 (en) | Training of student neural network with switched teacher neural networks | |
US20140156575A1 (en) | Method and Apparatus of Processing Data Using Deep Belief Networks Employing Low-Rank Matrix Factorization | |
US20200160850A1 (en) | Speech recognition system, speech recognition method and computer program product | |
JP2023545988A (en) | Transformer transducer: One model that combines streaming and non-streaming speech recognition | |
CN110211562B (en) | Voice synthesis method, electronic equipment and readable storage medium | |
JP7034279B2 (en) | Filtering model training method and speech recognition method | |
CN111951785B (en) | Voice recognition method and device and terminal equipment | |
CN114067786A (en) | Voice recognition method and device, electronic equipment and storage medium | |
US12057124B2 (en) | Reducing streaming ASR model delay with self alignment | |
WO2011071560A1 (en) | Compressing feature space transforms | |
US10991363B2 (en) | Priors adaptation for conservative training of acoustic model | |
Tanaka et al. | Neural speech-to-text language models for rescoring hypotheses of dnn-hmm hybrid automatic speech recognition systems | |
CN117877483A (en) | Training method of spoken language scoring model, spoken language scoring method and related equipment | |
CN116805495B (en) | Pronunciation deviation detection and action feedback method and system based on large language model | |
CN113851111A (en) | Voice recognition method and voice recognition device | |
CN116994570A (en) | Training method and device of voice recognition model, and voice recognition method and device | |
CN110717022A (en) | Robot dialogue generation method and device, readable storage medium and robot | |
CN116361316A (en) | Semantic engine adaptation method, device, equipment and storage medium | |
CN114117051A (en) | Training method of part-of-speech tagging model, part-of-speech tagging method and electronic equipment | |
CN111383641A (en) | Voice recognition method, device and controller |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||