CN111951785A - Voice recognition method and device and terminal equipment - Google Patents


Info

Publication number: CN111951785A (granted as CN111951785B)
Application number: CN201910407618.2A
Authority: CN (China)
Original language: Chinese (zh)
Inventor: 陈明 (Chen Ming)
Applicant and current assignee: Wuhan TCL Group Industrial Research Institute Co Ltd
Legal status: Granted; active
Prior art keywords: conditional probability, speech recognition, loss function, recognition model, adjusting

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/06 — Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 — Training
    • G10L15/26 — Speech to text systems


Abstract

The invention relates to the technical field of speech recognition and provides a speech recognition method, a speech recognition apparatus, and terminal equipment. The method comprises the following steps: calculating a first conditional probability of a sentence according to a pre-trained language model; adjusting a first loss function of the speech recognition model according to the first conditional probability to obtain a second loss function; and training the speech recognition model with the second loss function, then performing speech recognition with the trained model. The invention can improve the accuracy of speech recognition.

Description

Voice recognition method and device and terminal equipment
Technical Field
The invention belongs to the technical field of speech recognition, and particularly relates to a speech recognition method, a speech recognition apparatus, and terminal equipment.
Background
Speech recognition technology aims to recognize an input speech signal and output computer-readable text, and can be applied to smart homes, smart vehicles, intelligent customer-service robots, and the like. With the development of deep learning, speech recognition has shifted from the traditional machine-learning Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) approach to techniques based on Deep Neural Networks (DNN). DNN-based speech recognition techniques fall into two categories: one replaces the original GMM part with a DNN, namely the Deep Neural Network-Hidden Markov Model (DNN-HMM); the other uses end-to-end speech recognition based on deep neural networks.
Because end-to-end speech recognition (End-to-End Automatic Speech Recognition) based on deep neural networks can directly take speech as input and decode it into text, it requires neither complex alignment work nor the construction of a pronunciation dictionary, saving a large amount of preparation time, and it is therefore widely applied. However, existing end-to-end techniques (such as Connectionist Temporal Classification (CTC), the Deep Feedforward Sequential Memory Network (DFSMN), attention-based sequence-to-sequence networks (Seq2Seq-Attention), and the like) cannot learn a complex language model; they usually recognize input speech directly from the sound waveform, so the recognized text is often poorly structured logically. Consequently, when such a trained speech recognition model encounters more complex speech, its recognition accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a speech recognition method, a speech recognition device, and a terminal device, so as to solve the problem in the prior art that a trained speech recognition model has low recognition accuracy when encountering complex speech.
A first aspect of an embodiment of the present invention provides a speech recognition method, including:
calculating a first conditional probability of a sentence according to a pre-trained language model;
adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and training the voice recognition model by using the second loss function, and performing voice recognition by using the trained voice recognition model.
A second aspect of an embodiment of the present invention provides a speech recognition apparatus, including:
the first conditional probability calculating module is used for calculating the first conditional probability of the sentence according to the pre-trained language model;
the adjusting module is used for adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and the voice recognition module is used for training the voice recognition model by utilizing the second loss function and performing voice recognition by utilizing the trained voice recognition model.
A third aspect of embodiments of the present invention provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the steps of the method according to the first aspect.
In the embodiment of the invention, a pre-trained language model is used to calculate the first conditional probability of a sentence, and the original first loss function of the speech recognition model is corrected with it to obtain a second loss function. The second loss function is then used to train the speech recognition model, optimizing the loss function of the speech recognition model and introducing the characteristics of the pre-trained language model. Because the first loss function is optimized with the first conditional probability from the pre-trained language model, the pre-trained language model is effectively embedded into the speech recognition model, and the recognition accuracy of the trained speech recognition model is higher.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without inventive effort.
Fig. 1 is a schematic flow chart of a speech recognition method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a specific implementation process of adjusting the first loss function according to the second conditional probability and the influence coefficient according to the embodiment of the present invention;
fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart of a speech recognition method according to an embodiment of the present invention, which is detailed as follows:
s101: a first conditional probability of the sentence is calculated from the pre-trained language model.
It should be noted that a language model can summarize the internal relations between words from a large amount of text, reduce the error rate of the recognized words, and make the recognition result more logical. Commonly used language models include n-gram language models and neural-network-based language models.
The pre-trained language model in the embodiment of the invention can be trained with the language-model training toolkit SRILM using an n-gram language model, where the parameter n indicates that the probability of the current word depends on the preceding n-1 words. In the embodiment of the present invention, a trigram language model, that is, a language model with n = 3, is trained, so the probability of the current word depends on the 2 preceding words. A sentence here refers to a sentence that the speech recognition model predicts from an input sample (speech data).
Further, the calculating the conditional probability of the sentence according to the pre-trained language model includes:
for each sentence, its first conditional probability is calculated according to:
$$P(S)=\prod_{i=1}^{m}\frac{C(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{C(w_{i-(n-1)},\ldots,w_{i-1})}\qquad(1)$$

In the above formula (1), P(S) represents the first conditional probability of sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) represents the number of times word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) represents the number of times word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples, n is a positive integer greater than 1, and i indexes the i-th word.
Since the n-gram language model means that the probability of the current word occurrence is related to the probability of the previous n-1 words occurrence, the first conditional probability p (S) for sentence S can be expressed as:
$$P(S)=\prod_{i=1}^{m}P(w_i\mid w_{i-(n-1)},\ldots,w_{i-1})\qquad(2)$$

In the above formula (2), P(w_i | w_{i-(n-1)}, …, w_{i-1}) represents the probability that word w_i occurs given that the words w_{i-(n-1)}, …, w_{i-1} have occurred; estimating this probability by maximum likelihood yields formula (1) above.
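The counting scheme behind formulas (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch rather than the patent's implementation; the `"<s>"` start-padding token and the function names are assumptions, and real toolkits such as SRILM additionally apply smoothing, which is omitted here.

```python
from collections import Counter

def ngram_counts(corpus, n):
    """Count n-grams and their (n-1)-word prefixes over a tokenized corpus.

    Sentences are padded with a start token so the first n-1 words also
    receive a context; the "<s>" token is an assumption of this sketch.
    """
    grams, prefixes = Counter(), Counter()
    for sentence in corpus:
        padded = ["<s>"] * (n - 1) + list(sentence)
        for i in range(len(sentence)):
            gram = tuple(padded[i:i + n])
            grams[gram] += 1          # C(w_{i-(n-1)}, ..., w_{i-1}, w_i)
            prefixes[gram[:-1]] += 1  # C(w_{i-(n-1)}, ..., w_{i-1})
    return grams, prefixes

def sentence_probability(sentence, grams, prefixes, n):
    """First conditional probability P(S): the product, over every word,
    of the maximum-likelihood estimate count(n-gram) / count(prefix)."""
    padded = ["<s>"] * (n - 1) + list(sentence)
    p = 1.0
    for i in range(len(sentence)):
        gram = tuple(padded[i:i + n])
        if prefixes[gram[:-1]] == 0:  # unseen context: probability collapses to 0
            return 0.0
        p *= grams[gram] / prefixes[gram[:-1]]
    return p
```

For the trigram model of the embodiment, n = 3 would be passed to both functions.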
S102: and adjusting the first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function.
The loss function measures the difference between the predicted value and the true value, reflecting how far the prediction deviates from the truth. The smaller the deviation, the more accurate the prediction; therefore, the smaller the loss, the better the finally trained model, that is, the higher the accuracy of speech recognition.
The first loss function refers to the original loss function of the speech recognition model. Adjusting this original loss function with the first conditional probability introduces the characteristics of the pre-trained language model and can improve the accuracy of the trained speech recognition model.
Specifically, the adjusting a first loss function of the speech recognition model according to the first conditional probability includes:
calculating a second conditional probability using the first conditional probability;
and adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model.
After the first conditional probability P(S) is obtained, it is transformed into a second conditional probability T, and the first loss function is adjusted using T and the influence coefficient r of the pre-trained language model.
Further, the calculating a second conditional probability using the first conditional probability includes:
calculating by using the first conditional probability according to the following formula:
$$T=-\frac{\log P(S)}{\mathrm{length}(S)}\qquad(3)$$

In the above formula (3), T represents the calculated second conditional probability, P(S) represents the first conditional probability, and length(S) represents the length of sentence S, that is, the number of words contained in S.
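Formula (3) transforms P(S) into a length-normalized score T. As an illustrative assumption (the formula is supplied as an image in the original filing), the sketch below takes T to be the negative log of P(S) divided by the sentence length:

```python
import math

def second_conditional_probability(p_s, length):
    """Second conditional probability T derived from P(S) and the sentence
    length. Assumes T = -log(P(S)) / length; the exact form of formula (3)
    in the original filing may differ."""
    return -math.log(p_s) / length
```

Under this assumption a lower T corresponds to a more probable sentence under the language model, so adding T to the loss penalizes linguistically implausible transcriptions.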
As shown in fig. 2, fig. 2 is a flowchart illustrating a specific implementation process of adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model, and includes the following steps S201 to S203:
s201: obtaining a plurality of sentences obtained through prediction, and calculating a second conditional probability of each sentence;
Multiple predicted sentences are obtained from the speech recognition model. Suppose k predicted sentences y_pred_1, y_pred_2, …, y_pred_k are obtained. The T value of each sentence is calculated using formulas (1) and (3) above, giving T_1, T_2, …, T_k.
S202: calculating average conditional probability according to the second conditional probabilities of all sentences and the influence coefficients;
An average conditional probability T̄ is calculated from the T values obtained in step S201 and the influence coefficient r of the pre-trained language model according to the following formula:

$$\bar{T}=\frac{r}{k}\sum_{j=1}^{k}T_j\qquad(4)$$

In the above formula (4), T̄ represents the calculated average conditional probability, r represents the influence coefficient, k represents the number of sentences, j indexes the j-th sentence, and T_j represents the second conditional probability of the j-th sentence.
S203: adjusting the first loss function using the average conditional probability.
The first loss function is adjusted using the average conditional probability as follows: the average conditional probability is added to the original loss function to obtain the second loss function, that is, the adjusted loss function.
It should be noted that, since different influence coefficients r lead to different recognition accuracies of the finally trained speech recognition model, different values of r are adopted for different sample data.
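Steps S201 to S203 can be sketched as follows. The additive combination of the original loss and the averaged score comes from the description above; the function names and the exact form of the per-sentence scores T_j are assumptions of this sketch.

```python
def average_conditional_probability(t_values, r):
    """Formula (4): average the per-sentence scores T_1..T_k and scale by
    the influence coefficient r of the pre-trained language model."""
    return r * sum(t_values) / len(t_values)

def second_loss(first_loss, t_values, r):
    """Second loss function: the original (first) loss of the speech
    recognition model plus the r-weighted average conditional probability,
    as described in step S203."""
    return first_loss + average_conditional_probability(t_values, r)
```

During training, `first_loss` would be the model's original loss on a batch and `t_values` the scores of the k predicted sentences for that batch.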
In a preferred implementation manner of the embodiment of the present invention, the influence coefficient is an optimal influence coefficient, and the method for obtaining the optimal influence coefficient includes:
respectively training the voice recognition model by adopting a plurality of influence coefficients, and determining the influence coefficient which enables the recognition precision of the voice recognition model to be highest according to a training result, namely the optimal influence coefficient;
the adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model includes:
and adjusting the first loss function according to the second conditional probability and the optimal influence coefficient.
In general, the influence coefficient r takes a value between 0 and 1. In the embodiment of the invention, practical training leads to the following conclusion: when r lies in the range [0.1, 0.5], the converged speech recognition model achieves better recognition accuracy. However, for speech data of different sizes and domains, different influence coefficients r should be selected; that is, the choice of r depends on the size and domain of the input speech data, and in practice an optimal influence coefficient can be selected as required.
Optionally, the training the speech recognition model with a plurality of influence coefficients respectively includes:
and presetting a value interval for the influence coefficients, adjusting the values of the influence coefficients according to a preset step length, and training the voice recognition model by using each influence coefficient.
In the training process of the voice recognition model, a value interval can be preset for r, assuming that the value interval is [0.1,0.5], the value of r is automatically adjusted according to the step length of 0.1, the voice recognition model is trained by the value, and the r which enables the converged voice recognition model to have the highest recognition accuracy is determined according to the training result, namely the optimal influence coefficient.
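The step-wise search over the influence coefficient described above can be sketched as a simple grid search. Here `train_fn`, a hypothetical callback assumed for illustration, trains the speech recognition model for a given r and returns its recognition accuracy:

```python
def select_influence_coefficient(train_fn, low=0.1, high=0.5, step=0.1):
    """Try each r in [low, high] at the given step, train the model with it,
    and keep the r yielding the highest recognition accuracy (the optimal
    influence coefficient)."""
    best_r, best_acc = None, float("-inf")
    r = low
    while r <= high + 1e-9:          # tolerance guards against float drift
        r_rounded = round(r, 10)     # e.g. 0.30000000000000004 -> 0.3
        acc = train_fn(r_rounded)
        if acc > best_acc:
            best_r, best_acc = r_rounded, acc
        r += step
    return best_r, best_acc
```

The defaults mirror the interval [0.1, 0.5] and step 0.1 used in the embodiment.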
After the optimal influence coefficient is determined, the loss function is adjusted according to the optimal influence coefficient and the first conditional probability, and a speech recognition model is trained according to the adjusted first loss function.
S103: and training the voice recognition model by using the second loss function, and performing voice recognition by using the trained voice recognition model.
It should be noted that the training process of the speech recognition model is as follows: labelled sample data, namely speech data and the corresponding text, is input to the speech recognition model; features are extracted from the sample data to obtain a feature sequence, which is encoded and then decoded to obtain a predicted value; the loss function is computed from the difference between the predicted value and the true value; and the model is trained according to the loss function until it converges, yielding the trained speech recognition model.
And then, carrying out parameter adjustment on the voice recognition model by using the value of the second loss function, and finally obtaining the voice recognition model with the optimal parameters, namely the trained voice recognition model.
When the trained voice recognition model is used for voice recognition, the audio data to be recognized is input into the trained voice recognition model, and the trained voice recognition model outputs the text corresponding to the audio to be recognized, so that the voice recognition can be realized.
In the embodiment of the invention, a pre-trained language model is used to calculate the first conditional probability of a sentence, and the original first loss function of the speech recognition model is corrected with it to obtain a second loss function. The second loss function is then used to train the speech recognition model, optimizing the loss function of the speech recognition model and introducing the characteristics of the pre-trained language model. Because the first loss function is optimized with the first conditional probability from the pre-trained language model, the pre-trained language model is effectively embedded into the speech recognition model, and the recognition accuracy of the trained speech recognition model is higher.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention, where the apparatus includes: a first conditional probability calculation module 31, an adjustment module 32 and a speech recognition module 33. Wherein:
and a first conditional probability calculating module 31, configured to calculate a first conditional probability of the sentence according to the pre-trained language model.
Further, the first conditional probability calculating module 31 is specifically configured to: for each sentence, its first conditional probability is calculated according to:
$$P(S)=\prod_{i=1}^{m}\frac{C(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{C(w_{i-(n-1)},\ldots,w_{i-1})}$$

In the above formula, P(S) represents the first conditional probability of sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) represents the number of times word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) represents the number of times word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples, n is a positive integer greater than 1, and i indexes the i-th word.
And the adjusting module 32 is configured to adjust the first loss function of the speech recognition model according to the first conditional probability to obtain a second loss function.
Further, the adjusting module 32 includes: a second conditional probability calculating unit 321 and an adjusting unit 322, wherein:
the second conditional probability calculating unit 321 is configured to calculate a second conditional probability by using the first conditional probability.
Further, the second conditional probability calculating unit 321 is specifically configured to:
calculating by using the first conditional probability according to the following formula:
$$T=-\frac{\log P(S)}{\mathrm{length}(S)}$$

In the above formula (3), T represents the calculated second conditional probability, P(S) represents the first conditional probability, and length(S) represents the length of sentence S.
The adjusting unit 322 is configured to adjust the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model.
Further, the adjusting unit 322 includes:
a first calculating subunit 3221 configured to obtain a plurality of predicted sentences, and calculate a second conditional probability of each sentence;
a second calculating subunit 3222, configured to calculate an average conditional probability according to the second conditional probabilities of all the sentences and the influence coefficients;
an adjusting subunit 3223 is configured to adjust the first loss function by using the average conditional probability.
And a speech recognition module 33, configured to train the speech recognition model by using the second loss function, and perform speech recognition by using the trained speech recognition model.
Preferably, the influence coefficient is an optimal influence coefficient, and the apparatus further includes an optimal influence coefficient obtaining module 34, configured to use a plurality of influence coefficients to respectively train the speech recognition model, and determine, according to a training result, an influence coefficient that maximizes the recognition accuracy of the speech recognition model, that is, the optimal influence coefficient;
preferably, the adjusting unit 322 is configured to adjust the first loss function according to the second conditional probability and the optimal influence coefficient.
Further, the optimal influence coefficient obtaining module 34 is specifically configured to: and presetting a value interval for the influence coefficients, adjusting the values of the influence coefficients according to a preset step length, and training the voice recognition model by using each influence coefficient.
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42, such as a speech recognition program, stored in said memory 41 and operable on said processor 40. The processor 40, when executing the computer program 42, implements the steps in the various speech recognition method embodiments described above, such as the steps S101 to S103 shown in fig. 1. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the modules 31 to 33 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into a first conditional probability calculation module, an adjustment module, and a speech recognition module, each of which functions as follows:
the first conditional probability calculating module is used for calculating the first conditional probability of the sentence according to the pre-trained language model;
the adjusting module is used for adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and the voice recognition module is used for training the voice recognition model by utilizing the second loss function and performing voice recognition by utilizing the trained voice recognition model.
The terminal device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 4 and does not constitute a limitation of terminal device 4 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A speech recognition method, comprising:
calculating a first conditional probability of a sentence according to a pre-trained language model;
adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and training the voice recognition model by using the second loss function, and performing voice recognition by using the trained voice recognition model.
2. The method of claim 1, wherein said calculating a first conditional probability for a sentence according to a pre-trained language model comprises:
for each sentence, its first conditional probability is calculated according to:
$$P(S)=\prod_{i=1}^{m}\frac{C(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{C(w_{i-(n-1)},\ldots,w_{i-1})}$$

In the above formula, P(S) represents the first conditional probability of sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) represents the number of times word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) represents the number of times word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples, n is a positive integer greater than 1, and i indexes the i-th word.
3. The method of claim 1, wherein said adjusting a first loss function of a speech recognition model based on said first conditional probability comprises:
calculating a second conditional probability using the first conditional probability;
and adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model.
4. The method of claim 3, wherein said calculating a second conditional probability using said first conditional probability comprises:
calculating by using the first conditional probability according to the following formula:
$$T=-\frac{\log P(S)}{\mathrm{length}(S)}$$

In the above formula, T represents the calculated second conditional probability, P(S) represents the first conditional probability, and length(S) represents the length of sentence S.
5. The method of claim 4, wherein said adjusting said first penalty function based on said second conditional probability and an influence coefficient of said pre-trained language model comprises:
obtaining a plurality of sentences obtained through prediction, and calculating a second conditional probability of each sentence;
calculating an average conditional probability from the second conditional probabilities of all the sentences and the influence coefficient;
adjusting the first loss function using the average conditional probability.
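Claim 5 leaves the exact adjustment rule unstated. The sketch below averages the second conditional probabilities over the predicted sentences, weights the average by the influence coefficient, and folds it into the original loss; the subtraction as the combination rule is an assumption, since the patent only says the loss is "adjusted":

```python
def adjusted_loss(first_loss, second_probs, alpha):
    """Sketch of claim 5: weight the mean second conditional probability by
    the influence coefficient alpha and subtract it from the first loss,
    rewarding hypotheses the language model finds probable."""
    avg = alpha * sum(second_probs) / len(second_probs)
    return first_loss - avg
```

For example, a first loss of 1.0, second probabilities [0.5, 0.5], and alpha = 0.2 gives an adjusted loss of 0.9.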
6. The method as claimed in claim 5, wherein the influence coefficient is an optimal influence coefficient obtained by:
training the voice recognition model with each of a plurality of influence coefficients, and determining, from the training results, the influence coefficient that yields the highest recognition accuracy of the voice recognition model as the optimal influence coefficient;
the adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model includes:
and adjusting the first loss function according to the second conditional probability and the optimal influence coefficient.
7. The method of claim 6, wherein the training the speech recognition model with the plurality of influence coefficients comprises:
presetting a value interval for the influence coefficient, adjusting the value of the influence coefficient within that interval according to a preset step length, and training the voice recognition model with each resulting influence coefficient.
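Claims 6 and 7 together describe a grid search over the influence coefficient. A minimal sketch follows; `train_fn` (a callable returning recognition accuracy for a given coefficient) and the interval endpoints are illustrative assumptions:

```python
def search_optimal_coefficient(train_fn, low, high, step):
    """Claim 7 sketch: sweep alpha over the preset interval [low, high] with a
    preset step, train/evaluate at each value, and keep the alpha that gives
    the highest recognition accuracy (the 'optimal influence coefficient')."""
    best_alpha, best_acc = None, float("-inf")
    alpha = low
    while alpha <= high + 1e-12:  # small epsilon guards float accumulation
        acc = train_fn(alpha)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
        alpha += step
    return best_alpha, best_acc
```

With a toy accuracy function peaking at alpha = 0.3 and a step of 0.1, the search returns the coefficient nearest that peak.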
8. A speech recognition apparatus, comprising:
the first conditional probability calculating module is used for calculating the first conditional probability of the sentence according to the pre-trained language model;
the adjusting module is used for adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and the voice recognition module is used for training the voice recognition model by utilizing the second loss function and performing voice recognition by utilizing the trained voice recognition model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201910407618.2A 2019-05-16 2019-05-16 Voice recognition method and device and terminal equipment Active CN111951785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910407618.2A CN111951785B (en) 2019-05-16 2019-05-16 Voice recognition method and device and terminal equipment


Publications (2)

Publication Number Publication Date
CN111951785A true CN111951785A (en) 2020-11-17
CN111951785B CN111951785B (en) 2024-03-15

Family

ID=73336907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910407618.2A Active CN111951785B (en) 2019-05-16 2019-05-16 Voice recognition method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN111951785B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5839106A (en) * 1996-12-17 1998-11-17 Apple Computer, Inc. Large-vocabulary speech recognition using an integrated syntactic and semantic statistical language model
KR20050011441A (en) * 2003-07-23 2005-01-29 주식회사 팬택 Method for modificating hmm
JP2010078877A (en) * 2008-09-25 2010-04-08 Pioneer Electronic Corp Speech recognition device, speech recognition method, and speech recognition program
CN102999533A (en) * 2011-09-19 2013-03-27 腾讯科技(深圳)有限公司 Textspeak identification method and system
US20170206890A1 (en) * 2016-01-16 2017-07-20 Genesys Telecommunications Laboratories, Inc. Language model customization in speech recognition for speech analytics
US20170221474A1 (en) * 2016-02-02 2017-08-03 Mitsubishi Electric Research Laboratories, Inc. Method and System for Training Language Models to Reduce Recognition Errors
CN107154260A (en) * 2017-04-11 2017-09-12 北京智能管家科技有限公司 A kind of domain-adaptive audio recognition method and device
CN107480144A (en) * 2017-08-03 2017-12-15 中国人民大学 Possess the image natural language description generation method and device across language learning ability
CN108962223A (en) * 2018-06-25 2018-12-07 厦门快商通信息技术有限公司 A kind of voice gender identification method, equipment and medium based on deep learning
CN109272990A (en) * 2018-09-25 2019-01-25 江南大学 Audio recognition method based on convolutional neural networks
CN109410914A (en) * 2018-08-28 2019-03-01 江西师范大学 A kind of Jiangxi dialect phonetic and dialect point recognition methods


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223504A (en) * 2021-04-30 2021-08-06 平安科技(深圳)有限公司 Acoustic model training method, device, equipment and storage medium
CN113223504B (en) * 2021-04-30 2023-12-26 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of acoustic model
CN113327581A (en) * 2021-05-04 2021-08-31 西安博达软件股份有限公司 Recognition model optimization method and system for improving speech recognition accuracy
CN113327581B (en) * 2021-05-04 2022-05-24 西安博达软件股份有限公司 Recognition model optimization method and system for improving speech recognition accuracy

Also Published As

Publication number Publication date
CN111951785B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
US11741355B2 (en) Training of student neural network with teacher neural networks
JP5901001B1 (en) Method and device for acoustic language model training
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
US7813926B2 (en) Training system for a speech recognition application
CN111402861B (en) Voice recognition method, device, equipment and storage medium
US20200034702A1 (en) Training of student neural network with switched teacher neural networks
US20140156575A1 (en) Method and Apparatus of Processing Data Using Deep Belief Networks Employing Low-Rank Matrix Factorization
US20200160850A1 (en) Speech recognition system, speech recognition method and computer program product
JP2023545988A (en) Transformer transducer: One model that combines streaming and non-streaming speech recognition
CN110211562B (en) Voice synthesis method, electronic equipment and readable storage medium
JP7034279B2 (en) Filtering model training method and speech recognition method
CN111951785B (en) Voice recognition method and device and terminal equipment
CN114067786A (en) Voice recognition method and device, electronic equipment and storage medium
US12057124B2 (en) Reducing streaming ASR model delay with self alignment
WO2011071560A1 (en) Compressing feature space transforms
US10991363B2 (en) Priors adaptation for conservative training of acoustic model
Tanaka et al. Neural speech-to-text language models for rescoring hypotheses of dnn-hmm hybrid automatic speech recognition systems
CN117877483A (en) Training method of spoken language scoring model, spoken language scoring method and related equipment
CN116805495B (en) Pronunciation deviation detection and action feedback method and system based on large language model
CN113851111A (en) Voice recognition method and voice recognition device
CN116994570A (en) Training method and device of voice recognition model, and voice recognition method and device
CN110717022A (en) Robot dialogue generation method and device, readable storage medium and robot
CN116361316A (en) Semantic engine adaptation method, device, equipment and storage medium
CN114117051A (en) Training method of part-of-speech tagging model, part-of-speech tagging method and electronic equipment
CN111383641A (en) Voice recognition method, device and controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant