CN111696531A - Recognition method for improving speech recognition accuracy by using jargon sentences - Google Patents
- Publication number
- CN111696531A CN111696531A CN202010467020.5A CN202010467020A CN111696531A CN 111696531 A CN111696531 A CN 111696531A CN 202010467020 A CN202010467020 A CN 202010467020A CN 111696531 A CN111696531 A CN 111696531A
- Authority
- CN
- China
- Prior art keywords
- word
- language model
- sequence
- occurrence
- probability
- Prior art date
- Legal status (assumed, not a legal conclusion): Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
Abstract
The invention discloses a recognition method that uses dialog-script sentences to improve speech recognition accuracy, and relates to the technical field of speech recognition. It provides a method for improving speech recognition accuracy by dynamically updating the language model with the sentences configured in the dialog script. When the speech recognition system is built, a first language model is trained on generic text resources; after the dialogue robot's script is customized, a second language model is trained on the texts of the dialog-script sentences; the final language model fuses the first language model and the second language model, so that the speech recognition system achieves better accuracy on speech in the user-defined scenario.
Description
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a recognition method for improving speech recognition accuracy by using dialog-script sentences.
Background
The development of speech recognition, semantic understanding, and speech synthesis technology has brought the intelligent voice dialogue robot into daily life, providing users with increasingly convenient intelligent voice dialogue services. Users can write customized dialog scripts according to the requirements of their own scenarios and create an intelligent voice dialogue robot that meets their needs.
Speech recognition converts the user's speech into the corresponding text; semantic understanding then judges the user's intention against the user-defined dialog-script sentences and generates a response text; finally, speech synthesis converts the response text into speech and plays it to the user.
Speech recognition in existing intelligent voice dialogue robot systems is general-purpose: it can be used in a variety of scenarios and is independent of the robot's type, application domain, and configured dialog script. To serve all of these scenarios, the speech recognition system must balance its accuracy across them, which means its accuracy in any particular scenario is not especially high.
In a practical intelligent voice dialogue robot, the user presets the robot's dialogue scenario and the semantic range of its utterances; general-purpose speech recognition makes no such assumption. Enhancing the speech recognition system with the candidate corpora configured in the robot's dialog script is therefore of great significance for improving speech recognition accuracy and the quality of human-machine dialogue.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a recognition method that improves speech recognition accuracy by dynamically updating the language model with the sentences configured in the dialog script.
The invention adopts the following technical scheme to solve the above technical problem:
the recognition method for improving the speech recognition accuracy rate by using the jargon sentence provided by the invention comprises the following steps:
step 1, training a first language model by using a universal text; the first language model is trained as follows:
Let i be the length of the sequence being counted, where i is an integer greater than or equal to 1.
When i = 1, first count the number of occurrences C(w1) of the 1st word w1 of the word sequence, then count the sum Σ_w C(w1, w) of the number of times any word w follows w1.
When i > 1, first count the number of times C(w1, w2, ..., wi) that the word sequence w1, w2, ..., wi occurs in order in the generic text, then count the sum Σ_w C(w1, w2, ..., w(i-1), w) of the number of times any word w follows the sequence w1, w2, ..., w(i-1). Here ws is the s-th word of the word sequence, and s is an integer greater than 0 and less than (i + 1).
For a sentence composed of the word sequence w1, w2, ..., wn, where n is the number of words in the sentence, the sequence probability Pgeneral is calculated by the following formula:
Pgeneral(w1 w2 ... wn) = P(w1) · P(w2|w1) · ... · P(wn|w1, w2, ..., w(n-1))
where P(wi|w1, w2, ..., w(i-1)) is the conditional probability of occurrence of the i-th word, P(w1) is the probability of occurrence of the 1st word, and P(w2|w1) is the conditional probability of occurrence of the 2nd word. Each conditional probability is estimated from the counts as
P(wi|w1, w2, ..., w(i-1)) = C(w1, w2, ..., wi) / Σ_w C(w1, w2, ..., w(i-1), w)
where C(w1, w2, ..., wi) is the number of times the word sequence w1, w2, ..., wi occurs in order in the text, and Σ_w C(w1, w2, ..., w(i-1), w) is the sum of the number of times any word w follows the sequence w1, w2, ..., w(i-1).
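Assuming the first language model is a plain maximum-likelihood n-gram model, the counting step above can be sketched as follows; the function name and toy corpus are illustrative, not part of the patent:

```python
from collections import Counter

def count_ngrams(sentences, i):
    """Count C(w1, ..., wi), the times each i-word sequence occurs in
    order, together with the context totals sum_w C(w1, ..., w(i-1), w)."""
    seq_counts = Counter()      # C(w1, ..., wi)
    context_counts = Counter()  # sum over w of C(w1, ..., w(i-1), w)
    for words in sentences:
        for s in range(len(words) - i + 1):
            seq = tuple(words[s:s + i])
            seq_counts[seq] += 1
            context_counts[seq[:-1]] += 1
    return seq_counts, context_counts

corpus = [["hello", "how", "are", "you"],
          ["how", "are", "you", "today"]]
seqs, contexts = count_ngrams(corpus, 2)
print(seqs[("how", "are")])  # C(how, are) = 2
print(contexts[("how",)])    # sum_w C(how, w) = 2
```

Note that `context_counts` only counts contexts that are actually followed by some word, which matches the definition of Σ_w C(w1, ..., w(i-1), w) above.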
step 2, defining the dialogue robot's dialog script, and training a language model on the dialog-script sentences to obtain a second language model;
the second language model gives the sequence probability of a dialog-script sentence, specifically:
for a dialog-script sentence composed of the word sequence w1, w2, ..., wn, the sequence probability Pdialogue is calculated by the following formula:
Pdialogue(w1 w2 ... wn) = P(w1) · P(w2|w1) · ... · P(wn|w1, w2, ..., w(n-1))
where P(wi|w1, w2, ..., w(i-1)) is the conditional probability of occurrence of the i-th word, P(w1) is the probability of occurrence of the 1st word, and P(w2|w1) is the conditional probability of occurrence of the 2nd word, with the counts taken over the dialog-script sentence texts;
step 3, fusing the first language model and the second language model to generate the final language model;
the final language model is:
for a sentence composed of the word sequence w1, w2, ..., wn, the sequence probability Pfinal(w1 w2 ... wn) is calculated by the following formula:
Pfinal(w1 w2 ... wn) = λ1 · Pgeneral(w1 w2 ... wn) + λ2 · Pdialogue(w1 w2 ... wn)
where λ1 and λ2 are interpolation coefficients used to adjust the weights of the first language model and the second language model in Pfinal(w1 w2 ... wn);
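The fusion of step 3 is a linear interpolation of the two sentence probabilities. A minimal sketch, with made-up illustrative probabilities and weights:

```python
def p_final(p_general, p_dialogue, lam1, lam2):
    """P_final = lam1 * P_general + lam2 * P_dialogue; lam1 + lam2 is
    normally chosen to sum to 1 so the result stays a probability."""
    return lam1 * p_general + lam2 * p_dialogue

# A script sentence the generic model finds unlikely (0.02) but the
# dialog-script model finds likely (0.30) gets a boosted final score.
print(p_final(0.02, 0.30, lam1=0.7, lam2=0.3))  # ~0.104
```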
and step 4, building a speech recognition system with the final language model, and using the speech recognition system to improve speech recognition accuracy.
Compared with the prior art, the invention has the following technical effects:
the invention provides a method for improving speech recognition accuracy by dynamically updating the language model with the sentences configured in the dialog script. When the speech recognition system is built, a first language model is trained on generic text resources; after the dialogue robot's script is customized, a second language model is trained on the texts of the dialog-script sentences; the final language model fuses the first language model and the second language model, so that the speech recognition system achieves better accuracy on speech in the user-defined scenario.
Detailed Description
The technical scheme of the invention is explained in further detail below:
the language models used in current speech recognition systems are mainly statistical language models and neural network language models. It should be noted that the method proposed by the present invention is applicable not only to statistical language models but also to neural network language models.
1. Training a first language model using generic text
Speech recognition systems typically have a large corpus of text drawn from a variety of domains for training language models. This text is independent of any specific dialogue system and is referred to here as generic text; the generic (first) language model is trained on it.
The following describes the training and calculation steps of the first language model, taking the n-gram language model, the most common of the statistical language models, as an example.
Here i is a positive integer greater than 1; in a concrete implementation of a speech recognition system, i is typically set to 3 or 4. When i = 3 the model is called a 3-gram language model, and when i = 4 a 4-gram language model.
First count the number of times C(w1, w2, ..., wi) that the word sequence w1, w2, ..., wi occurs in order in the generic text, then count the sum Σ_w C(w1, w2, ..., w(i-1), w) of the number of times any word w follows the sequence w1, w2, ..., w(i-1).
For a sentence w1, w2, ..., wn, the sequence probability is calculated by the following formula:
Pgeneral(w1 w2 ... wn) = P(w1) · P(w2|w1) · ... · P(wn|w1, w2, ..., w(n-1))
where P(wi|w1, w2, ..., w(i-1)), the conditional probability of each word's occurrence, can be calculated from the counts above:
P(wi|w1, w2, ..., w(i-1)) = C(w1, w2, ..., wi) / Σ_w C(w1, w2, ..., w(i-1), w)
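Under the same maximum-likelihood assumption, the conditional probability above is simply the ratio of the two counts. A sketch with hand-built toy counts (illustrative only; a production model would add smoothing):

```python
from collections import Counter

def conditional_prob(seq_counts, context_counts, context, word):
    """MLE estimate P(word | context) =
    C(context, word) / sum_w C(context, w)."""
    total = context_counts[tuple(context)]
    if total == 0:
        return 0.0  # unseen context; a real system would smooth or back off
    return seq_counts[tuple(context) + (word,)] / total

# Toy 2-gram counts: "how" was followed by "are" twice and "old" once.
seq_counts = Counter({("how", "are"): 2, ("how", "old"): 1})
context_counts = Counter({("how",): 3})
print(conditional_prob(seq_counts, context_counts, ["how"], "are"))  # 2/3
```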
2. Training a second language model using the user's sentences configured in the dialog script
First count the number of times C(w1, w2, ..., wi) that the word sequence w1, w2, ..., wi occurs in order in the dialog-script sentence text, then count the sum Σ_w C(w1, w2, ..., w(i-1), w) of the number of times any word w follows the sequence w1, w2, ..., w(i-1).
For a sentence w1, w2, ..., wn, the sequence probability is calculated by the following formula:
Pdialogue(w1 w2 ... wn) = P(w1) · P(w2|w1) · ... · P(wn|w1, w2, ..., w(n-1))
where P(wi|w1, w2, ..., w(i-1)), the conditional probability of each word's occurrence, can be calculated from the counts over the dialog-script text in the same way.
3. fusing a first language model and a second language model
The final language model is obtained by fusing the first language model and the second language model. Specifically, for a sentence w1, w2, ..., wn, the sequence probability is calculated by the following formula:
Pfinal(w1 w2 ... wn) = λ1 · Pgeneral(w1 w2 ... wn) + λ2 · Pdialogue(w1 w2 ... wn)
where λ1 and λ2 are interpolation coefficients used to adjust the weights of the generic language model and the dialogue language model in Pfinal(w1 w2 ... wn). In a specific implementation, the values of λ1 and λ2 vary from dialog script to dialog script.
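The patent leaves open how λ1 and λ2 are chosen for each dialog script. One common approach, sketched here as an assumption and not as the patented method, is to grid-search the mixture weight that maximizes the likelihood of a held-out set of script sentences:

```python
import math

def choose_lambdas(held_out, p_general, p_dialogue, steps=10):
    """Grid-search lam1 (with lam2 = 1 - lam1) to maximize the
    log-likelihood of held-out sentences under the interpolated model."""
    best_lam1, best_score = 0.0, float("-inf")
    for k in range(steps + 1):
        lam1 = k / steps
        score = sum(
            math.log(lam1 * p_general(s) + (1 - lam1) * p_dialogue(s) + 1e-12)
            for s in held_out
        )
        if score > best_score:
            best_lam1, best_score = lam1, score
    return best_lam1, 1 - best_lam1

# Toy models: the dialog-script model fits the held-out scripts far
# better, so the search pushes all weight onto it.
lam1, lam2 = choose_lambdas(["s1", "s2"],
                            p_general=lambda s: 0.01,
                            p_dialogue=lambda s: 0.20)
print(lam1, lam2)  # 0.0 1.0
```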
The above describes only a specific embodiment of the invention, but the scope of the invention is not limited thereto; any changes or substitutions that can readily be conceived by those skilled in the art within the technical scope of the invention fall within the scope of the invention.
Claims (1)
1. A recognition method for improving speech recognition accuracy by using dialog-script sentences, characterized by comprising the following steps:
step 1, training a first language model on generic text, the first language model being trained as follows:
setting i as the length of the sequence being counted, where i is an integer greater than or equal to 1;
when i = 1, first counting the number of occurrences C(w1) of the 1st word w1 of the word sequence, then counting the sum Σ_w C(w1, w) of the number of times any word w follows w1;
when i > 1, first counting the number of times C(w1, w2, ..., wi) that the word sequence w1, w2, ..., wi occurs in order in the generic text, then counting the sum Σ_w C(w1, w2, ..., w(i-1), w) of the number of times any word w follows the sequence w1, w2, ..., w(i-1), where ws is the s-th word of the word sequence and s is an integer greater than 0 and less than (i + 1);
for a sentence composed of the word sequence w1, w2, ..., wn, where n is the number of words in the sentence, the sequence probability Pgeneral being calculated by the following formula:
Pgeneral(w1 w2 ... wn) = P(w1) · P(w2|w1) · ... · P(wn|w1, w2, ..., w(n-1))
where P(wi|w1, w2, ..., w(i-1)) is the conditional probability of occurrence of the i-th word, P(w1) is the probability of occurrence of the 1st word, and P(w2|w1) is the conditional probability of occurrence of the 2nd word, each conditional probability being estimated as
P(wi|w1, w2, ..., w(i-1)) = C(w1, w2, ..., wi) / Σ_w C(w1, w2, ..., w(i-1), w)
where C(w1, w2, ..., wi) is the number of times the word sequence w1, w2, ..., wi occurs in order in the text, and Σ_w C(w1, w2, ..., w(i-1), w) is the sum of the number of times any word w follows the sequence w1, w2, ..., w(i-1);
step 2, defining the dialogue robot's dialog script, and training a language model on the dialog-script sentences to obtain a second language model;
the second language model giving the sequence probability of a dialog-script sentence, specifically:
for a dialog-script sentence composed of the word sequence w1, w2, ..., wn, the sequence probability Pdialogue being calculated by the following formula:
Pdialogue(w1 w2 ... wn) = P(w1) · P(w2|w1) · ... · P(wn|w1, w2, ..., w(n-1))
where P(wi|w1, w2, ..., w(i-1)) is the conditional probability of occurrence of the i-th word, P(w1) is the probability of occurrence of the 1st word, and P(w2|w1) is the conditional probability of occurrence of the 2nd word;
step 3, fusing the first language model and the second language model to generate a final language model;
the final language model is:
for a sentence composed of the word sequence w1, w2, ..., wn, the sequence probability Pfinal(w1 w2 ... wn) being calculated by the following formula:
Pfinal(w1 w2 ... wn) = λ1 · Pgeneral(w1 w2 ... wn) + λ2 · Pdialogue(w1 w2 ... wn)
where λ1 and λ2 are interpolation coefficients used to adjust the weights of the first language model and the second language model in Pfinal(w1 w2 ... wn);
and step 4, building a speech recognition system with the final language model, and using the speech recognition system to improve speech recognition accuracy.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010467020.5A CN111696531A (en) | 2020-05-28 | 2020-05-28 | Recognition method for improving speech recognition accuracy by using jargon sentences |
PCT/CN2021/094080 WO2021238700A1 (en) | 2020-05-28 | 2021-05-17 | Recognition method employing speech statement to improve voice recognition accuracy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111696531A true CN111696531A (en) | 2020-09-22 |
Family
ID=72478687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010467020.5A Pending CN111696531A (en) | 2020-05-28 | 2020-05-28 | Recognition method for improving speech recognition accuracy by using jargon sentences |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111696531A (en) |
WO (1) | WO2021238700A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021238700A1 (en) * | 2020-05-28 | 2021-12-02 | 升智信息科技(南京)有限公司 | Recognition method employing speech statement to improve voice recognition accuracy |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102880611A (en) * | 2011-07-14 | 2013-01-16 | 腾讯科技(深圳)有限公司 | Language modeling method and language modeling device |
CN103577386A (en) * | 2012-08-06 | 2014-02-12 | 腾讯科技(深圳)有限公司 | Method and device for dynamically loading language model based on user input scene |
CN103871402A (en) * | 2012-12-11 | 2014-06-18 | 北京百度网讯科技有限公司 | Language model training system, a voice identification system and corresponding method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7043422B2 (en) * | 2000-10-13 | 2006-05-09 | Microsoft Corporation | Method and apparatus for distribution-based language model adaptation |
CN107154260B (en) * | 2017-04-11 | 2020-06-16 | 北京儒博科技有限公司 | Domain-adaptive speech recognition method and device |
CN110310663A (en) * | 2019-05-16 | 2019-10-08 | 平安科技(深圳)有限公司 | Words art detection method, device, equipment and computer readable storage medium in violation of rules and regulations |
CN111696531A (en) * | 2020-05-28 | 2020-09-22 | 升智信息科技(南京)有限公司 | Recognition method for improving speech recognition accuracy by using jargon sentences |
- 2020-05-28: CN application CN202010467020.5A, publication CN111696531A, active (Pending)
- 2021-05-17: WO application PCT/CN2021/094080, publication WO2021238700A1, active (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
WO2021238700A1 (en) | 2021-12-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||