CN111696531A - Recognition method for improving speech recognition accuracy by using jargon sentences - Google Patents


Info

Publication number
CN111696531A
CN111696531A
Authority
CN
China
Prior art keywords: word, language model, sequence, occurrence, probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010467020.5A
Other languages
Chinese (zh)
Inventor
高洋洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengzhi Information Technology Nanjing Co ltd
Original Assignee
Shengzhi Information Technology Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shengzhi Information Technology Nanjing Co ltd filed Critical Shengzhi Information Technology Nanjing Co ltd
Priority to CN202010467020.5A priority Critical patent/CN111696531A/en
Publication of CN111696531A publication Critical patent/CN111696531A/en
Priority to PCT/CN2021/094080 priority patent/WO2021238700A1/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models

Abstract

The invention discloses a recognition method that improves speech recognition accuracy by using dialogue-script sentences, and relates to the technical field of speech recognition. The method improves speech recognition accuracy by dynamically updating a language model with the sentences configured in the dialogue script. When the speech recognition system is built, a first language model is trained on universal text resources. After the dialogue robot's script is customized, a second language model is trained on the script sentence texts. The final language model fuses the first language model and the second language model, so that the speech recognition system achieves better accuracy on speech in the user-defined scenario.

Description

Recognition method for improving speech recognition accuracy by using jargon sentences
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a recognition method that improves speech recognition accuracy by using dialogue-script sentences.
Background
Advances in speech recognition, semantic understanding, and speech synthesis have brought intelligent voice dialogue robots into daily life, providing users with increasingly convenient intelligent voice dialogue services. Users can write customized dialogue scripts for their own scenarios and create intelligent voice dialogue robots that meet their needs.
Speech recognition converts the user's speech into the corresponding text; semantic understanding then judges the user's intent against the user-defined script sentences and generates a response text; finally, speech synthesis converts the response text into speech that is played back to the user.
Speech recognition in existing intelligent voice dialogue robot systems is general-purpose: it can be used in a variety of scenarios and is independent of the robot's type, application field, and configured interaction script. To be usable across many scenarios, the speech recognition system must balance its accuracy over all of them, which means its accuracy in any particular scenario is not especially high.
In a practical intelligent voice dialogue robot, the user presets the robot's dialogue scenario and the semantic range of its speech, an assumption that general-purpose speech recognition does not make. Enhancing the speech recognition system with the candidate corpus configured in the robot's dialogue script is therefore of real significance for improving speech recognition accuracy and human-machine dialogue quality.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a recognition method that improves speech recognition accuracy by dynamically updating a language model with the sentences configured in the dialogue script.
The invention adopts the following technical scheme to solve the technical problem:
The recognition method provided by the invention for improving speech recognition accuracy by using dialogue-script sentences comprises the following steps:
Step 1: train a first language model on universal text. The first language model is trained as follows:
Let i be the length of the word sequence being counted, where i is an integer with i ≥ 1.
When i = 1, first count C(w_1), the number of occurrences of the first word w_1 of the word sequence, and then count Σ_w C(w_1, w), the total number of times any word w immediately follows w_1.
When i > 1, first count C(w_1, w_2, ..., w_i), the number of times the word sequence w_1, w_2, ..., w_i occurs in order in the universal text, and then count Σ_w C(w_1, w_2, ..., w_{i-1}, w), the total number of times any word w immediately follows the sequence w_1, w_2, ..., w_{i-1}; here w_s is the s-th word of the sequence, and s is an integer with 0 < s < i + 1.
For a sentence composed of the word sequence w_1, w_2, ..., w_n, where n is the number of words in the sentence, the sequence probability P_general is calculated by the following formula:
P_general(w_1 w_2 ... w_n) = P(w_1) · P(w_2|w_1) · ... · P(w_n|w_1, w_2, ..., w_{n-1})
where P(w_i|w_1, w_2, ..., w_{i-1}) is the conditional probability that the i-th word appears, P(w_1) is the probability that the 1st word appears, and P(w_2|w_1) is the conditional probability that the 2nd word appears;
P(w_i|w_1, w_2, ..., w_{i-1}) = C(w_1, w_2, ..., w_i) / Σ_w C(w_1, w_2, ..., w_{i-1}, w)
where C(w_1, w_2, ..., w_i) is the number of times the word sequence w_1, w_2, ..., w_i occurs in order in the text, and Σ_w C(w_1, w_2, ..., w_{i-1}, w) is the total number of times any word w immediately follows the sequence w_1, w_2, ..., w_{i-1} in the text.
Step 2: define the dialogue robot's script, and train a language model on the script sentences to obtain a second language model.
The second language model gives the sequence probability of a script sentence. Specifically, for a script sentence composed of the word sequence w_1, w_2, ..., w_n, the sequence probability P_dialogue is calculated by the following formula:
P_dialogue(w_1 w_2 ... w_n) = P(w_1) · P(w_2|w_1) · ... · P(w_n|w_1, w_2, ..., w_{n-1})
where P(w_i|w_1, w_2, ..., w_{i-1}) is the conditional probability that the i-th word appears, P(w_1) is the probability that the 1st word appears, and P(w_2|w_1) is the conditional probability that the 2nd word appears.
Step 3: fuse the first language model and the second language model to generate the final language model.
The final language model is defined as follows: for a sentence composed of the word sequence w_1, w_2, ..., w_n, the sequence probability P_final(w_1 w_2 ... w_n) is calculated by the following formula:
P_final(w_1 w_2 ... w_n) = λ_1 · P_general(w_1 w_2 ... w_n) + λ_2 · P_dialogue(w_1 w_2 ... w_n)
where λ_1 and λ_2 are interpolation coefficients that adjust the weights of the first language model and the second language model in P_final(w_1 w_2 ... w_n).
Step 4: build a speech recognition system with the final language model; this system improves the accuracy of speech recognition.
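The four steps above can be sketched in code. The following is a minimal illustration, not the patent's implementation: it substitutes a toy unigram language model for the n-gram models described later, and the corpora are hypothetical. Step 4 (combining the fused language model with an acoustic model into a recognizer) is outside the scope of the sketch.

```python
from collections import Counter

def train_lm(sentences):
    """Toy unigram LM: P(sentence) = product of relative word frequencies."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    def prob(sentence):
        p = 1.0
        for w in sentence.split():
            p *= counts[w] / total
        return p
    return prob

def build_final_lm(universal_corpus, script_sentences, lam1=0.7, lam2=0.3):
    """Steps 1-3: train two language models and fuse them by interpolation."""
    p_general = train_lm(universal_corpus)    # step 1: universal text
    p_dialogue = train_lm(script_sentences)   # step 2: script sentences
    # step 3: P_final(s) = lam1 * P_general(s) + lam2 * P_dialogue(s)
    return lambda s: lam1 * p_general(s) + lam2 * p_dialogue(s)

p_final = build_final_lm(["hello world"], ["confirm order"])
print(p_final("hello"))  # 0.7 * 0.5 + 0.3 * 0.0 = 0.35
```

The interpolation weights lam1 and lam2 here are illustrative defaults; as the description notes, they would be tuned per dialogue script.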
Compared with the prior art, the above technical scheme has the following technical effects:
The invention improves speech recognition accuracy by dynamically updating a language model with the sentences configured in the dialogue script. When the speech recognition system is built, a first language model is trained on universal text resources. After the dialogue robot's script is customized, a second language model is trained on the script sentence texts. The final language model fuses the first language model and the second language model, so that the speech recognition system achieves better accuracy on speech in the user-defined scenario.
Detailed Description
The technical scheme of the invention is explained in further detail below.
The language models used in current speech recognition systems are mainly statistical language models and neural network language models. Note that the proposed method applies not only to statistical language models but also to neural network language models.
1. Training the first language model on universal text
To be able to adapt to a variety of scenarios, speech recognition systems typically train their language models on a large text corpus drawn from many domains. This corpus is independent of any specific dialogue system and is referred to here as universal text.
The following describes the training and calculation steps of the first language model, taking the n-gram language model, the most common statistical language model, as an example.
Here i is the order of the n-gram model, an integer greater than 1; in a practical speech recognition system, i is typically set to 3 or 4. When i = 3 the model is called a 3-gram language model, and when i = 4 a 4-gram language model.
First count C(w_1, w_2, ..., w_i), the number of times the word sequence w_1, w_2, ..., w_i occurs in order in the universal text, and then count Σ_w C(w_1, w_2, ..., w_{i-1}, w), the total number of times any word w immediately follows the sequence w_1, w_2, ..., w_{i-1} in the text.
For a sentence w_1, w_2, ..., w_n, the sequence probability is calculated by the following formula:
P_general(w_1 w_2 ... w_n) = P(w_1) · P(w_2|w_1) · ... · P(w_n|w_1, w_2, ..., w_{n-1})
where P(w_i|w_1, w_2, ..., w_{i-1}), the conditional probability of each word, is obtained from the counts above:
P(w_i|w_1, w_2, ..., w_{i-1}) = C(w_1, w_2, ..., w_i) / Σ_w C(w_1, w_2, ..., w_{i-1}, w)
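The counting and probability formulas above can be sketched as follows for a 2-gram (bigram) model. This is an illustrative sketch, not the patent's implementation: the corpus and sentences are hypothetical, and start-of-sentence handling (the P(w_1) term) is omitted for brevity.

```python
from collections import Counter

def train_ngram_counts(sentences, n=2):
    """Count n-gram occurrences C(w_1..w_n) and prefix totals sum_w C(w_1..w_{n-1}, w)."""
    ngram_counts = Counter()
    prefix_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for j in range(len(words) - n + 1):
            ngram = tuple(words[j:j + n])
            ngram_counts[ngram] += 1
            prefix_counts[ngram[:-1]] += 1  # any word w may follow this prefix
    return ngram_counts, prefix_counts

def conditional_prob(ngram_counts, prefix_counts, ngram):
    """P(w_i | history) = C(history, w_i) / sum_w C(history, w)."""
    prefix = ngram[:-1]
    if prefix_counts[prefix] == 0:
        return 0.0
    return ngram_counts[ngram] / prefix_counts[prefix]

def sentence_prob(ngram_counts, prefix_counts, sentence, n=2):
    """Sequence probability as the product of conditional word probabilities."""
    words = sentence.split()
    p = 1.0
    for j in range(n - 1, len(words)):
        p *= conditional_prob(ngram_counts, prefix_counts,
                              tuple(words[j - n + 1:j + 1]))
    return p

# Hypothetical toy corpus standing in for the universal text
corpus = ["please confirm your order", "please confirm your address"]
counts, prefixes = train_ngram_counts(corpus, n=2)
print(conditional_prob(counts, prefixes, ("please", "confirm")))  # 1.0
print(sentence_prob(counts, prefixes, "please confirm your order"))  # 0.5
```

A real system would also apply smoothing to unseen n-grams; that is omitted here to keep the counting formulas visible.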
2. Training the second language model on the user sentences configured in the dialogue script
First count C(w_1, w_2, ..., w_i), the number of times the word sequence w_1, w_2, ..., w_i occurs in order in the script sentence text, and then count Σ_w C(w_1, w_2, ..., w_{i-1}, w), the total number of times any word w immediately follows the sequence w_1, w_2, ..., w_{i-1} in that text.
For a script sentence w_1, w_2, ..., w_n, the sequence probability is calculated by the following formula:
P_dialogue(w_1 w_2 ... w_n) = P(w_1) · P(w_2|w_1) · ... · P(w_n|w_1, w_2, ..., w_{n-1})
where P(w_i|w_1, w_2, ..., w_{i-1}), the conditional probability of each word, is obtained from the counts above:
P(w_i|w_1, w_2, ..., w_{i-1}) = C(w_1, w_2, ..., w_i) / Σ_w C(w_1, w_2, ..., w_{i-1}, w)
3. Fusing the first language model and the second language model
The final language model is obtained by fusing the first language model and the second language model. Specifically, for a sentence w_1, w_2, ..., w_n, the sequence probability is calculated by the following formula:
P_final(w_1 w_2 ... w_n) = λ_1 · P_general(w_1 w_2 ... w_n) + λ_2 · P_dialogue(w_1 w_2 ... w_n)
where λ_1 and λ_2 are interpolation coefficients that adjust the weights of the general language model and the dialogue language model in P_final(w_1 w_2 ... w_n). In a specific implementation, the values of λ_1 and λ_2 vary from dialogue script to dialogue script.
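As an illustrative sketch of how the fused score might be used in recognition (the candidate sentences, probability values, and weights below are hypothetical, not from the patent), interpolation lets the script-trained model break ties between acoustically similar hypotheses:

```python
def fuse(p_general, p_dialogue, lam1=0.7, lam2=0.3):
    """P_final(s) = lam1 * P_general(s) + lam2 * P_dialogue(s)."""
    return lambda s: lam1 * p_general(s) + lam2 * p_dialogue(s)

# Hypothetical probabilities for two competing recognition hypotheses:
# the general model barely distinguishes them, but the script model
# strongly prefers the sentence that appears in the configured script.
general_probs = {"bill due date": 0.004, "bill do date": 0.005}
script_probs = {"bill due date": 0.060, "bill do date": 0.000}

p_final = fuse(lambda s: general_probs.get(s, 0.0),
               lambda s: script_probs.get(s, 0.0))

candidates = ["bill due date", "bill do date"]
best = max(candidates, key=p_final)
print(best)  # bill due date
```

Without the script model, the slightly higher general-model score would pick the wrong hypothesis; the fused score recovers the in-scenario sentence.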
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (1)

1. A recognition method for improving speech recognition accuracy by using dialogue-script sentences, characterized by comprising the following steps:
Step 1: train a first language model on universal text. The first language model is trained as follows:
Let i be the length of the word sequence being counted, where i is an integer with i ≥ 1.
When i = 1, first count C(w_1), the number of occurrences of the first word w_1 of the word sequence, and then count Σ_w C(w_1, w), the total number of times any word w immediately follows w_1.
When i > 1, first count C(w_1, w_2, ..., w_i), the number of times the word sequence w_1, w_2, ..., w_i occurs in order in the universal text, and then count Σ_w C(w_1, w_2, ..., w_{i-1}, w), the total number of times any word w immediately follows the sequence w_1, w_2, ..., w_{i-1}; here w_s is the s-th word of the sequence, and s is an integer with 0 < s < i + 1.
For a sentence composed of the word sequence w_1, w_2, ..., w_n, where n is the number of words in the sentence, the sequence probability P_general is calculated by the following formula:
P_general(w_1 w_2 ... w_n) = P(w_1) · P(w_2|w_1) · ... · P(w_n|w_1, w_2, ..., w_{n-1})
where P(w_i|w_1, w_2, ..., w_{i-1}) is the conditional probability that the i-th word appears, P(w_1) is the probability that the 1st word appears, and P(w_2|w_1) is the conditional probability that the 2nd word appears;
P(w_i|w_1, w_2, ..., w_{i-1}) = C(w_1, w_2, ..., w_i) / Σ_w C(w_1, w_2, ..., w_{i-1}, w)
where C(w_1, w_2, ..., w_i) is the number of times the word sequence w_1, w_2, ..., w_i occurs in order in the text, and Σ_w C(w_1, w_2, ..., w_{i-1}, w) is the total number of times any word w immediately follows the sequence w_1, w_2, ..., w_{i-1} in the text.
Step 2: define the dialogue robot's script, and train a language model on the script sentences to obtain a second language model.
The second language model gives the sequence probability of a script sentence. Specifically, for a script sentence composed of the word sequence w_1, w_2, ..., w_n, the sequence probability P_dialogue is calculated by the following formula:
P_dialogue(w_1 w_2 ... w_n) = P(w_1) · P(w_2|w_1) · ... · P(w_n|w_1, w_2, ..., w_{n-1})
where P(w_i|w_1, w_2, ..., w_{i-1}) is the conditional probability that the i-th word appears, P(w_1) is the probability that the 1st word appears, and P(w_2|w_1) is the conditional probability that the 2nd word appears.
Step 3: fuse the first language model and the second language model to generate the final language model.
The final language model is defined as follows: for a sentence composed of the word sequence w_1, w_2, ..., w_n, the sequence probability P_final(w_1 w_2 ... w_n) is calculated by the following formula:
P_final(w_1 w_2 ... w_n) = λ_1 · P_general(w_1 w_2 ... w_n) + λ_2 · P_dialogue(w_1 w_2 ... w_n)
where λ_1 and λ_2 are interpolation coefficients that adjust the weights of the first language model and the second language model in P_final(w_1 w_2 ... w_n).
Step 4: build a speech recognition system with the final language model; this system improves the accuracy of speech recognition.
CN202010467020.5A 2020-05-28 2020-05-28 Recognition method for improving speech recognition accuracy by using jargon sentences Pending CN111696531A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010467020.5A CN111696531A (en) 2020-05-28 2020-05-28 Recognition method for improving speech recognition accuracy by using jargon sentences
PCT/CN2021/094080 WO2021238700A1 (en) 2020-05-28 2021-05-17 Recognition method employing speech statement to improve voice recognition accuracy


Publications (1)

Publication Number Publication Date
CN111696531A true CN111696531A (en) 2020-09-22

Family

ID=72478687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010467020.5A Pending CN111696531A (en) 2020-05-28 2020-05-28 Recognition method for improving speech recognition accuracy by using jargon sentences

Country Status (2)

Country Link
CN (1) CN111696531A (en)
WO (1) WO2021238700A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021238700A1 (en) * 2020-05-28 2021-12-02 升智信息科技(南京)有限公司 Recognition method employing speech statement to improve voice recognition accuracy

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880611A (en) * 2011-07-14 2013-01-16 腾讯科技(深圳)有限公司 Language modeling method and language modeling device
CN103577386A (en) * 2012-08-06 2014-02-12 腾讯科技(深圳)有限公司 Method and device for dynamically loading language model based on user input scene
CN103871402A (en) * 2012-12-11 2014-06-18 北京百度网讯科技有限公司 Language model training system, a voice identification system and corresponding method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7043422B2 (en) * 2000-10-13 2006-05-09 Microsoft Corporation Method and apparatus for distribution-based language model adaptation
CN107154260B (en) * 2017-04-11 2020-06-16 北京儒博科技有限公司 Domain-adaptive speech recognition method and device
CN110310663A (en) * 2019-05-16 2019-10-08 平安科技(深圳)有限公司 Words art detection method, device, equipment and computer readable storage medium in violation of rules and regulations
CN111696531A (en) * 2020-05-28 2020-09-22 升智信息科技(南京)有限公司 Recognition method for improving speech recognition accuracy by using jargon sentences



Also Published As

Publication number Publication date
WO2021238700A1 (en) 2021-12-02

Similar Documents

Publication Publication Date Title
Xiong Fundamentals of speech recognition
US7505906B2 (en) System and method for augmenting spoken language understanding by correcting common errors in linguistic performance
KR101211796B1 (en) Apparatus for foreign language learning and method for providing foreign language learning service
CN113439301A (en) Reconciling between analog data and speech recognition output using sequence-to-sequence mapping
JP3323519B2 (en) Text-to-speech converter
EP3891732A1 (en) Transcription generation from multiple speech recognition systems
WO2020117507A1 (en) Training speech recognition systems using word sequences
EP2003572B1 (en) Language understanding device
Raux et al. Using task-oriented spoken dialogue systems for language learning: potential, practical applications and challenges
CN113168828A (en) Session proxy pipeline trained based on synthetic data
US20170345426A1 (en) System and methods for robust voice-based human-iot communication
JP2015187684A (en) Unsupervised training method, training apparatus, and training program for n-gram language model
JPH10504404A (en) Method and apparatus for speech recognition
CN113488026B (en) Speech understanding model generation method based on pragmatic information and intelligent speech interaction method
Georgila et al. Cross-domain speech disfluency detection
Rabiner et al. Speech recognition: Statistical methods
Yeung et al. Improving automatic forced alignment for dysarthric speech transcription.
CN111696531A (en) Recognition method for improving speech recognition accuracy by using jargon sentences
Chan et al. Discriminative pronunciation learning for speech recognition for resource scarce languages
CN116933806A (en) Concurrent translation system and concurrent translation terminal
JP4581549B2 (en) Audio processing apparatus and method, recording medium, and program
US20170337923A1 (en) System and methods for creating robust voice-based user interface
Tarján et al. Improved recognition of Hungarian call center conversations
Beaufays et al. Learning linguistically valid pronunciations from acoustic data.
CN104756183B (en) In the record correction of intelligent Chinese speech dictation ambiguous characters are effectively inputted using character describer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination