CN104756183B - Using character describer to efficiently input ambiguous characters for smart Chinese speech dictation correction - Google Patents


Info

Publication number
CN104756183B
Authority
CN
China
Prior art keywords
character
prompting
description
spontaneous
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201280075499.1A
Other languages
Chinese (zh)
Other versions
CN104756183A (en)
Inventor
李伟
徐然
任晓琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
Nuance Communications Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-08-29
Filing date
2012-09-07
Publication date
2018-05-11
2012-09-07 Application filed by Nuance Communications Inc
2015-07-01 Publication of CN104756183A
2018-05-11 Application granted
2018-05-11 Publication of CN104756183B
Status: Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

A computer-implemented method is described for user disambiguation of Mandarin speech recognition input. A Chinese speech input is received from a user for automatic speech recognition. A spontaneous character description prompt that describes one or more characters in the speech input is also received from the user. Automatic speech recognition of the speech input is then performed based on the character description prompt to determine one or more Chinese language characters corresponding to the speech input.

Description

Using character describer to efficiently input ambiguous characters for smart Chinese speech dictation correction
This application claims priority from U.S. Provisional Patent Application 61/694,450, filed August 29, 2012, which is incorporated herein by reference.
Technical field
The present invention relates to automatic speech recognition of Chinese, and more particularly to the disambiguation of Chinese characters based on spontaneous user character description prompts.
Background
An automatic speech recognition (ASR) system determines the semantic meaning of a speech input. Typically, the input speech is processed into a sequence of digital speech feature frames. Each speech feature frame can be thought of as a multidimensional vector that represents various characteristics of the speech signal present during a short time window. For example, the multidimensional vector of each speech frame can be derived from cepstral features of the short-time Fourier transform spectrum of the speech signal (MFCCs), that is, the short-time power or component of a given frequency band, together with the corresponding first- and second-order derivatives ("deltas" and "delta-deltas"). In a continuous recognition system, variable numbers of speech frames are organized as "utterances" representing a period of speech followed by a pause, which in real life loosely corresponds to a spoken sentence or phrase.
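The paragraph above mentions first- and second-order derivatives ("deltas" and "delta-deltas") without giving a formula. The following is a minimal Python sketch, assuming the widely used regression formula with a window of plus or minus 2 frames and edge frames repeated at the boundaries; the MFCC computation itself is not shown, and the function names are hypothetical.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based time derivative of every feature dimension:

        d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)

    Frames beyond either end are replaced by the first/last frame.
    """
    if not frames:
        return []
    last = len(frames) - 1
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for d in range(len(frames[0])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = deltas(frames, window)
    d2 = deltas(d1, window)  # delta-deltas are deltas of the deltas
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

With 13 static cepstral coefficients per frame, `add_dynamic_features` yields 39-dimensional vectors, a common configuration; the window size is a tunable assumption.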
The ASR system compares the input speech frames to find the statistical models that best match the speech feature characteristics, and then determines the corresponding representative text or semantic meaning associated with those models. Modern statistical models are state sequence models, such as hidden Markov models (HMMs), that model speech sounds (usually phonemes) using mixtures of Gaussian distributions. These statistical models often represent phonemes in specific contexts, referred to as PELs (phonetic elements), for example triphones or phonemes with known left and/or right contexts. State sequence models can be scaled up to represent words as connected sequences of acoustically modeled phonemes, and phrases or sentences as connected sequences of words. When the statistical models are organized together as words, phrases, and sentences, additional language-related information is also typically incorporated into the models in the form of language modeling.
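To make the matching of frames against a state sequence model concrete, here is a toy Python sketch of Viterbi decoding over an HMM whose states emit Gaussian mixtures. It assumes scalar (1-D) features and a hand-written two-state left-to-right model instead of trained, multidimensional PEL models; `log_gmm` and `viterbi` are illustrative names, not part of any described system.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")
Component = Tuple[float, float, float]  # (weight, mean, variance) of a 1-D Gaussian


def log_gmm(x: float, mixture: Sequence[Component]) -> float:
    """Log-likelihood of scalar feature x under a Gaussian mixture (log-sum-exp)."""
    logs = [math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for w, mu, var in mixture]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))


def viterbi(frames: Sequence[float], states: Sequence[Sequence[Component]],
            log_trans: Sequence[Sequence[float]], log_init: Sequence[float]) -> List[int]:
    """Most likely HMM state sequence for the frames, computed in the log domain."""
    n = len(states)
    score = [log_init[s] + log_gmm(frames[0], states[s]) for s in range(n)]
    back: List[List[int]] = []
    for x in frames[1:]:
        ptr, new = [], []
        for j in range(n):
            # Best predecessor state for state j at this frame.
            i = max(range(n), key=lambda k: score[k] + log_trans[k][j])
            ptr.append(i)
            new.append(score[i] + log_trans[i][j] + log_gmm(x, states[j]))
        back.append(ptr)
        score = new
    # Trace the back-pointers from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    states = [[(1.0, 0.0, 1.0)], [(0.5, 5.0, 1.0), (0.5, 6.0, 1.0)]]
    log_trans = [[math.log(0.6), math.log(0.4)], [NEG_INF, 0.0]]  # left-to-right
    print(viterbi([0.1, -0.2, 5.1, 5.9], states, log_trans, [0.0, NEG_INF]))  # [0, 0, 1, 1]
```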
The words or phrases associated with the best-matching model structures are referred to as recognition candidates or hypotheses. A system may produce a single best recognition candidate (the recognition result) or a list of several hypotheses, referred to as an N-best list. Further details on continuous speech recognition are provided in U.S. Patent No. 5,794,189, entitled "Continuous Speech Recognition," and U.S. Patent No. 6,167,377, entitled "Speech Recognition Language Models," the contents of which are incorporated herein by reference.
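An N-best list can be sketched as ranking hypotheses by a combined acoustic and language-model score. This Python example assumes a simple log-linear combination with a hypothetical `lm_weight` parameter; the scores are made-up illustrative values, not measurements.

```python
import heapq
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    acoustic_logp: float  # log P(speech | text) from the acoustic models
    lm_logp: float        # log P(text) from the language model


def n_best(hyps: List[Hypothesis], n: int, lm_weight: float = 1.0) -> List[str]:
    """Return up to n hypotheses, best first; the first one is the recognition result."""
    return [h.text for h in heapq.nlargest(
        n, hyps, key=lambda h: h.acoustic_logp + lm_weight * h.lm_logp)]


if __name__ == "__main__":
    # Three homophones with identical acoustic scores: only the language model separates them.
    hyps = [Hypothesis("李", -10.0, -3.0), Hypothesis("礼", -10.0, -4.0),
            Hypothesis("里", -10.0, -2.0)]
    print(n_best(hyps, 2))  # ['里', '李']
```

When homophones tie acoustically, the ranking is decided entirely by the language model, which cannot know which character a user intends for a name or other rare term.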
Perfect accuracy in speech recognition cannot be achieved, and some words in a recognition result will inevitably need correction. In some situations, such as driving, hands-on operation is unavailable, and all corrections must be made by voice commands alone.
For Western languages, recognition errors are usually corrected at the word level (for example, by saying the correct word again). If a word is too ambiguous or hard to recognize for other reasons, the user can always fall back on spelling it out. Chinese words, however, consist of one or more tonal characters that cannot be spelled, so the speech recognition engine must recognize the intended characters correctly.
The basic independent unit of speech in Chinese is the character, which plays a role in sentences similar to that of the word in Western languages. Accurate character input is especially important when entering names, addresses, proper nouns, and trademarks, which cannot be guided by language models and statistical frequencies. Moreover, character input by speech or pinyin is difficult because many characters share the same pronunciation. For example, as shown in Fig. 1, the character 李 shares the pronunciation "li" with 248 other characters. It is therefore very difficult to accurately dictate or recognize a single Chinese character without informative context.
Chinese speakers have established ways of describing and understanding a given character in everyday conversation:
By using the character in an example word, phrase, or proper noun (such as the name of a celebrity, a brand, or an advertisement)
By saying one or more radicals of the intended character
By saying one or more structural elements of the intended character
By providing tone information for the intended character
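Each of the four description methods above can be viewed as a filter that narrows the set of characters sharing one pronunciation. The following is a minimal Python sketch of that idea, assuming a hand-built toy lexicon; the `CharEntry` record, the `LEXICON` entries, and `disambiguate()` are hypothetical illustrations, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CharEntry:
    char: str
    pinyin: str                        # toneless syllable, e.g. "li"
    tone: int                          # 1-4
    radical: str
    components: Tuple[str, ...] = ()   # structural elements, in spoken order


# Hypothetical toy lexicon; a real one would cover thousands of characters.
LEXICON = [
    CharEntry("李", "li", 3, "木", ("木", "子")),
    CharEntry("礼", "li", 3, "礻"),
    CharEntry("里", "li", 3, "里"),
    CharEntry("黎", "li", 2, "黍"),
    CharEntry("力", "li", 4, "力"),
]


def disambiguate(pinyin: str, *, word: Optional[str] = None, radical: Optional[str] = None,
                 components: Optional[Tuple[str, ...]] = None,
                 tone: Optional[int] = None) -> List[str]:
    """Return the homophones of `pinyin` consistent with every cue supplied."""
    return [
        e.char for e in LEXICON
        if e.pinyin == pinyin
        and (word is None or e.char in word)                      # example word
        and (radical is None or e.radical == radical)             # radical
        and (components is None or e.components == components)   # structural elements
        and (tone is None or e.tone == tone)                      # tone
    ]
```

For example, `disambiguate("li", word="公里")` keeps only 里; passing several cues at once intersects the filters.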
Summary
Embodiments of the present invention are directed to a computer-implemented arrangement for user disambiguation of Mandarin speech recognition input based on spontaneous character description prompts. A Chinese speech input is received from a user for automatic speech recognition. A spontaneous character description prompt that describes one or more characters in the speech input is also received from the user. Automatic speech recognition of the speech input is then performed based on the character description prompt to determine one or more Chinese language characters corresponding to the speech input.
The character description prompt may be received from the user as a constrained-form input in a command mode or as an unconstrained natural language input. The character description prompt may also include tone description, character action, and/or character position information. The character description prompt may include an example word that uses the described character, a description of one or more radicals of the described character, or a description of one or more structural elements of the described character.
The automatic speech recognition may use a recognition grammar and/or a fuzzy-matching dictation engine to handle the character description prompt.
Brief description of the drawings
Fig. 1 is a table showing some of the many different Chinese characters that are pronounced essentially as "li".
Figs. 2A-2C show an example of disambiguation of Mandarin speech recognition input using a character description prompt according to an embodiment of the present invention.
Figs. 3A-3B show another example, in which the character description prompt is based on an example word.
Figs. 4A-4B show another example, in which the character description prompt is based on a description of radicals.
Figs. 5A-5B show another example, in which the character description prompt is based on a tone description.
Fig. 6 shows an example of a grammar-based speech recognition architecture for embodiments of the present invention.
Fig. 7 shows an example of a fuzzy-matching dictation engine speech recognition architecture for embodiments of the present invention.
Detailed description
Various embodiments of the present invention are directed to disambiguation of Mandarin speech recognition based on spontaneous character description prompts. Using spontaneous character description prompts for character disambiguation matches well with the natural speaking habits that Chinese speakers already use in daily life.
In the daily lives of Chinese speakers, people have a variety of mechanisms for specifying one particular character among many similarly pronounced candidates. One such way is to describe the structural elements of the character. For example, Figs. 2A-2C show an example of disambiguating Mandarin speech recognition input with a character description prompt that describes the structural elements of a character. Fig. 2A shows a Chinese speech input 201 from a user to a system for automatic speech recognition. As shown in Fig. 2B, the user also provides a spontaneous character description prompt 202, "木子李" ("mu zi li", the li composed of 木 "wood" and 子 "child"), which describes the structural elements of the first character of the speech input. The system then performs automatic speech recognition of the speech input 201 based on the character description prompt 202 to determine one or more Chinese language characters corresponding to the speech input 201 (in this case the character 李, which is displayed to the user as recognition output 203, as shown in Fig. 2C).
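A structural-element prompt such as "木子李" names the character last and lists its elements first. The Python sketch below resolves such a prompt, assuming that spoken order and a hypothetical `DECOMPOSITION` table; the function names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical decomposition table: character -> structural elements, in spoken order.
DECOMPOSITION: Dict[str, Tuple[str, ...]] = {
    "李": ("木", "子"),
    "杏": ("木", "口"),
    "好": ("女", "子"),
}


def resolve_structural_cue(cue: str) -> Optional[str]:
    """Interpret a cue such as "木子李": the last character names the intended
    character and the characters before it spell out its structural elements."""
    if len(cue) < 2:
        return None
    target, parts = cue[-1], tuple(cue[:-1])
    return target if DECOMPOSITION.get(target) == parts else None


def chars_from_parts(parts: Tuple[str, ...]) -> List[str]:
    """Reverse lookup for when only the elements are heard, e.g. "木子"."""
    return [ch for ch, p in DECOMPOSITION.items() if p == parts]
```

Checking the decomposition guards against the final syllable itself being misrecognized as a homophone: "木子里" resolves to `None`, and the system could then fall back to `chars_from_parts(("木", "子"))`, which returns `["李"]`.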
Another common way that Chinese speakers disambiguate a spoken character is to give context by saying a popular word that contains the target character. For example, if a speaker simply says "wei", a listener will not know which character is intended, because too many characters share the pronunciation "wei", such as 威, 巍, 危, 微, and so on. But as shown in Fig. 3A, the speaker can provide a character description prompt 302 meaning "巍 as in 巍峨" (巍峨 means "towering"), indicating the character 巍 as it appears in the common word 巍峨 and thereby excluding the other candidate characters. Compared with the single character 巍, the common word 巍峨 is more distinctive and easier to recognize. This is very similar to saying "'two' as in 'one two three'" or "'too' as in 'me too'" in English. The system then performs automatic speech recognition based on the character description prompt 302 to determine that the speech input 301 corresponds to the character 巍, which is displayed to the user as recognition output 303, as shown in Fig. 3B.
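An example-word prompt can be checked in two ways: the character must occur in the example word, and it must be a homophone of what the user originally said. This Python sketch assumes the prompt is transcribed in the colloquial form "<word>的<character>" (for example "巍峨的巍"); that template and the `HOMOPHONES` table are assumptions, not specified in the text.

```python
import re
from typing import Dict, List, Optional

# Hypothetical homophone table: toneless pinyin -> candidate characters.
HOMOPHONES: Dict[str, List[str]] = {"wei": ["威", "巍", "危", "微"]}

# Assumed prompt template "<example word>的<character>", e.g. "巍峨的巍".
EXAMPLE_WORD_CUE = re.compile(r"^(?P<word>.+)的(?P<char>.)$")


def resolve_example_word_cue(syllable: str, cue: str) -> Optional[str]:
    """Return the character singled out by an example-word prompt, or None."""
    m = EXAMPLE_WORD_CUE.match(cue)
    if m is None:
        return None
    word, char = m.group("word"), m.group("char")
    # Accept only if the character occurs in the example word and is a
    # homophone of the syllable the user originally said.
    if char in word and char in HOMOPHONES.get(syllable, []):
        return char
    return None
```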
The character description prompt may include a description of one or more radicals of the described character. For example, a prompt meaning "wei with the 山 (mountain) component on top" identifies the 巍 that has the 山 radical. With this explanation, a listener can identify the character very quickly, because only 巍 has this radical and is pronounced "wei". This is somewhat like spelling in English: "T, W, O, two" or "T, double O, too". In another example, shown in Fig. 4A, the speaker provides a speech input 401 of "xu" and a character description prompt 402 of "xu with the 彳 (chi) radical", and the system then performs automatic speech recognition based on the character description prompt 402 to determine that the speech input 401 corresponds to the character 徐, which is displayed to the user as recognition output 403, as shown in Fig. 4B.
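Radical prompts are usually spoken by name rather than by glyph. The Python sketch below maps spoken radical names to radicals and filters homophones by radical; the `RADICAL_NAMES` and `CANDIDATES` tables are hypothetical toy data, and a real system would need a much larger inventory.

```python
from typing import Dict, List, Tuple

# Hypothetical spoken radical names -> radical.
RADICAL_NAMES: Dict[str, str] = {
    "山字头": "山",  # "mountain on top"
    "双人旁": "彳",  # "double person" side radical (chi)
    "木字旁": "木",  # "wood" side radical
}

# Hypothetical homophone table: toneless syllable -> [(character, radical)].
CANDIDATES: Dict[str, List[Tuple[str, str]]] = {
    "wei": [("巍", "山"), ("微", "彳"), ("维", "纟"), ("伟", "亻")],
    "xu": [("徐", "彳"), ("许", "讠"), ("虚", "虍")],
}


def resolve_radical_cue(syllable: str, radical_name: str) -> List[str]:
    """Homophones of `syllable` whose radical is the one named in the prompt."""
    radical = RADICAL_NAMES.get(radical_name)
    if radical is None:
        return []  # unknown radical name: leave the choice to the user
    return [ch for ch, r in CANDIDATES.get(syllable, []) if r == radical]
```

A single survivor, as with `resolve_radical_cue("xu", "双人旁")` returning `["徐"]`, can be committed directly; several survivors would still need another prompt.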
In some embodiments, the character description prompt may also include tone information for the described character. For example, as shown in Fig. 5A, the speaker can provide a character description prompt 502 of "li, second tone", and the system then performs automatic speech recognition based on the character description prompt 502 to determine that the speech input 501 corresponds to the character 黎, which is displayed to the user as recognition output 503, as shown in Fig. 5B.
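A tone prompt can be reduced to a syllable plus a tone number and used as a filter. This Python sketch assumes the prompt has already been normalized to tone-numbered pinyin such as "li2" and uses a hypothetical toy lexicon. With a full lexicon, tone alone narrows the choice without always settling it (other second-tone li characters, such as 离 and 梨, exist), so it may be combined with the other kinds of prompts.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: toneless syllable -> [(character, tone)].
LEXICON: Dict[str, List[Tuple[str, int]]] = {
    "li": [("李", 3), ("礼", 3), ("里", 3), ("黎", 2), ("力", 4)],
}


def parse_numbered_pinyin(token: str) -> Tuple[str, int]:
    """Split tone-numbered pinyin such as "li2" into ("li", 2)."""
    m = re.fullmatch(r"([a-zü]+)([1-5])", token.strip().lower())
    if m is None:
        raise ValueError(f"not tone-numbered pinyin: {token!r}")
    return m.group(1), int(m.group(2))


def filter_by_tone(token: str) -> List[str]:
    """Characters in the lexicon whose syllable and tone both match the token."""
    syllable, tone = parse_numbered_pinyin(token)
    return [ch for ch, t in LEXICON.get(syllable, []) if t == tone]
```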
One way to implement such an arrangement is to start by providing a recognition dictionary that contains common popular words (such as well-known place names, celebrity names, and so on). In addition, description rules can be created for characters that are usually distinguished by describing their radicals and/or structural elements. Not many characters need such rules, because people usually describe characters this way only for those made up of very few components. These resources can be developed from data collection and user feedback from existing applications, and then used to design flexible recognition grammars that cover the most typical correction behaviors of Chinese speakers. Such extended recognition grammars can then be added to existing recognition applications with the appropriate program logic, providing flexible and effective character input and correction functionality.
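The two resources described above, a dictionary of popular words and a set of structural description rules, can be merged into one mapping from spoken description phrases to characters, which a grammar compiler could then turn into alternatives. This Python sketch assumes the phrase templates "<word>的<character>" and "<elements><character>", modeled on the examples in this description; the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def build_description_dictionary(popular_words: Iterable[str],
                                 structure_rules: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map every spoken description phrase to the character it denotes.

    popular_words   -- common words (place names, celebrity names, ...)
    structure_rules -- characters usually described by their structural elements
    """
    phrases: Dict[str, str] = {}
    for word in popular_words:
        for ch in dict.fromkeys(word):          # each distinct character once, in order
            phrases[f"{word}的{ch}"] = ch        # e.g. "巍峨的巍"
    for ch, parts in structure_rules.items():
        phrases["".join(parts) + ch] = ch        # e.g. "木子李"
    return phrases
```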
In a specific system architecture, the user may be able to input a character with a character description prompt either in an unconstrained natural language understanding (NLU) mode or in a command mode that uses constrained-form input. The character description prompt may include the following information:
Character description
Action (e.g., replace, insert, append) (optional)
Position (e.g., where to place the character, or which character to replace) (optional)
An example of entering a character description prompt in NLU mode is "I want to replace 礼 with 木子李". This is like saying in English, "I want to replace 'two' with 'toe', which ends with 'oe'". Similarly, the user can simply say something like "木子李" in natural speech (compare "'toe' that ends with 'oe'"). In that case the user provides only the character description, so the system may assume that it means appending the character. Alternatively, the user may say something like "it's 木子李" ("It's 'toe' that ends with 'oe'"). Because the input begins with "it's", the system assumes that the input is a correction of a misrecognition, and because the user did not provide position information, the system may assume that it should replace the character nearest the cursor in the sentence that has the most similar pronunciation (such as "two" in the English example).
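The defaulting rules above (no action means append; "it's ..." means correct the most similar-sounding character near the cursor) can be sketched as follows. This is a minimal Python illustration under stated assumptions: pronunciation similarity is approximated by `difflib` string similarity of toneless pinyin from a hypothetical `PINYIN` table, similarity is ranked before cursor distance, and `DescriptionPrompt` and `apply_prompt` are illustrative names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical pinyin table covering the characters used in the example.
PINYIN: Dict[str, str] = {"我": "wo", "送": "song", "礼": "li", "物": "wu", "李": "li"}


@dataclass
class DescriptionPrompt:
    character: str                   # character resolved from the description
    action: Optional[str] = None     # "replace", "insert" or "append", if spoken
    position: Optional[int] = None   # index in the text, if spoken
    is_correction: bool = False      # utterance started with "it's ..."


def pron_similarity(a: str, b: str) -> float:
    """Crude pronunciation similarity: string similarity of toneless pinyin."""
    return SequenceMatcher(None, PINYIN.get(a, a), PINYIN.get(b, b)).ratio()


def apply_prompt(text: str, cursor: int, p: DescriptionPrompt) -> str:
    # Default action when none is spoken: "it's ..." means correct, otherwise append.
    action = p.action or ("replace" if p.is_correction else "append")
    if action == "append" or not text:
        return text + p.character
    if action == "insert":
        i = cursor if p.position is None else p.position
        return text[:i] + p.character + text[i:]
    if action == "replace":
        if p.position is not None:
            i = p.position
        else:
            # Most similar pronunciation first; ties go to the character nearest the cursor.
            i = max(range(len(text)),
                    key=lambda k: (pron_similarity(text[k], p.character), -abs(k - cursor)))
        return text[:i] + p.character + text[i + 1:]
    raise ValueError(f"unknown action: {action!r}")
```

For instance, with the text "我送礼物" and the cursor at the end, the correction prompt "it's 木子李" (resolved to 李) replaces 礼 and yields "我送李物".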
In command mode, the input format of the character description prompt is limited to a few strict forms, such as "replace 礼 with 木子李" (like "Replace 'two' with 'toe', which ends with 'oe'").
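Command mode can be sketched as a small set of anchored patterns: anything that does not match one of them exactly is rejected, unlike the NLU mode. This Python example assumes two hypothetical templates, the Chinese phrasing "把X替换为Y" and the English "replace X with Y", plus a one-entry description dictionary; none of these names come from the described system.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical strict command templates accepted in command mode.
COMMANDS = [
    ("replace", re.compile(r"^把(?P<target>.)替换为(?P<desc>.+)$")),
    ("replace", re.compile(r"^replace (?P<target>\S+) with (?P<desc>.+)$", re.IGNORECASE)),
]
# Hypothetical description dictionary (description phrase -> character).
DESCRIPTIONS: Dict[str, str] = {"木子李": "李"}


def parse_command(utterance: str) -> Optional[Tuple[str, str, str]]:
    """Return (action, target, new_character), or None if no strict form matches."""
    text = utterance.strip()
    for action, pattern in COMMANDS:
        m = pattern.match(text)
        if m and m.group("desc") in DESCRIPTIONS:
            return action, m.group("target"), DESCRIPTIONS[m.group("desc")]
    return None  # anything outside the strict forms is rejected in command mode
```

Note that a free-form sentence such as "我要把礼替换为木子李" ("I want to replace 礼 with 木子李") returns `None` here; handling it is the job of the NLU mode.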
System developers can search for, collect, and filter character description examples based on the methods most Chinese speakers use, and build a description list from them, for example "'toe' that ends with 'oe'" or "'toe' as in 'big toe'". The description list may contain more than 50,000 entries describing more than 6,000 Chinese characters. Each description can be split to build other possible partial descriptions, such as "'toe' with 'oe'". The final description list may be two to three times larger than the original 50,000-entry list. The list can then be built into a recognition grammar, and auxiliary grammars can also be added to recognize action information and position information. Other optional features, such as gating commands or wake-up words, can also be incorporated.
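Expanding full descriptions into partial ones and compiling the result into a grammar might look like the following Python sketch. It uses the English "toe" analogy from the text; the `SHORTENINGS` rewrite rules are hypothetical, and a regular expression stands in for a real speech recognition grammar format.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rewrite rules that shorten a full description into a partial one.
SHORTENINGS: List[Tuple[str, str]] = [
    (" that ends with ", " with "),
    (" which ends with ", " with "),
    (" as in ", " in "),
]


def expand(descriptions: Dict[str, str]) -> Dict[str, str]:
    """Add partial descriptions; maps every phrase to the item it describes."""
    expanded = dict(descriptions)
    for phrase, target in descriptions.items():
        for full, short in SHORTENINGS:
            if full in phrase:
                expanded.setdefault(phrase.replace(full, short), target)
    return expanded


def compile_grammar(phrases: Dict[str, str]) -> "re.Pattern[str]":
    """One alternative per phrase; longest first so full phrases win over prefixes."""
    alternatives = sorted(phrases, key=len, reverse=True)
    return re.compile("|".join(re.escape(p) for p in alternatives))
```

Starting from `{"toe that ends with oe": "toe", "toe as in big toe": "toe"}`, `expand` adds "toe with oe" and "toe in big toe", doubling the list in line with the growth the text describes.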
Fig. 6 shows one specific embodiment of a recognition system architecture that uses a grammar-based approach. The user's speech input 601, containing the input speech together with the character description prompt, is provided to an ASR engine 602, which performs speech recognition using, for example, a fuzzy-matching algorithm and three specific independent grammars for the character description 604, the action 605, and the position 606. The recognition result 603 with the highest confidence is selected from the output of the ASR engine 602 using the information from the character description prompt.
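The selection step can be sketched as parsing each N-best hypothesis with three independent grammars and keeping the highest-confidence hypothesis that the description grammar accepts. In this Python sketch, exact substring and regex matching stand in for the fuzzy-matching algorithm mentioned in the text, and the grammar contents (`DESCRIPTION`, `ACTION`, `POSITION`) are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical independent grammars, one per information type.
DESCRIPTION = {"木子李": "李", "巍峨的巍": "巍"}                 # character description
ACTION = {"替换": "replace", "插入": "insert", "添加": "append"}  # action
POSITION = re.compile(r"第(?P<n>[1-9])个字")                      # position: "the n-th character"

Parse = Tuple[str, Optional[str], Optional[int]]


def parse(text: str) -> Optional[Parse]:
    """(character, action, 0-based position), or None if no description is found."""
    char = next((c for phrase, c in DESCRIPTION.items() if phrase in text), None)
    if char is None:
        return None
    action = next((a for word, a in ACTION.items() if word in text), None)
    m = POSITION.search(text)
    position = int(m.group("n")) - 1 if m else None
    return char, action, position


def best_parse(nbest: List[Tuple[str, float]]) -> Optional[Parse]:
    """Highest-confidence hypothesis that the grammars accept."""
    valid = [(conf, p) for text, conf in nbest if (p := parse(text)) is not None]
    return max(valid, key=lambda cp: cp[0])[1] if valid else None
```

For example, if the N-best list is `[("木子里", 0.9), ("替换第2个字为木子李", 0.8)]`, the top hypothesis contains no known description, so the second is chosen and parsed as `("李", "replace", 1)`. The walrus operator requires Python 3.8 or later.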
Fig. 7 shows an example of a fuzzy-matching dictation engine architecture. Grammar lists for the character description 707, the action 706, and the position 705 are provided and interpreted by a grammar conversion module 708 to form conversion information A 709. The user's speech input 701, containing the input speech together with the character description prompt, is provided to a dictation module 702 to form conversion information B 703. A dictation engine fuzzy-matching module 704 then matches the relevant portions of conversion information A 709 and conversion information B 703 to obtain the dictation result. The recognition result 710 with the highest confidence is selected from the output of module 704, and the editing action proceeds.
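One plausible reading of the two "conversion information" streams is that grammar phrases and dictation output are converted into a common representation before fuzzy matching. The Python sketch below assumes that representation is toneless pinyin (from a hypothetical `PINYIN` table) and uses `difflib.SequenceMatcher` as the fuzzy matcher; both choices are assumptions, not details given in the text.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical pinyin table; a real system would use a full lexicon.
PINYIN: Dict[str, str] = {"木": "mu", "子": "zi", "李": "li", "里": "li",
                          "巍": "wei", "峨": "e", "的": "de", "微": "wei", "笑": "xiao"}


def to_pinyin(text: str) -> str:
    return " ".join(PINYIN.get(ch, ch) for ch in text)


def fuzzy_match(dictated: str, grammar: Dict[str, str],
                threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (character, score) for the grammar phrase whose pinyin is closest."""
    b = to_pinyin(dictated)                  # cf. conversion information B
    best: Optional[Tuple[str, float]] = None
    for phrase, char in grammar.items():
        a = to_pinyin(phrase)                # cf. conversion information A
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (char, score)
    return best
```

Matching in pinyin space lets a dictation homophone error such as "木子里" still hit the grammar entry "木子李" with a perfect score, while unrelated input such as "微笑" falls below the threshold.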
Experimental tests show that this approach is highly effective for accurately and naturally inputting and selecting single Chinese characters, without requiring special training or memorization from the user. Such single characters are rarely recognized correctly by existing ASR arrangements.
Embodiments of the invention may be implemented in whole or in part in any conventional computer programming language, such as VHDL, SystemC, Verilog, ASM, etc. Alternative embodiments of the invention may be implemented as pre-programmed hardware elements, other related components, or a combination of hardware and software components.
Embodiments can be implemented in whole or in part as a computer program product for use with a computer system. Such an implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer-readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk), or transmittable to a computer system via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communication lines) or a medium implemented with wireless techniques (e.g., microwave, infrared, or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical, or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink-wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over a network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented entirely as hardware, or entirely as software (e.g., a computer program product).
Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made that will achieve some of the advantages of the invention without departing from the true scope of the invention.

Claims (11)

1. A computer-implemented method of user disambiguation for Mandarin speech recognition input using at least one hardware-implemented computer processor, the method comprising:
receiving a Chinese speech input from a user for automatic speech recognition, wherein the speech input corresponds to one or more characters;
receiving a spontaneous character description prompt from the user, the spontaneous character description prompt comprising an unconstrained natural language input describing the one or more characters corresponding to the speech input;
determining, based on the unconstrained natural language input, that the spontaneous character description prompt comprises a correction to one or more Chinese language characters determined using the automatic speech recognition; and
in response to determining that the spontaneous character description prompt comprises the correction, correcting the one or more Chinese language characters.
2. The method of claim 1, wherein correcting the one or more Chinese language characters comprises:
correcting the Chinese language character, among the one or more Chinese language characters, that is nearest to the cursor when the spontaneous character description prompt is received from the user and that has the most similar pronunciation.
3. The method of claim 1, wherein the spontaneous character description prompt comprises character action information describing an action to be performed.
4. The method of claim 3, wherein the character action information comprises an instruction to perform at least one of: replacing the one or more Chinese language characters, inserting one or more new Chinese language characters, and appending one or more new Chinese language characters to the one or more Chinese language characters.
5. The method of claim 1, wherein the spontaneous character description prompt comprises character position information describing a text position.
6. The method of claim 1, wherein the spontaneous character description prompt comprises tone description information describing an audio tone of the one or more characters.
7. The method of claim 1, wherein the spontaneous character description prompt comprises an example word that uses the one or more characters.
8. The method of claim 1, wherein the spontaneous character description prompt comprises a description of one or more radicals of the one or more characters.
9. The method of claim 1, wherein the spontaneous character description prompt comprises a description of one or more structural elements of the one or more characters.
10. The method of claim 1, wherein the automatic speech recognition uses a recognition grammar for handling the spontaneous character description prompt.
11. The method of claim 1, wherein the automatic speech recognition uses a fuzzy-matching dictation engine for handling the spontaneous character description prompt.
CN201280075499.1A 2012-08-29 2012-09-07 Using character describer to efficiently input ambiguous characters for smart Chinese speech dictation correction Expired - Fee Related CN104756183B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261694450P 2012-08-29 2012-08-29
US61/694,450 2012-08-29
PCT/US2012/054181 WO2014035437A1 (en) 2012-08-29 2012-09-07 Using character describer to efficiently input ambiguous characters for smart chinese speech dictation correction

Publications (2)

Publication Number Publication Date
CN104756183A (en) 2015-07-01
CN104756183B (en) 2018-05-11

Family

ID=50184066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201280075499.1A Expired - Fee Related CN104756183B (en) 2012-08-29 2012-09-07 Using character describer to efficiently input ambiguous characters for smart Chinese speech dictation correction

Country Status (2)

Country Link
CN (1) CN104756183B (en)
WO (1) WO2014035437A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107221328B (en) * 2017-05-25 2021-02-19 Baidu Online Network Technology (Beijing) Co Ltd Method and device for positioning modification source, computer equipment and readable medium
CN111667828B (en) * 2020-05-28 2021-09-21 Beijing Baidu Netcom Science and Technology Co Ltd Speech recognition method and apparatus, electronic device, and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1212403A (en) * 1997-09-19 1999-03-31 International Business Machines Corp Speech recognition method and system for identifying isolated non-relative Chinese character
CN1677328A (en) * 2004-03-29 2005-10-05 Delta Electronics Inc Chinese-character-unit speech-sound inputting method and system
CN1815557A (en) * 2005-02-04 2006-08-09 Delta Electronics Inc Method and device for configuring Chinese new words utilizing voice input
CN101494050A (en) * 2008-01-22 2009-07-29 Delta Electronics Inc Voice identification apparatus and method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
TWI247276B (en) * 2004-03-23 2006-01-11 Delta Electronics Inc Method and system for inputting Chinese character
TWI349925B (en) * 2008-01-10 2011-10-01 Delta Electronics Inc Speech recognition device and method thereof
CN102023995B (en) * 2009-09-22 2013-01-30 Ricoh Co Ltd Speech retrieval apparatus and speech retrieval method

Also Published As

Publication number Publication date
WO2014035437A1 (en) 2014-03-06
CN104756183A (en) 2015-07-01

Similar Documents

Publication Publication Date Title
US11043213B2 (en) System and method for detection and correction of incorrectly pronounced words
EP1143415B1 (en) Generation of multiple proper name pronunciations for speech recognition
US10460034B2 (en) Intention inference system and intention inference method
US7937262B2 (en) Method, apparatus, and computer program product for machine translation
JP2001100781A (en) Method and device for voice processing and recording medium
WO2016067418A1 (en) Conversation control device and conversation control method
Kirchhoff et al. Cross-dialectal data sharing for acoustic modeling in Arabic speech recognition
JPWO2007097176A1 (en) Speech recognition dictionary creation support system, speech recognition dictionary creation support method, and speech recognition dictionary creation support program
US7502731B2 (en) System and method for performing speech recognition by utilizing a multi-language dictionary
TW202020854A (en) Speech recognition system and method thereof, and computer program product
CN110870004A (en) Syllable-based automatic speech recognition
JP2007155833A (en) Acoustic model development system and computer program
US20150073796A1 (en) Apparatus and method of generating language model for speech recognition
CN105895076B (en) Speech synthesis method and system
Stöber et al. Speech synthesis using multilevel selection and concatenation of units from large speech corpora
Kayte et al. Implementation of Marathi Language Speech Databases for Large Dictionary
US20040006469A1 (en) Apparatus and method for updating lexicon
CN104756183B (en) Using character describer to efficiently input ambiguous characters for smart Chinese speech dictation correction
JP4600706B2 (en) Voice recognition apparatus, voice recognition method, and recording medium
KR100484493B1 (en) Spontaneous continuous speech recognition system and method using mutiple pronunication dictionary
AbuZeina et al. Cross-word modeling for Arabic speech recognition
KR20050101694A (en) A system for statistical speech recognition with grammatical constraints, and method thereof
KR20050101695A (en) A system for statistical speech recognition using recognition results, and method thereof
JP2006243213A (en) Language model conversion device, sound model conversion device, and computer program
KR100511247B1 (en) Language Modeling Method of Speech Recognition System

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180511

Termination date: 20210907
