WO2004077405A1 - Speech recognition system (Système de reconnaissance vocale)
- Publication number
- WO2004077405A1 (PCT application PCT/US2003/030090)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords: list, speech, elements, sequence, subunits
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics of application context
Description
- This invention relates to speech recognition systems. More particularly, this invention relates to speech recognition systems that allow a user to select a list element from a list or group of list elements.
- Many electronic applications have design processes or sequences that are speech-guided or speech-controlled by a user.
- The electronic applications include destination guidance (navigation) systems for vehicles, telephone and/or address systems, and the like. Vehicles include automobiles, trucks, boats, airplanes, and the like.
- A user provides a voice input to a speech recognition unit.
- The voice input can correspond to a list element that the user desires to select from a list or group of list elements.
- The speech recognition unit processes the voice input and selects the desired list element in response to the processed voice input.
- Speech recognition units typically process a limited number of voice inputs.
- Many speech recognition units can process only a few thousand words (voice inputs) or list elements. When there are large numbers of list elements, the speech recognition unit may not function, or may not function properly, without additional conditioning or processing of the voice input. The recognition performance may be too low, or the available memory may be insufficient.
- Many applications have an extensive number of list elements, especially when the list comprises most or all of the available list elements. Such applications include destination guidance (navigation) systems and telephone systems. Navigation and telephone systems typically include numerous city and street names. Telephone systems typically include numerous personal names. These applications may have lists with list elements numbering in the tens to hundreds of thousands. In addition, many speech recognition units may not differentiate between similar sounding list elements, especially when there are numerous list elements that sound alike.
- A speech recognition system is provided for processing voice inputs from a user to select a list element from a list or group of list elements. Recognition procedures may be carried out on the voice input such that the user only has to speak the whole word of the desired list element.
- The system allows a user to select a list element from large lists of list elements.
- The large list is clustered, e.g., compressed by summarization, so that it can be handled more easily by a recognizer.
- A first recognition procedure can separate the voice input of a whole word into at least one sequence of speech subunits.
- The subunits are used in a matching procedure to create a sub-list, e.g., the vocabulary for a second recognition procedure.
- A second recognition procedure can compare the voice input of the whole word with the vocabulary of list elements. In this way, the recognition procedures can obtain accurate and reliable speech recognition at reduced memory cost.
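- The two-pass flow can be summarized in code. The following Python sketch is illustrative only: the class and function names, the hard-coded first-pass output, and the use of string similarity in place of acoustic scoring are all hypothetical stand-ins for the recognition units described above, not the disclosed implementation.

```python
# Minimal runnable sketch of the two-pass recognition flow (illustrative).
from difflib import SequenceMatcher

class MockRecognizer:
    def subunit_pass(self, utterance):
        # Pass 1: a real recognizer would segment the audio into speech
        # subunits and map them to characters; here we return fixed,
        # deliberately erroneous letter-sequence hypotheses.
        return ["LAHNSTEIN", "TRAUNSTEIN"]

    def whole_word_pass(self, utterance, vocabulary):
        # Pass 2: a real recognizer would re-score the stored audio against
        # the sub-list vocabulary; string similarity to the spoken word is
        # used here purely for demonstration.
        ref = utterance.upper()
        return sorted(vocabulary,
                      key=lambda w: SequenceMatcher(None, ref, w.upper()).ratio(),
                      reverse=True)

def match_sub_list(hypotheses, full_list, size=4):
    # Error-tolerant matching: keep the list elements most similar to any
    # first-pass hypothesis (a fuller matching sketch appears later).
    def score(elem):
        return max(SequenceMatcher(None, h, elem.upper()).ratio()
                   for h in hypotheses)
    return sorted(full_list, key=score, reverse=True)[:size]

full_list = ["Blaustein", "Lahnstein", "Traunstein", "Graustein",
             "Holstein", "Berlin", "Hamburg"]
recognizer = MockRecognizer()
utterance = "Blaustein"                         # stands in for recorded audio
hypotheses = recognizer.subunit_pass(utterance)          # first recognition
sub_list = match_sub_list(hypotheses, full_list)         # matching procedure
print(recognizer.whole_word_pass(utterance, sub_list))   # second recognition
```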
- Figure 2 is a flowchart of a method for recognizing speech to select a list element from a list of list elements.
- Figure 1 represents a block diagram or flow chart of a speech recognition system 1, which may be implemented in an electrical device such as a navigation system for a vehicle, a telephone system, or the like.
- The system 1 allows a user to select a list element from large lists of list elements.
- The large list is clustered, e.g., compressed by summarization, so that it can be handled more easily by a recognizer.
- The speech recognition system 1 includes an input unit 2, a first speech recognition unit 3, a first vocabulary 4, a recording unit 5, a mapping unit 6, a matching unit and database 7, a second vocabulary 8, a second speech recognition unit 9, and an output unit 10.
- A first recognition process and a second recognition process are performed, e.g., consecutively, with two vocabularies using the same or different speech recognition units.
- A matching process generates a sub-list that is used as the vocabulary for the second recognition process.
- The input unit 2 may be a microphone or similar device.
- The first and second speech recognition units 3 and 9 may be implemented by one or more microprocessors or similar devices.
- The first vocabulary 4, recording unit 5, matching unit and database 7, and second vocabulary 8 may be implemented by one or more memory devices.
- The memory device can include computer-readable data and/or instructions encoded on a computer-readable storage medium.
- Examples of computer-readable storage media include, but are not limited to, optical storage media, electronic storage media, and magnetic storage media.
- The output unit 10 may be a speaker, a display unit, a combination thereof, or the like.
- The speech recognition system 1 may be implemented on a digital signal processing (DSP) chip or other integrated circuit (IC) chip.
- The speech recognition system 1 may be implemented separately or with other electrical circuitry in the electrical device. While a particular configuration is shown, the speech recognition system 1 may have other configurations, including those with fewer or additional components.
- The speech recognition system 1 processes speech or voice inputs from a user to select a list element from a list or group of list elements.
- The list could be any assembly of data, related or unrelated.
- The list elements are particular data entries in the list.
- The list may contain navigation data for a navigation system or contact data for a telephone system.
- The navigation data may include place names and street names as list elements.
- The contact data may include personal names and place names as list elements.
- The user states, speaks, or otherwise provides a speech or voice input to the input unit 2.
- The voice input is a full description of the desired list element.
- The whole word is the entire list element as spoken.
- The whole word is stored in the recording unit 5.
- The speech recognition unit 3 receives the voice input from the input unit 2.
- The voice input may be processed acoustically to reduce or eliminate unwanted environmental noises, such as with the speech recognition units 3 and/or 9, and/or with the input unit 2.
- The speech recognition unit 3 is configured with the vocabulary 4 to separate the voice input of the user into speech subunits. For example, the speech recognition unit 3 breaks down the voice input into phonemes.
- The first speech recognition unit 3 can access the mapping unit 6 and utilize it to convert the speech subunits into characters of a character sequence. For example, the mapping unit 6 converts phonemes into letters or a letter sequence.
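- As an illustration of this mapping step, a table-driven phoneme-to-letter conversion might look as follows. The SAMPA-like phoneme symbols and letter groups below are invented for this example; a real mapping would be language-dependent and is not specified by the system description.

```python
# Illustrative phoneme-to-letter mapping (hypothetical SAMPA-like symbols).
PHONEME_TO_LETTERS = {
    "b": "B", "g": "G", "r": "R", "l": "L", "n": "N", "t": "T",
    "S": "S",                       # German "st" is typically spoken /S t/
    "a:": "AH", "aU": "AU", "aI": "EI",
}

def map_to_characters(phonemes):
    """Convert a phoneme sequence into a letter sequence (mapping step)."""
    return "".join(PHONEME_TO_LETTERS[p] for p in phonemes)

print(map_to_characters(["b", "l", "aU", "S", "t", "aI", "n"]))  # BLAUSTEIN
print(map_to_characters(["l", "a:", "n", "S", "t", "aI", "n"]))  # LAHNSTEIN
```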
- The matching unit and database 7 holds the list or group of list elements to be searched. The matching unit and database 7 may hold a whole, entire, or extensive list having any number of list elements.
- The components of the speech recognition system 1 are configured for use with any type of data as the list elements, such as characters or letters.
- The list of list elements may be installed on the matching unit and database 7 during manufacture and/or during subsequent operation of the electrical device that has the speech recognition system 1.
- The list may be downloaded from a flash memory or similar device.
- The list also may be downloaded via a communication system, such as landline and wireless radio networks, a global satellite network, and the like.
- The matching unit and database 7 uses the mapped recognition results from the mapping unit 6 and selects the best-matching list elements from the list contained in the database to generate a second vocabulary 8.
- The second vocabulary 8 can include a smaller number of list elements than the whole list contained in the matching unit and database 7.
- The speech recognition unit 9 is configured with the second vocabulary 8 to compare the list elements of the sub-list with the voice input of the user stored in the recording unit 5.
- The speech recognition unit 3 may be used as the speech recognition unit 9, and vice versa, such that the desired vocabulary 4, 8 may be used for configuring either the first speech recognition unit 3 or the second speech recognition unit 9, or both.
- The speech recognition system 1 provides an acoustical and/or optical transmission of a list element from the output unit 10 in response to the recognition result.
- The acoustical output may be provided via a speaker or the like, and the optical output may be provided via a display unit or the like.
- The list elements outputted from the sub-list include the elements recognized as most likely in accordance with the voice input of the user.
- Figure 2 is a flowchart of a process for recognizing speech to select a list element from a list of list elements.
- The user speaks S1 the full description of the desired list element.
- The list element includes, for example, the name of a city or street for destination input in a navigation system, or the name of a person when selecting from a telephone list.
- The voice input may be acoustically processed S2 to reduce or eliminate unwanted environmental noises and the like.
- The voice input of the user is stored S3 in the recording unit 5 for use by the second recognition process.
- The first recognition process may be performed with the first speech recognition unit 3, which is configured with a first vocabulary 4.
- The first vocabulary 4 includes speech subunits, e.g., phonetic units, used to separate the voice input into parts.
- The first speech recognition unit 3 is configured to recognize speech subunits such as parts of phonemes, whole phonemes, letters, syllables, and the like.
- The first recognition process separates S4 the voice input of the user into the desired parts, e.g., part of a phoneme, at least one phoneme, at least one letter, at least one syllable, or the like.
- A sequence of speech subunits is constructed that includes a sequence of consecutive phoneme parts, a sequence of phonemes, a sequence of letters, a sequence of syllables, or the like.
- The first speech recognition unit 3 can also generate several alternative sequences of speech subunits that typically include similar or easily confusable subunits. For example, the speech recognition unit can generate between three and five such alternative sequences, as sketched below.
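- One simple way to derive such alternative sequences is to substitute acoustically confusable subunits. The confusion sets below are invented for illustration; a real recognizer would instead read alternatives off its n-best hypotheses or recognition lattice.

```python
from itertools import product

# Hypothetical confusion sets: each phoneme lists similar-sounding
# alternatives (invented for illustration only).
CONFUSABLE = {"b": ["b", "g"], "l": ["l", "r"], "aU": ["aU", "a:"]}

def alternative_sequences(phonemes, limit=5):
    """Expand a phoneme sequence into at most `limit` confusable variants."""
    options = [CONFUSABLE.get(p, [p]) for p in phonemes]
    variants = [" ".join(seq) for seq in product(*options)]
    return variants[:limit]

for seq in alternative_sequences(["b", "l", "aU", "S", "t", "aI", "n"]):
    print(seq)  # e.g., "b l aU S t aI n", "b l a: S t aI n", ...
```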
- The at least one sequence of speech subunits is mapped S5 onto at least one sequence of consecutive characters.
- The mapped characters may be used for the matching process that matches the characters with the list elements of the list.
- The characters of the character sequences may represent phonemes, letters, syllables, or the like.
- The sequences of speech subunits are mapped onto sequences of phonemes, letters, syllables, or the like.
- The speech subunits of the at least one speech subunit sequence are represented in a way that is suitable for the matching process with the list elements of the list.
- The matching process compares the mapped character sequences with the list elements of the list and generates a sub-list S6 from the database 7 containing the full list of list elements. The sub-list is therefore constructed as a reduced number of elements of the full list. Mapping depends on the type of matching process utilized, on the characters of the at least one character sequence, and on the representation of the list elements of the list to be matched.
- The matching process can either use the speech subunits themselves (that is, for the matching process the characters of the at least one character sequence correspond directly to the representation of the speech subunits), or the sequence of speech subunits may be mapped onto a character sequence (e.g., by transforming the phonemes of the speech subunits to letters). If the matching process does not require the same representation for the at least one character sequence and the list elements, e.g., by using an automatic transcription, the matching process can directly compare phoneme sequences with letter sequences. The at least one sequence of speech subunits may then be used directly as the character sequence for the matching process without any additional mapping. Preferably, however, the same representation is used in the matching process for both the at least one character sequence and the list elements of the list.
- The mapping process may be avoided if the representation is the same for both the at least one speech subunit sequence and the list elements, e.g., when the speech subunits are phonemes and phonetic transcriptions are available for each list element. If the representations differ, the representations of the speech subunits and the list elements are mapped to be the same. For example, if the speech subunits are phonemes and the list elements are represented by letter sequences, then the phonemes are mapped to letters prior to matching, or the letters may be mapped to phonemes.
- The at least one sequence of speech subunits is mapped to at least one character sequence. Preferably, a hypothesis list of several character sequences with similar or easily confusable characters is generated from the speech subunit sequences.
- The list elements of the full list may be scored (e.g., with a percentage expressing how likely the list element is a match), and the list elements with the best scores (e.g., highest probabilities) are included in the sub-list S7.
- The best scores may be awarded to those list elements with the best fit to the at least one character sequence.
- The matching process includes an error-tolerant filtering of the full list, or of the database containing all list elements. Error tolerance is needed because the at least one character sequence from the first recognition process might be erroneous.
- The speech recognition unit may have selected the wrong speech subunits, or the mapping process may not accurately map the at least one speech subunit sequence to the at least one character sequence.
- The maximum number of list elements placed in the sub-list can depend on the available memory of the speech recognition system and the properties of the speech recognition unit. Within these limitations, and considering the size of the full list in the database and the minimum number of spoken characters (letters, phonemes, syllables, etc.), the number of elements in the sub-list may be fixed or variable as a parameter of the matching process. Preferably, the number of list elements contained in the sub-list is large enough to increase the probability that the "correct" list element is contained in the sub-list, and thus included in the vocabulary of the second recognition process. Typically, the sub-list contains between several hundred and several thousand list elements. A sketch of such an error-tolerant matching step follows.
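- A minimal sketch of the error-tolerant matching, assuming both the hypotheses and the list elements are represented as letter sequences: Levenshtein distance is normalized into a similarity score, and the best-scoring elements form the sub-list. The scoring scheme and the size parameter are illustrative choices, not the patented method.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (error-tolerant core)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def build_sub_list(hypotheses, full_list, max_size=1000):
    """Score every list element against every hypothesis sequence and keep
    the `max_size` best-scoring elements (matching step S6 / sub-list S7)."""
    def score(elem):
        e = elem.upper()
        # Best normalized similarity over all hypothesis sequences.
        return max(1 - levenshtein(h, e) / max(len(h), len(e))
                   for h in hypotheses)
    return sorted(full_list, key=score, reverse=True)[:max_size]

places = ["Blaustein", "Lahnstein", "Traunstein", "Graustein", "Berlin"]
print(build_sub_list(["LAHNSTEIN", "GRAUSTEIN", "TRAUNSTEIN"], places, 4))
```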
- The second recognition process is performed S8 on the recorded speech input with the second recognition unit 9, which is configured with a second vocabulary 8 generated from the sub-list.
- The stored utterance is delivered to the second recognition unit 9, which is configured with the list elements of the sub-list as its vocabulary.
- The second vocabulary 8 may be generated using either the extracted entries of the list elements of the sub-list (i.e., based on the text of the list elements, such as ASCII text) or phonetic transcriptions assigned to the list elements, if available (e.g., IPA or SAMPA transcriptions).
- The second vocabulary 8 is loaded into the second speech recognition unit 9. A higher quality, and thus a higher recognition rate, may be achieved if phonetic transcriptions are assigned to list elements such as proper names (e.g., city names or names in an address book), since proper names often do not follow the normal pronunciation rules of the language. A sketch of this vocabulary-generation step follows.
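- The sketch below illustrates these two options: each sub-list entry becomes a vocabulary item, preferably carrying a phonetic transcription when one is available, and falling back to its text otherwise. The lexicon entries are invented SAMPA-like transcriptions, used only for illustration.

```python
# Hypothetical pronunciation lexicon (SAMPA-like strings, illustrative).
PHONETIC_LEXICON = {
    "Blaustein": "b l aU S t aI n",
    "Lahnstein": "l a: n S t aI n",
}

def build_vocabulary(sub_list):
    """Build second-pass vocabulary items from the sub-list (steps S7/S8)."""
    vocabulary = []
    for entry in sub_list:
        vocabulary.append({
            "word": entry,
            # Prefer an assigned phonetic transcription; when it is None, a
            # recognizer would derive a pronunciation from the entry text,
            # e.g., via grapheme-to-phoneme rules.
            "phonetic": PHONETIC_LEXICON.get(entry),
        })
    return vocabulary

for item in build_vocabulary(["Blaustein", "Lahnstein", "Traunstein"]):
    print(item)
```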
- At least one result S9 of the second recognition process is outputted.
- The result of the second recognition process is at least one list element having the highest probability of corresponding to the voice input of the user. Preferably, more than one "probable" list element is selected.
- The speech recognition unit selects, for example, the five to ten most probable list elements.
- The recognition result is displayed to the user in an appropriate form, such as via an output unit or a display device, as optical and/or acoustical output.
- The process is presented using "Blaustein" as an example of a list element from a list containing names of places in Germany.
- The user speaks S1 the voice input to an input unit or microphone of a speech recognition system.
- The voice input includes a whole word of the desired list element, such as "Blaustein".
- The voice-inputted whole word, e.g., "Blaustein", is stored S3 in a recording unit 5.
- Other words corresponding to a list element that the user desires to select can also be spoken, such as the name of a person when making a selection from a telephone database or an address database.
- The input unit 2, the speech recognition unit 3, or another component of the speech recognition system 1 can acoustically process S2 the voice input to reduce or eliminate unwanted environmental noises and the like.
- The speech recognition unit 3 can also separate the inputted whole word, e.g., "Blaustein", into speech subunits S4.
- The speech recognition unit 3 may be configured with the first vocabulary 4 that contains, e.g., phonemes or other elements to aid in breaking down the whole word.
- At least one result, or a plurality of resulting sequences of speech subunits, may be generated from the voice input of the user. Sequences of the speech subunits may be composed of subunits that are similar to each other or that may be confused with each other.
- The sequences of the speech subunits may be mapped S5 by the mapping unit 6 to form a sequence of characters.
- The phoneme sequences "l a: n S t aI n", "t r aU n S t aI n", and "g r aU S t aI n" may be converted into the character sequences "LAHNSTEIN" and "GRAUSTEIN", which contain 9 letters, and the character sequence "TRAUNSTEIN", which contains 10 letters.
- A hypothesis list of character sequences, or letter sequences, is thereby created.
- The hypothesis list may contain character sequences having similar-sounding and/or easily confused letters.
- The matching procedure S6 compares at least one character sequence from the generated hypothesis list with a list of elements. For example, the character sequences "LAHNSTEIN", "GRAUSTEIN", and "TRAUNSTEIN", which were generated by the mapping unit 6, may be compared with a list of elements, e.g., names of places, represented by the whole list of elements stored in the matching unit and database 7. Depending on the matching procedure used, the mapping procedure may not be necessary.
- The hypothesis list may be formed from the sequences of phonemes instead of characters.
- The comparison procedure generates a sub-list of list elements, e.g., names of places, whose letters correspond identically to, or are similar to, the characters of the character sequences of the hypothesis list.
- The list elements, e.g., names of places, having letters that sound similar to the letters in the character sequences may be added to the hypothesis list and accounted for in the matching procedure.
- The place names "Blaustein", "Lahnstein", "Traunstein", "Graustein", etc., can then be considered as list elements of the generated sub-list; this step is traced in the sketch below.
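- The following toy trace reproduces this walk-through end to end. As in the earlier sketches, difflib string similarity stands in for the matching procedure, and the place list is abbreviated; neither is part of the disclosed system.

```python
from difflib import SequenceMatcher

# Toy end-to-end trace of the "Blaustein" example: the first-pass
# hypothesis list is matched against the full place-name list to form
# the sub-list used as the second-pass vocabulary.
hypotheses = ["LAHNSTEIN", "GRAUSTEIN", "TRAUNSTEIN"]
full_list = ["Blaustein", "Lahnstein", "Traunstein", "Graustein",
             "Holstein", "Berlin", "Muenchen", "Hamburg"]

def score(elem):
    return max(SequenceMatcher(None, h, elem.upper()).ratio()
               for h in hypotheses)

sub_list = sorted(full_list, key=score, reverse=True)[:4]
print(sub_list)   # ['Lahnstein', 'Traunstein', 'Graustein', 'Blaustein']
```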
- The second vocabulary 8 is generated S7 in accordance with the sub-list, e.g., the list elements such as the place names generated from the comparison of the whole list of place names with the hypothesis list.
- The second vocabulary 8 may be used during a second recognition procedure.
- The second speech recognition unit 9 is configured with this sub-list via the second vocabulary 8. Therefore, the second speech recognition unit 9 has access to a refined and substantially reduced list compared with the whole list of names.
- The inputted whole word is matched with a list element contained in the sub-list of the second vocabulary 8.
- The speech recognition unit 9 can access from the recording unit 5 the inputted whole word, e.g., "Blaustein".
- The whole word is compared with the list elements contained in the sub-list, e.g., names of places such as "Blaustein", "Lahnstein", "Traunstein", "Graustein", etc.
- At least one list element is selected as the list element that most likely matches the voice input of the user. For example, at least one place name is selected as the recognition result of the second recognition procedure.
- A plurality of list elements may be selected from the sub-list as most likely to correspond to the spoken whole word, e.g., "Blaustein", such as "Blaustein", "Lahnstein", "Traunstein", and "Graustein".
- The second speech recognition unit 9 can generate five to ten list elements as the most likely matches for the element desired by the user.
- The recognition result of the speech recognition unit 9, e.g., the four place names "Blaustein", "Lahnstein", "Traunstein", and "Graustein", may be communicated to the user S9.
- The recognition results may be communicated via an acoustic output unit 10, such as a speaker.
- The recognition results may be communicated via an optical output unit, such as a display.
- The user can select the desired list element from the recognition result list.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2003273357A (AU2003273357A1) | 2003-02-21 | 2003-09-24 | Speech recognition system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/371,982 (US7392189B2) | 2002-02-23 | 2003-02-21 | System for speech recognition with multi-part recognition |
US10/371,982 | 2003-02-21 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004077405A1 (fr) | 2004-09-10 |
Family
ID=32926208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2003/030090 (WO2004077405A1) | Speech recognition system | 2003-02-21 | 2003-09-24 |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2003273357A1 (fr) |
WO (1) | WO2004077405A1 (fr) |
- 2003-09-24: AU application AU2003273357A (published as AU2003273357A1); status: not active, abandoned
- 2003-09-24: PCT application PCT/US2003/030090 (published as WO2004077405A1); status: not active, application discontinued
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0282272A1 (fr) * | 1987-03-10 | 1988-09-14 | Fujitsu Limited | Speech recognition system |
US5202952A (en) * | 1990-06-22 | 1993-04-13 | Dragon Systems, Inc. | Large-vocabulary continuous speech prefiltering and processing system |
EP1162602A1 (fr) * | 2000-06-07 | 2001-12-12 | Sony International (Europe) GmbH | Two-pass speech recognition with restriction of the active vocabulary |
Non-Patent Citations (1)
Title |
---|
NEUBERT F. ET AL.: "Directory name retrieval over the telephone in the Picasso project", Proceedings IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1998, pages 31-36, XP002110489 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3618065B1 (fr) | 2006-01-06 | 2021-05-26 | Pioneer Corporation | Word recognition apparatus |
US20090210230A1 (en) * | 2007-01-16 | 2009-08-20 | Harman Becker Automotive Systems Gmbh | Speech recognition on large lists using fragments |
EP1975923A1 (fr) | 2007-03-28 | 2008-10-01 | Harman Becker Automotive Systems GmbH | Multilingual non-native speech recognition |
EP2081185A1 (fr) | 2008-01-16 | 2009-07-22 | Harman/Becker Automotive Systems GmbH | Speech recognition on large lists using fragments |
CN101515457B (zh) | 2008-01-16 | 2013-01-02 | Nuance Communications, Inc. (纽昂斯通讯公司) | Speech recognition on large lists using fragments |
US8401854B2 (en) | 2008-01-16 | 2013-03-19 | Nuance Communications, Inc. | Speech recognition on large lists using fragments |
US8731927B2 (en) | 2008-01-16 | 2014-05-20 | Nuance Communications, Inc. | Speech recognition on large lists using fragments |
EP2221806A1 (fr) | 2009-02-19 | 2010-08-25 | Harman Becker Automotive Systems GmbH | Speech recognition of a list entry |
US8532990B2 (en) | 2009-02-19 | 2013-09-10 | Nuance Communications, Inc. | Speech recognition of a list entry |
EP2259252A1 (fr) | 2009-06-02 | 2010-12-08 | Harman Becker Automotive Systems GmbH | Speech recognition method for selecting a combination of list elements via a speech input |
WO2023050541A1 (fr) * | 2021-09-28 | 2023-04-06 | iFLYTEK Co., Ltd. (科大讯飞股份有限公司) | Phoneme extraction method, speech recognition method and apparatus, device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
AU2003273357A1 (en) | 2004-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7392189B2 (en) | System for speech recognition with multi-part recognition | |
US8666743B2 (en) | Speech recognition method for selecting a combination of list elements via a speech input | |
US7826945B2 (en) | Automobile speech-recognition interface | |
US6243680B1 (en) | Method and apparatus for obtaining a transcription of phrases through text and spoken utterances | |
US7043431B2 (en) | Multilingual speech recognition system using text derived recognition models | |
US6230132B1 (en) | Process and apparatus for real-time verbal input of a target address of a target address system | |
EP1936606B1 (fr) | Multi-level speech recognition | |
US6208964B1 (en) | Method and apparatus for providing unsupervised adaptation of transcriptions | |
US5983177A (en) | Method and apparatus for obtaining transcriptions from multiple training utterances | |
US20050182558A1 (en) | Car navigation system and speech recognizing device therefor | |
EP2048655B1 (fr) | Context-sensitive multi-stage speech recognition | |
JP2007233412A (ja) | Method and system for speaker-independent recognition of user-defined phrases | |
US20040210438A1 (en) | Multilingual speech recognition | |
EP0984430A2 (fr) | Reduced-size, language- and vocabulary-independent speech recognizer using word-spelling training | |
US20070016421A1 (en) | Correcting a pronunciation of a synthetically generated speech object | |
EP1975923B1 (fr) | Multilingual non-native speech recognition | |
US9997155B2 (en) | Adapting a speech system to user pronunciation | |
US20130080172A1 (en) | Objective evaluation of synthesized speech attributes | |
US9911408B2 (en) | Dynamic speech system tuning | |
EP1734509A1 (fr) | Speech recognition method and system | |
EP1933302A1 (fr) | Speech recognition method | |
US20140067400A1 (en) | Phonetic information generating device, vehicle-mounted information device, and database generation method | |
WO2004077405A1 (fr) | Speech recognition system (Système de reconnaissance vocale) | |
US7392182B2 (en) | Speech recognition system | |
US11361752B2 (en) | Voice recognition dictionary data construction apparatus and voice recognition apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |