WO2001001389A2 - Voice recognition method and device - Google Patents
Voice recognition method and device
- Publication number
- WO2001001389A2 (PCT/DE2000/001056)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- keyword
- speech recognition
- recognition system
- sequence
- speech
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Definitions
- Voice control has been one of the main lines of development in computer technology for years. In the course of this development, considerable progress has been made, and marketable voice recognition systems have been established that prove themselves in practical use. Advanced systems of this type are generally also suitable for voice control of a computer or connected peripheral devices. Simple speech recognition systems, which can, however, only process a relatively small vocabulary, are also already being used in consumer electronics and automotive equipment, as well as in other areas in which acoustic control of devices is possible and sensible thanks to a limited vocabulary.
- Keyword sequences mostly have a relatively strictly defined information structure which, when processed appropriately, enables particularly simple and reliable recognition. They are also often associated with voice control tasks, such as entering a number code, a telephone number, a time or a date.
- The processing of such sequences takes place, according to the state of the art (and to a certain extent quite successfully), within the framework of conventional speech recognition systems, for example on the basis of the well-known hidden Markov modeling; a step-by-step output of the recognition result is also possible, for example by means of the partial-traceback method.
- The invention is based on the object of specifying a method of the generic type, and an apparatus for carrying out the method, which enable more reliable, simpler and faster recognition of keyword sequences.
- The invention includes the essential idea of solving the problem of recognizing a coherent keyword sequence better and more reliably by dividing the recognition process into two or more sub-steps, in each of which a specific speech recognition system is used. This idea is based on the realization that speech recognition systems with a relatively small vocabulary can work significantly faster and more reliably than speech recognition systems with a large vocabulary.
- It also proceeds from the idea that certain keyword sequences that occur frequently in everyday language use, and that are meaningful there, have a relatively clearly defined information structure, so that conditional activation of several existing speech recognition systems, each with a specific vocabulary, in successive sub-steps (each depending on the recognition result of the preceding sub-step) is advantageously applicable. Furthermore, the invention is based on the knowledge that, especially under adverse acoustic conditions (loud ambient noise or relatively strong distortions), speech recognition systems with a small vocabulary provide much better accuracy than those with a large vocabulary. The conditional use of several systems with a small vocabulary therefore increases the detection rate for keyword sequences as such and, on the other hand, reduces the rate of incorrect detections.
- The interlinked speech recognition systems are successively activated and, after solving their specific recognition task and storing a detected keyword or part of a keyword sequence, are deactivated again, whereupon another system is activated to solve its assigned recognition task, a further detected keyword or part of the keyword sequence is stored, and so on.
- The keyword sequences are assembled in an orderly manner and output, or transmitted to a corresponding control unit for carrying out a control task.
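The chained activation and deactivation of small-vocabulary recognizers can be sketched in code. This is an illustrative sketch only, not part of the patent: the class and function names, the token-stream interface and the example vocabularies are all assumptions.

```python
# Hypothetical sketch of the two-stage keyword cascade: a first
# small-vocabulary recognizer is active until it detects a keyword,
# then a second recognizer with its own vocabulary takes over.

class KeywordRecognizer:
    """A small-vocabulary recognizer: scans a token stream for its keywords."""
    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)

    def detect(self, tokens):
        """Return the first keyword found and the position just after it."""
        for i, token in enumerate(tokens):
            if token in self.vocabulary:
                return token, i + 1
        return None, len(tokens)

def recognize_sequence(tokens, first, second):
    """Activate `first`; on a hit, deactivate it and hand the remaining
    stream to `second`; collect both hits into an ordered sequence."""
    sequence = []
    keyword, pos = first.detect(tokens)
    if keyword is None:
        return sequence
    sequence.append(keyword)
    # The second stage processes only the speech following the first keyword.
    keyword2, _ = second.detect(tokens[pos:])
    if keyword2 is not None:
        sequence.append(keyword2)
    return sequence

first = KeywordRecognizer(["number", "date", "time"])
second = KeywordRecognizer([str(d) for d in range(10)])
print(recognize_sequence("please enter number 4 6 7".split(), first, second))
# → ['number', '4']
```

Each recognizer here only ever compares against its own small word set, which mirrors the patent's point that conditional use of several small-vocabulary systems is faster and less error-prone than one large-vocabulary system.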
- A time window is predetermined in the speech stream within which the result of the second (or a further) recognition step must be available. This time window can be an absolute time span or a time span related to actually incoming speech signals. If the window elapses without a detection result, the system used first is reactivated.
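The fallback rule above can be stated compactly in code. This is a sketch under assumed interfaces (the function name and the timing values are hypothetical, not from the patent):

```python
# Hypothetical sketch of the time-window fallback: if the second
# recognition result does not arrive within the predetermined window,
# the system used first is reactivated.

def after_window(second_result_at, window, t_activation=0.0):
    """Decide what happens when the window closes.  `second_result_at` is
    the arrival time of the second result, or None if none arrived."""
    deadline = t_activation + window
    if second_result_at is not None and second_result_at <= deadline:
        return "accept second result"
    # Window elapsed without a detection result: fall back to system 1.
    return "reactivate system 1"

print(after_window(1.2, window=2.0))   # result arrived in time
print(after_window(None, window=2.0))  # window elapsed: fall back
```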
- To enable seamless handover between the speech recognition systems, a buffering of the speech data is provided.
- A process operating on the FIFO (first-in, first-out) principle continuously stores the last section of the speech stream, of predetermined length, as a buffer section.
- The length of the buffer section depends on the detection speed of the first speech recognition system: it must be long enough that the time span between the utterance of the keyword and its detection is buffered (plus an additional safety margin).
- In the second recognition step, which is triggered by the availability of the result of the first recognition step, the speech stream is processed with a delay corresponding to this buffer section.
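The FIFO buffering described above can be sketched with a bounded deque. This is an illustrative sketch only; the frame granularity and capacity are assumptions, not part of the patent:

```python
from collections import deque

# Hypothetical sketch of the FIFO speech buffer: it always holds the last
# `capacity` speech frames, so the second recognizer can start from the
# utterance of the keyword rather than from the later moment of its detection.

class SpeechRingBuffer:
    def __init__(self, capacity):
        # capacity should cover the detection latency of the first system
        # plus a safety margin, expressed in frames
        self.frames = deque(maxlen=capacity)

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame drops out automatically

    def replay(self):
        """Return the buffered frames, i.e. the speech stream delayed by
        the buffer section, for the second recognition step."""
        return list(self.frames)

buf = SpeechRingBuffer(capacity=4)
for frame in ["tele", "phone", "number", "4", "6"]:
    buf.push(frame)
print(buf.replay())  # → ['phone', 'number', '4', '6']
```

The choice of `deque(maxlen=...)` makes the first-in, first-out discard automatic: pushing a new frame when the buffer is full silently evicts the oldest one.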
- A particularly important application of the invention is represented by keyword sequences in which the first keyword or first part is regularly followed by a section of the speech stream containing a number or numbers.
- A system specially adapted to the recognition of numbers or number combinations is used as the second speech recognition system.
- the terms "number”, “telephone number”, “date”, “time” or the like can be used as the first keywords of a keyword sequence. occur, and these terms will be followed by strings of digits or certain combinations of digits / words, for the recognition of which a system with a correspondingly limited vocabulary can be activated.
- Another important field of application, the voice control of computers or computer peripherals, involves keyword sequences in which the first keyword names a class of devices (e.g. "device"), while other parts of the sequence name specific devices that are to be activated in some way.
- Voice control of other technical devices in the professional or private sphere, for example devices in the car or in the household (such as navigation systems, audio or video systems, household appliances, telecommunications terminals, toys, etc.), is also of great economic interest.
- FIG. 1 shows a schematic illustration of a simple embodiment of the invention in the form of a functional block diagram
- Fig. 2 is a graphical representation illustrating the buffering of the last section of the speech stream.
- Fig. 3 is a schematic representation of a further embodiment in the form of a functional block diagram.
- The speech stream S is divided at a branch point 101 into two (informationally identical) partial speech streams S1 and S2.
- The partial speech stream S1 arrives directly at the input of a first speech recognition unit 102, specifically at a first input of a first detection stage 102a, to whose second input a first vocabulary memory 102b is connected.
- The first detection stage 102a has a control output connected to a speech recognition sequence controller 103 and a data output connected to a first keyword memory 104.
- The second partial speech stream S2 arrives at the input of a ring speech buffer 105, in which the last section of the speech stream is temporarily stored; at its output, a partial speech stream S2' delayed by the buffered speech stream section is thus output. This passes to a second speech recognition unit 106, which, analogous to the first speech recognition unit 102, consists of a second detection stage 106a and a second vocabulary memory 106b.
- The data output of the second detection stage 106a is connected to a second keyword memory 107.
- The outputs of both keyword memories 104, 107 are connected to inputs of a sequence memory 108, whose output also represents the output of the device 100.
- The speech recognition sequence controller 103 has two control outputs, which are connected to control inputs of the first and second speech recognition units 102 and 106, respectively.
- The speech stream S (in the form of the partial speech stream S1, which carries the entire information content) is checked in the first speech recognition unit 102, activated by the speech recognition sequence controller 103 at the start of the recognition process, to determine whether a word stored in the first vocabulary memory 102b occurs. If such a word occurs, this is registered in the first detection stage 102a, the word in question is transferred to the first keyword memory 104, and at the same time a control signal is output to the speech recognition sequence controller 103. The controller thereupon deactivates the first speech recognition unit 102 and activates the second, until then inactive, speech recognition unit 106.
- The delayed partial speech stream S2' arrives at its input and, like the partial speech stream S1 in the first speech recognition unit 102, is checked in the second detection stage 106a for the occurrence of a second keyword from the set of words stored in the second vocabulary memory 106b.
- If a second keyword is detected by the second detection stage 106a, it is output to the second keyword memory 107.
- At the same time, a control signal is output to the speech recognition sequence controller 103, which then deactivates the second speech recognition unit 106 again and activates the first speech recognition unit 102 instead.
- The speech recognition sequence controller 103 then controls an output of the words stored in the first and second keyword memories 104, 107 to the sequence memory 108, where they are stored in an orderly manner and provided for output from the device 100.
- This completes the acquisition of a keyword sequence using two different speech recognition units with differentiated and correspondingly reduced vocabularies.
- <device> denotes an element from a finite set of devices, e.g. "computer".
- System: time recognizer. 5. System: detection of the individual device names from a predetermined set.
- System 1 must also provide information about the (temporal) end point of the recognized keyword sequence.
- Recognition continues at this point, so buffering is necessary.
- The detection systems must at least keep pace with the speech stream.
- The function of buffering the last section of the speech stream for seamless processing by the second speech recognition unit ("System 2") is outlined in FIG. 2.
- t0: the time of detection of a first keyword sequence "input telephone number" by the first speech recognition unit ("System 1")
- tE: the temporal end point of this first keyword sequence
- Ph: the corresponding scanning position of System 2 at the same time t0 (at which it is being activated)
- The buffering thus clearly ensures that the time which elapses, owing to the processing time of System 1, up to the detection of the first keyword sequence (and which of course corresponds to a section of the speech stream) does not lead to a loss of speech stream data. Without the buffering, the first two digits "4" and "6" in the example shown here would in principle be lost for System 2 and would therefore no longer be accessible for detection.
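The FIG. 2 timing can be worked through numerically. This is a hypothetical sketch; the frame contents and the latency of two frames are assumptions chosen to match the "4" and "6" example above:

```python
# Hypothetical worked example of the FIG. 2 timing: System 1 detects
# "input telephone number" only some frames after the phrase ended,
# while the digits "4 6 ..." have already streamed past.

stream = ["input", "telephone", "number", "4", "6", "7", "3"]
utterance_end = 3      # tE: index right after the keyword phrase
detection_time = 5     # t0: System 1's result arrives two frames later

# Without buffering, System 2 can only start at the detection time,
# so the frames between tE and t0 are lost:
lost = stream[utterance_end:detection_time]
print(lost)  # → ['4', '6']

# With a ring buffer spanning the detection latency, System 2 is handed
# the frames from scanning position Ph, i.e. from the end of the phrase:
replayed = stream[utterance_end:]
print(replayed)  # → ['4', '6', '7', '3']
```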
- FIG. 3 shows a speech processing device 200 which is modified compared to the device from FIG. 1 and which is distinguished by a double cascading of speech recognition systems and a selection option for different systems in the second stage.
- The first and second stages, with the components 201 to 208, are essentially the same as in the device according to FIG. 1 and are designated with corresponding reference numerals; these components are not explained again here.
- the sequence memory 208 is designed here, as symbolized by the division with two dashed vertical lines, to accommodate a three-part keyword sequence.
- The partial signal stream S2' from the (here: first) speech buffer 205 is branched at a branch point 209, on the one hand to the second detection stage 206a and on the other hand to a second speech buffer 210.
- The third speech recognition unit 211 also contains a specific vocabulary memory 211b, which is connected to a further input of the third detection stage 211a. Analogous to the design of the first and second stages, the (third) detection stage is here too followed by a (third) keyword memory 212, which in turn is connected on the output side to the sequence memory 208.
- The assemblies 210 to 212 implement, as can easily be derived from the above explanations for FIG. 1, a third step of recognizing a keyword sequence, which also corresponds to a third hierarchical level of the method.
- A selector stage 203S, organized in the form of a lookup table, is connected to the output of the first detection stage (in addition to the first keyword memory 204); it assigns to each detected first keyword one of several available second speech recognition units and outputs the corresponding selection signal to the speech recognition sequence controller 203.
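The selector stage's lookup table can be sketched directly as a dictionary. This is an illustrative sketch only; the table entries and recognizer identifiers are hypothetical, not taken from the patent:

```python
# Hypothetical sketch of the selector stage 203S: each detected first
# keyword selects which second-level recognizer to activate next.

SELECTOR_TABLE = {
    "telephone number": "digit_recognizer",
    "date": "date_recognizer",
    "time": "time_recognizer",
    "device": "device_name_recognizer",
}

def select_second_recognizer(first_keyword):
    """Return the identifier of the second-level recognizer to activate,
    or None if the first keyword has no assigned second stage."""
    return SELECTOR_TABLE.get(first_keyword)

print(select_second_recognizer("date"))  # → date_recognizer
```

A second table of the same shape could sit between the second and third levels, matching the patent's remark that a similar selector stage can be provided there as well.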
- The dash-dotted arrows pointing upward from it indicate that, in addition to the second speech recognition unit 206 shown in the figure, further speech recognition units of the second level can optionally be activated. These, too, can in turn be assigned speech recognition units of the third level, just as the third speech recognition unit 211 is assigned to the second speech recognition unit 206 shown in the figure.
- A similar selector stage can also be provided between the second and third levels, so that at this level a selected one of several available third speech recognition units could be activated as a function of the recognized second keyword or second part of a keyword sequence.
- Cascading is also possible with a single buffer, whose delay time is then variable and tends to have to be reduced in order to implement processing that keeps pace with the speech stream.
Abstract
Description
Claims
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00929282A EP1190413A2 (en) | 1999-06-24 | 2000-04-05 | Voice recognition method and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19929000.8 | 1999-06-24 | ||
DE19929000 | 1999-06-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001001389A2 true WO2001001389A2 (en) | 2001-01-04 |
WO2001001389A3 WO2001001389A3 (en) | 2001-03-29 |
Family
ID=7912410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/DE2000/001056 WO2001001389A2 (en) | 1999-06-24 | 2000-04-05 | Voice recognition method and device |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1190413A2 (en) |
CN (1) | CN1365487A (en) |
HU (1) | HUP0201923A2 (en) |
WO (1) | WO2001001389A2 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7162424B2 (en) | 2001-04-26 | 2007-01-09 | Siemens Aktiengesellschaft | Method and system for defining a sequence of sound modules for synthesis of a speech signal in a tonal language |
CN102374864A (en) * | 2010-08-13 | 2012-03-14 | 国基电子(上海)有限公司 | Voice navigation equipment and voice navigation method |
DE102010040553A1 (en) * | 2010-09-10 | 2012-03-15 | Siemens Aktiengesellschaft | Speech recognition method |
DE102010049869A1 (en) * | 2010-10-28 | 2012-05-03 | Volkswagen Ag | Method for providing voice interface in vehicle, involves determining hit list from stored data depending on assigned category and comparison result |
CN102708858A (en) * | 2012-06-27 | 2012-10-03 | 厦门思德电子科技有限公司 | Voice bank realization voice recognition system and method based on organizing way |
DE102013001219A1 (en) * | 2013-01-25 | 2014-07-31 | Inodyn Newmedia Gmbh | Method for voice activation of a software agent from a standby mode |
CN105912092A (en) * | 2016-04-06 | 2016-08-31 | 北京地平线机器人技术研发有限公司 | Voice waking up method and voice recognition device in man-machine interaction |
WO2022125294A1 (en) * | 2020-12-10 | 2022-06-16 | Google Llc | Hotphrase triggering based on a sequence of detections |
DE102021005206B3 (en) | 2021-10-19 | 2022-11-03 | Mercedes-Benz Group AG | Method and device for determining a multi-part keyword |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004343232A (en) | 2003-05-13 | 2004-12-02 | Nec Corp | Communication apparatus and communication method |
DE102007033472A1 (en) * | 2007-07-18 | 2009-01-29 | Siemens Ag | Method for speech recognition |
CN102332265B (en) * | 2011-06-20 | 2014-04-16 | 浙江吉利汽车研究院有限公司 | Method for improving voice recognition rate of automobile voice control system |
US9117449B2 (en) * | 2012-04-26 | 2015-08-25 | Nuance Communications, Inc. | Embedded system for construction of small footprint speech recognition with user-definable constraints |
US10229676B2 (en) | 2012-10-05 | 2019-03-12 | Avaya Inc. | Phrase spotting systems and methods |
WO2015030474A1 (en) | 2013-08-26 | 2015-03-05 | 삼성전자 주식회사 | Electronic device and method for voice recognition |
CN105302082A (en) * | 2014-06-08 | 2016-02-03 | 上海能感物联网有限公司 | Controller apparatus for on-site automatic navigation and car driving by non-specific person foreign language speech |
CN104538030A (en) * | 2014-12-11 | 2015-04-22 | 科大讯飞股份有限公司 | Control system and method for controlling household appliances through voice |
CN105261356A (en) * | 2015-10-30 | 2016-01-20 | 桂林信通科技有限公司 | Voice recognition system and method |
CN107331391A (en) * | 2017-06-06 | 2017-11-07 | 北京云知声信息技术有限公司 | A kind of determination method and device of digital variety |
CN107331396A (en) * | 2017-07-05 | 2017-11-07 | 北京云知声信息技术有限公司 | Export the method and device of numeral |
CN109003604A (en) * | 2018-06-20 | 2018-12-14 | 恒玄科技(上海)有限公司 | A kind of audio recognition method that realizing low-power consumption standby and system |
CN110211576B (en) * | 2019-04-28 | 2021-07-30 | 北京蓦然认知科技有限公司 | Voice recognition method, device and system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19635754A1 (en) * | 1996-09-03 | 1998-03-05 | Siemens Ag | Speech processing system and method for speech processing |
-
2000
- 2000-04-05 WO PCT/DE2000/001056 patent/WO2001001389A2/en not_active Application Discontinuation
- 2000-04-05 EP EP00929282A patent/EP1190413A2/en not_active Withdrawn
- 2000-04-05 CN CN00809342A patent/CN1365487A/en active Pending
- 2000-04-05 HU HU0201923A patent/HUP0201923A2/en unknown
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19635754A1 (en) * | 1996-09-03 | 1998-03-05 | Siemens Ag | Speech processing system and method for speech processing |
Non-Patent Citations (2)
Title |
---|
"Support for ViaVoice Gold for Windows 95 and NT" IBM VIAVOICE : SUPPORT, [Online] Seiten 1-11, XP002145003 Gefunden im Internet: <URL:http:/www-4.ibm.com/software/speech/s upport/faqvvg.html#5.6> [gefunden am 2000-08-15] * |
DATABASE INSPEC [Online] INSTITUTE OF ELECTRICAL ENGINEERS, STEVENAGE, GB; DE GLANVILLE H: "Speak naturally to your system and correct it when it sneezes" Database accession no. 5865722 XP002145006 & BJHC&IM - BRITISH JOURNAL OF HEALTHCARE COMPUTING & INFORMATION MANAGEMENT, March 1998, BJHC, UK, Vol. 15, No. 2, page 48, ISSN: 0265-5217 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7162424B2 (en) | 2001-04-26 | 2007-01-09 | Siemens Aktiengesellschaft | Method and system for defining a sequence of sound modules for synthesis of a speech signal in a tonal language |
CN102374864A (en) * | 2010-08-13 | 2012-03-14 | 国基电子(上海)有限公司 | Voice navigation equipment and voice navigation method |
DE102010040553A1 (en) * | 2010-09-10 | 2012-03-15 | Siemens Aktiengesellschaft | Speech recognition method |
DE102010049869A1 (en) * | 2010-10-28 | 2012-05-03 | Volkswagen Ag | Method for providing voice interface in vehicle, involves determining hit list from stored data depending on assigned category and comparison result |
DE102010049869B4 (en) | 2010-10-28 | 2023-03-16 | Volkswagen Ag | Method for providing a voice interface in a vehicle and device therefor |
CN102708858A (en) * | 2012-06-27 | 2012-10-03 | 厦门思德电子科技有限公司 | Voice bank realization voice recognition system and method based on organizing way |
DE102013001219A1 (en) * | 2013-01-25 | 2014-07-31 | Inodyn Newmedia Gmbh | Method for voice activation of a software agent from a standby mode |
DE102013001219B4 (en) * | 2013-01-25 | 2019-08-29 | Inodyn Newmedia Gmbh | Method and system for voice activation of a software agent from a standby mode |
CN105912092A (en) * | 2016-04-06 | 2016-08-31 | 北京地平线机器人技术研发有限公司 | Voice waking up method and voice recognition device in man-machine interaction |
WO2022125294A1 (en) * | 2020-12-10 | 2022-06-16 | Google Llc | Hotphrase triggering based on a sequence of detections |
US11694685B2 (en) | 2020-12-10 | 2023-07-04 | Google Llc | Hotphrase triggering based on a sequence of detections |
DE102021005206B3 (en) | 2021-10-19 | 2022-11-03 | Mercedes-Benz Group AG | Method and device for determining a multi-part keyword |
Also Published As
Publication number | Publication date |
---|---|
WO2001001389A3 (en) | 2001-03-29 |
CN1365487A (en) | 2002-08-21 |
EP1190413A2 (en) | 2002-03-27 |
HUP0201923A2 (en) | 2002-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2001001389A2 (en) | Voice recognition method and device | |
DE2953262C2 (en) | ||
DE69827202T2 (en) | A method and apparatus for counting words for continuous speech recognition for use in reliable speech announcement interruption and early speech endpointing | |
DE2753277C2 (en) | Method and device for speech recognition | |
DE10015960C2 (en) | Speech recognition method and speech recognition device | |
DE69725091T2 (en) | Method and system for editing sentences during continuous speech recognition | |
DE2326517A1 (en) | METHOD AND CIRCUIT ARRANGEMENT FOR DETECTING SPOKEN WORDS | |
EP0319078A2 (en) | Method and apparatus for the determination of the begin and end points of isolated words in a speech signal | |
EP1085499A2 (en) | Spelled mode speech recognition | |
DE3238853A1 (en) | VOICE-CONTROLLABLE ACTUATOR FOR MOTOR VEHICLES | |
DE19851287A1 (en) | Data processing system or communication terminal with a device for recognizing spoken language and method for recognizing certain acoustic objects | |
EP1063633B1 (en) | Method of training an automatic speech recognizer | |
DE3215868A1 (en) | Method and arrangement for recognising the words in a continuous word chain | |
EP0834859B1 (en) | Method for determining an acoustic model for a word | |
DE19646634A1 (en) | Command entry method using speech | |
EP0760151B1 (en) | Process for recognising voice signals and device for implementing it | |
DE19514849A1 (en) | Remote control of device through communications network | |
DE3928049A1 (en) | VOICE-CONTROLLED ARCHIVE SYSTEM | |
EP0677835A2 (en) | Process to ascertain a series of words | |
DE3137314A1 (en) | Circuit arrangement for voice-controlled hands-free apparatuses | |
EP1256935A2 (en) | Training process and use of a speech recognition system, speech recognizer and training system | |
DE3935308C1 (en) | Speech recognition method by digitising microphone signal - using delta modulator to produce continuous of equal value bits for data reduction | |
DE10131157C1 (en) | Dynamic grammatical weighting method for speech recognition system has existing probability distribution for grammatical entries modified for each identified user | |
DE10253868B3 (en) | Test and reference pattern synchronization method e.g. for speech recognition system, has test pattern potential synchronization points associated with reference synchronization points | |
DE19824450C2 (en) | Method and device for processing speech signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 00809342.3 Country of ref document: CN |
|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): CN HU US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): CN HU US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2000929282 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10018843 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 2000929282 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2000929282 Country of ref document: EP |