GB2383459A - Speech recognition system with confidence assessment - Google Patents

Speech recognition system with confidence assessment

Info

Publication number
GB2383459A
GB2383459A
Authority
GB
United Kingdom
Prior art keywords
recognition
speech
hypotheses
hypothesis
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0130464A
Other versions
GB0130464D0 (en)
GB2383459B (en)
Inventor
Paul St John Brittan
Roger Cecil Ferry Tucker
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HP Inc
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Co filed Critical Hewlett Packard Co
Priority to GB0130464A priority Critical patent/GB2383459B/en
Publication of GB0130464D0 publication Critical patent/GB0130464D0/en
Priority to US10/322,623 priority patent/US20030120486A1/en
Publication of GB2383459A publication Critical patent/GB2383459A/en
Application granted granted Critical
Publication of GB2383459B publication Critical patent/GB2383459B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A speech input stream (35) is fed to a first speech recogniser (21). A confidence measure is formed (30) for each recognition hypothesis produced in output by the first speech recogniser and this confidence measure is compared (31) against an acceptability threshold. Where the confidence measure of a recognition hypothesis is below the threshold, the corresponding portion of the speech input is passed to a second speech recogniser (27) and the recognition hypothesis produced is used instead of, or as a supplement to, that output by the first speech recogniser. In a preferred embodiment, the first speech recogniser (21) is a recogniser trained to a particular user whilst the second recogniser (27) is one associated with a particular speech application currently being accessed by the user.

Description

Speech Recognition System and Method

Field of the Invention

The present invention relates to a speech recognition system and method.
Background of the Invention

Speech recognition remains a difficult task to carry out with high accuracy for multiple users over a large vocabulary. Thus, the designer of a speech-based system often has to choose between a speech recognizer that can be trained by a specific user to recognize a wide vocabulary of words, and a speech recognizer that is capable of handling input from multiple users, without training, but only in respect of a more limited vocabulary. This choice is affected by whether the intended system is general purpose in nature, requiring a large vocabulary, or whether the system is only being designed for a specific application where typically a more limited vocabulary is sufficient. The choice can be complicated by other considerations such as available processing power. For example, whilst it is attractive to provide user-specific (user-trained) speech recognizers because of their potentially larger vocabulary and thus wider application, placing such recognizers in mobile equipment intended to be personal to the user is likely to limit the vocabulary that can be recognized because of the restricted processing and memory resources normally available to mobile personal equipment; in contrast, speech recognizers intended to take input from multiple users are usually associated with network applications where large processing resources are available.

Because a speech system is fundamentally trying to do what humans do very well, most improvements in speech systems have come about as a result of insights into how humans handle speech input and output. Humans have become very adept at conveying information through the languages of speech and gesture. When listening to a conversation, humans are continuously building and refining mental models of the concepts being conveyed. These models are derived, not only from what is heard, but also, from how well the hearer thinks they have heard what was spoken. This distinction, between what and how well individuals have heard, is important. A measure of confidence in the ability to hear and distinguish between concepts is critical to understanding and the construction of meaningful dialogue.
In automatic speech recognition, there are clues to the effectiveness of the recognition process. The closer competing recognition hypotheses are to one another, the more likely there is confusion. Likewise, the further the test data is from the trained models, the more likely errors will arise. By extracting such observations during recognition, a separate classifier can be trained on correct hypotheses - such a system is described in the paper "Recognition Confidence Scoring for Use in Speech Understanding Systems", T.J. Hazen, T. Burianek, J. Polifroni, and S. Seneff, Proc. ISCA Tutorial and Research Workshop: ASR2000, Paris, France, September 2000. Figure 1 of the accompanying drawings depicts the system described in the paper and shows how, during the recognition of a test utterance, a speech recognizer 10, supplied with a vocabulary and grammar 11, is arranged to generate a feature vector 15 that is passed to a separate classifier 16 where a confidence score (or simply an accept/reject decision) is generated. The downstream speech-system functionality (here represented by semantic understanding and action block 12) then uses the confidence classifier output in deriving the semantic meaning of the output from the speech recognizer 10.
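By way of illustration only, the following sketch shows one way such a confidence classifier could be realised. The feature names and the logistic-regression form are assumptions made for exposition; they are not taken from the cited paper or from the present patent.

```python
# Illustrative sketch of a confidence classifier of the kind described above.
# Feature names and the logistic-regression model are assumptions.
import math

def recognition_features(hypothesis):
    """Extract per-hypothesis observations that correlate with recognition quality."""
    return [
        hypothesis["acoustic_score"],   # how far the test data lies from the trained models
        hypothesis["score_margin"],     # gap between the best and closest competing hypothesis
        hypothesis["n_competing"],      # number of near-tied alternatives
    ]

def confidence_score(hypothesis, weights, bias):
    """Map the feature vector to a confidence in [0, 1] via logistic regression."""
    z = bias + sum(w * f for w, f in zip(weights, recognition_features(hypothesis)))
    return 1.0 / (1.0 + math.exp(-z))

def accept(hypothesis, weights, bias, threshold=0.5):
    """Binary accept/reject decision of the kind used by the downstream system."""
    return confidence_score(hypothesis, weights, bias) >= threshold
```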
It is an object of the present invention to provide improved speech recognition systems.
Summary of the Invention

According to one aspect of the present invention, there is provided a speech recognition method comprising the steps of:
(a) carrying out recognition of a speech input stream using a first speech recognizer to derive respective first recognition hypotheses for successive portions of the input stream;
(b) in carrying out step (a), determining a confidence measure for each first recognition hypothesis;
(c) at least in respect of those portions of the speech input stream for which the confidence measure is below an acceptability threshold, passing the speech input stream to a second speech recognizer to produce corresponding second recognition hypotheses; and
(d) forming an output recognition-hypothesis stream using recognition hypotheses from the first recognition hypotheses and only those second recognition hypotheses corresponding to the first recognition hypotheses that have a confidence measure below said threshold.

According to another aspect of the present invention, there is provided a speech recognition system comprising:
- a first speech recognizer for carrying out recognition of a speech input stream to derive respective first recognition hypotheses for successive portions of the input stream;
- an acceptability-determination subsystem for deriving a confidence measure for each first recognition hypothesis and comparing this measure with an acceptability threshold to determine the acceptability of the recognition hypothesis;
- a second speech recognizer for producing second recognition hypotheses for portions of the input stream;
- a transfer arrangement for passing to the second speech recognizer at least those portions of the speech input stream for which the confidence measure is below said acceptability threshold; and
- a control arrangement for forming an output recognition-hypothesis stream using recognition hypotheses from the first recognition hypotheses and only those second recognition hypotheses corresponding to the first recognition hypotheses that have a confidence measure below said threshold.
Brief Description of the Drawings

Embodiments of the invention will now be described, by way of non-limiting example, with reference to the accompanying diagrammatic drawings, in which:
Figure 1 is a diagram showing a known arrangement of a confidence classifier associated with a speech recognizer;
Figure 2 is a diagram of a first system embodying the present invention; and
Figure 3 is a diagram of a second system embodying the present invention.
Best Mode of Carrying Out the Invention

Figure 2 shows a first embodiment of the present invention where a user 2 is using a mobile appliance 20 to interact with a speech application 26 hosted by a remote resource 25. The mobile appliance has a communications interface 24 for communicating speech and data signals over a communications infrastructure 23 with a corresponding communications interface 29 of the remote resource 25.
The communications infrastructure 23 can take any form suitable for passing speech and data signals between the mobile appliance and remote resource 25. Thus, the communications infrastructure can comprise, for example, the public internet to which the resource 25 is connected, and a wireless network connected to the internet and communicating with the mobile appliance; in this case, the speech signals are passed as packetized data, at least over the internet. As another example, the communications infrastructure can simply comprise a voice network with the speech signals passed as voice signals and the data signals handled using modems.
The mobile appliance 20 has a first speech recogniser 21, this recogniser preferably being one which the user can train to recognise the user's normal vocabulary. A second speech recogniser 27 is provided as part of the remote resource 25, this recogniser preferably being intended for use by multiple users without training and having a vocabulary restricted to that needed for the speech application 26 or a related domain.
The first recogniser 21 produces a respective recognition hypothesis for each successive portion of the speech input stream 35 from user 2 (these speech portions can be individual phones, words or may be complete phrases). Associated with the first recogniser is a confidence-measure unit 30 that derives a confidence measure for each recognition hypothesis produced by the first recogniser; the unit 30 operates, for example, in a manner similar to that illustrated in Figure 1 or in any other suitable manner. The confidence measure derived for each recognition hypothesis is then compared in threshold unit 31 to an acceptability threshold to determine whether the recognition hypothesis has reached an acceptable minimum confidence level. Where the recognition hypothesis produced by recogniser 21 has a confidence measure below the acceptability threshold, the corresponding speech portion that has been temporarily buffered in buffer 32 is passed (see arrow 37) via the communication interface 24, communications infrastructure 23, and communications interface 29 to the speech recogniser 27 of the remote resource 25 to produce a new recognition hypothesis for the speech portion concerned.
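The per-portion control flow just described can be sketched as follows. This is a minimal illustration only: the recogniser, confidence unit and transmit operations are injected as callables because the patent does not prescribe any particular implementation, and the threshold value is invented for the example.

```python
# Minimal sketch of the per-portion fallback logic of Figure 2 (assumptions:
# synchronous calls, a single numeric confidence, an illustrative threshold).
ACCEPTABILITY_THRESHOLD = 0.6  # illustrative value; the patent fixes no number

def process_portion(seq_no, speech_portion, local_recognise, confidence_of,
                    send_hypothesis, send_speech):
    """Handle one speech portion held in buffer 32.

    local_recognise : callable standing in for the first recogniser 21
    confidence_of   : callable standing in for the confidence-measure unit 30
    send_hypothesis : forwards an accepted hypothesis to the remote resource (arrow 36)
    send_speech     : forwards the buffered speech for re-recognition (arrow 37)
    """
    hypothesis = local_recognise(speech_portion)
    if confidence_of(hypothesis) >= ACCEPTABILITY_THRESHOLD:  # threshold unit 31
        send_hypothesis(seq_no, hypothesis)
    else:
        send_speech(seq_no, speech_portion)
```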
At least the acceptable recognition hypotheses produced by the mobile-appliance recogniser 21 (that is, those that are found to have acceptable confidence measures) are also passed (see arrow 36) to the remote resource 25.
At the remote resource 25, the recognition hypotheses received from the mobile appliance 20 are combined by a combiner 40 with the recognition hypotheses produced by the recogniser 27 in respect of those speech portions for which the mobile-appliance recogniser 21 failed to produce an acceptable recognition hypothesis. The nature of this combining carried out by combiner 40 can be simply the adding of the recognition hypotheses output by recogniser 27 into the stream of hypotheses output by recogniser 21 (in this case, all the recognition hypotheses produced by recogniser 21 are passed to the remote resource 25); alternatively, the hypotheses output by recogniser 27 can take the place of the corresponding hypotheses (the unacceptable hypotheses) output by recogniser 21 (in this case, the unacceptable hypotheses produced by recogniser 21 are preferably not passed to the remote resource but are cut out by a unit 33 controlled by threshold unit 31 as illustrated in Figure 2 - however, it is also possible to pass all the hypotheses from recogniser 21 to the remote resource and to use the combiner 40 to cut out the unacceptable ones on the basis of control data passed to it from threshold unit 31, this control data being indicative of the acceptability of each hypothesis from recogniser 21).
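The two combining modes can be expressed compactly as below. The dictionary shapes and mode names are assumptions made for the sketch, which models the variant in which all first hypotheses reach the combiner together with acceptability flags as control data.

```python
def combine(first_hypotheses, second_hypotheses, mode="replace"):
    """Sketch of combiner 40 (data shapes are assumptions).

    first_hypotheses  : dict seq_no -> (hypothesis, acceptable) from recogniser 21
    second_hypotheses : dict seq_no -> hypothesis from fallback recogniser 27
    mode "replace"    : fallback hypotheses take the place of unacceptable ones
    mode "supplement" : both are kept, leaving the choice to the application 26
    """
    output = []
    for seq_no in sorted(first_hypotheses):
        hypothesis, acceptable = first_hypotheses[seq_no]
        if acceptable or seq_no not in second_hypotheses:
            output.append((seq_no, hypothesis))
        elif mode == "supplement":
            output.append((seq_no, hypothesis))              # application 26 resolves
            output.append((seq_no, second_hypotheses[seq_no]))
        else:  # replace the unacceptable hypothesis
            output.append((seq_no, second_hypotheses[seq_no]))
    return output
```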
The output of the combiner 40 is a stream of recognition hypotheses that are passed to the speech application 26 for further processing and action (such action is likely to involve a response to the user 2 using an output channel not here illustrated or described). Where multiple recognition hypotheses are provided for the same speech portion, it is the responsibility of the application 26 to determine which hypothesis to accept (based, for example, on a high-level semantic understanding of the overall speech passage concerned); in this respect, it will be appreciated that, in practice, the application 26 may be formed by multiple distinct functional elements that separate the interpretation of the recognition hypotheses from the core application logic.
The combiner 40 can be arranged to work simply on the basis of serialising the recognition hypotheses received on its two inputs on a first-in first-out basis; however, this runs the risk of a hypothesis produced by the recogniser 27 being included out of order (as judged relative to the order of the corresponding speech portions in the input speech stream) either because the recogniser 27 operates too slowly or because of delays in the communications infrastructure 23. It is therefore preferred to label each speech portion in the input stream with a sequence number which is also then used to label the corresponding recognition hypothesis; in this way, the combiner can correctly order the hypotheses it receives, buffering any hypotheses received out of order. In the case where the output recognition-hypothesis stream includes multiple hypotheses for the same speech input portion, the sequence numbers are preferably included in the output stream to enable the application 26 to recognise when such multiple hypotheses are present (other ways of indicating this are, of course, possible).
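A minimal sketch of this sequence-number reordering follows; it assumes sequence numbers start at zero and that exactly one hypothesis is ultimately emitted per portion, which matches the replacement mode of combining.

```python
class ReorderBuffer:
    """Emit labelled hypotheses in input-stream order, holding back any that
    arrive early because the fallback recogniser or the network is slow."""

    def __init__(self):
        self.pending = {}   # seq_no -> hypothesis, held until its turn
        self.next_seq = 0   # next sequence number due in the output stream

    def push(self, seq_no, hypothesis):
        """Accept one labelled hypothesis; return the run now safe to emit."""
        self.pending[seq_no] = hypothesis
        ready = []
        while self.next_seq in self.pending:
            ready.append((self.next_seq, self.pending.pop(self.next_seq)))
            self.next_seq += 1
        return ready
```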
In overall operation, the Figure 2 embodiment operates to preferentially use the mobile-appliance speech recogniser 21 but to fall back to using the recogniser 27 at the remote resource when the mobile-appliance recogniser 21 produces a recognition hypothesis with an unacceptable confidence measure. By only passing speech signals to the remote resource in respect of the unacceptably recognised speech portions, where the speech signals are passed as packetized data over the communications infrastructure the loading of the latter is reduced as compared to passing all the speech data.
In a variant of the Figure 2 embodiment, the recognition hypotheses generated by the remote-resource recogniser 27 can also have confidence measures produced for them. In this case, the unacceptable recognition hypotheses produced by the mobile-appliance recogniser 21 are also passed to the remote resource 25 together with their corresponding confidence measures. Where the combiner is arranged simply to include the output from the fallback recogniser 27 into the stream of hypotheses from recogniser 21, the confidence scores associated with each unacceptable hypothesis from recogniser 21 and the corresponding hypothesis from recogniser 27 are included in the output recognition-hypothesis stream from combiner 40 to facilitate the determination by the application as to which hypothesis to use. However, where the combiner is arranged to substitute hypotheses from the fallback recogniser 27 for corresponding ones from the recogniser 21, the combiner 40 uses the confidence measures for corresponding hypotheses from the two recognisers to determine whether to accept a recognition hypothesis produced by the recogniser 27 or to use the corresponding hypothesis produced by the recogniser 21 (even though this latter hypothesis failed to reach the acceptability threshold). Of course, for the application 26 or combiner 40 to be able to make use of the confidence measures from the two recognisers, there needs to be a known relationship between the confidence measures produced by the two recognisers (preferably a direct correspondence); this relationship can be predetermined by carrying out comparative tests to calibrate the correspondence between the confidence measures.
Figure 3 shows a second embodiment of the present invention; this embodiment is similar to that of Figure 2 in that a mobile appliance 20 is provided with a speech recogniser 21 with associated confidence-measure unit 30 and threshold unit 31, and is arranged to interact, via communications infrastructure 23, with a speech application 26 hosted by a remote resource 25 that also hosts a second speech recogniser 27.
However, in the Figure 3 embodiment all the speech input is passed not only to the mobile-appliance recogniser 21 but also to the remote-resource recogniser 27. In addition, all the recognition hypotheses produced by the recogniser 21 are passed to a combiner 50 to which the recognition hypotheses produced by the recogniser 27 are also passed. Combiner 50 further receives control data from the mobile appliance 20 in the form of acceptability data from the threshold unit 31 indicating whether the recognition hypotheses produced by the recogniser 21 have respective confidence measures that reach the acceptability threshold.
The combiner 50 is arranged to replace or supplement the recognition hypotheses from the mobile-appliance recogniser that have unacceptable confidence measures, with the corresponding recognition hypotheses from the recogniser 27. As with the Figure 2 embodiment, coordination data in the form of sequence labels are preferably used to identify the recognition hypotheses, thereby to facilitate the operation of the combiner 50 in correctly sequencing the recognition hypotheses from the two recognisers.
Again, as discussed above in relation to the Figure 2 embodiment, in a variant of the Figure 3 embodiment the remote-resource recogniser 27 can have an associated confidence-measure unit and the combiner 50 can be arranged either to include the confidence measures in the output recognition-hypothesis stream (where the unacceptable hypotheses from recogniser 21 are being supplemented by hypotheses from fallback recogniser 27), or to use the confidence measures to only substitute a recognition hypothesis produced by the recogniser 27 for a corresponding below-acceptable hypothesis from the recogniser 21 where the hypothesis produced by recogniser 27 has a better confidence measure than that of the hypothesis produced by recogniser 21.
It will be appreciated that many other variants are possible to the above-described embodiments. For example, the equipment incorporating recogniser 21 need not be a mobile appliance and could, for example, be a desktop computer. Furthermore, the resource including the recogniser 27 can be close to the equipment including recogniser 21, being, for example, a server on the same LAN or a resource accessible over a short-range wireless link; indeed, the recognisers 21 and 27 could be in different items of mobile personal equipment (such as in a mobile phone and a PDA respectively) intercommunicating via a personal area network.
The speech application 26 need not be co-located with the recogniser 27 and the combiner can be located anywhere that is convenient including with the recogniser 21, with the recogniser 27 or with the application 26. Thus, for example, the recogniser 21 may be incorporated in a mobile phone along with a speech application whilst the fallback recogniser 27 is in a PDA carried by the same person as the mobile phone and communicating with the latter via a Bluetooth short-range radio link.
Multiple items of personal equipment each with a recogniser 21 can, of course, interact with the same fallback recogniser 27. Furthermore, multiple fallback recognisers can be provided in a parallel arrangement, each arranged to receive the speech input passed on from mobile appliance 20 (or other item incorporating recogniser 21); in this case, the outputs of all the fallback recognisers are passed to the combiner which may choose the best recognition hypothesis (for example, based on coordinated confidence scores produced by confidence-measure units associated with the fallback recognisers) or forward all hypotheses to the application.
It is also possible to provide a cascade of fallback recognisers. Thus, if the fallback recogniser 27 fails to produce a recognition hypothesis with an acceptable confidence score (as judged by a confidence-measure unit associated with recogniser 27) for a speech portion unacceptably recognised by recogniser 21, then the recognition hypothesis output from a further recogniser can be taken into account for the speech portion concerned. Such a cascading of fallback recognisers can have any depth.
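Such a cascade might be sketched as follows, with each stage supplied as a pair of callables; the patent leaves the depth and the per-stage thresholds open, so both are parameters here.

```python
def cascade_recognise(speech_portion, stages, thresholds):
    """Try successive recognisers until one clears its acceptability threshold.

    stages     : list of (recognise, confidence_of) callable pairs, ordered
                 from recogniser 21 through successive fallback recognisers
    thresholds : per-stage acceptability thresholds (illustrative)
    """
    hypothesis = None
    for (recognise, confidence_of), threshold in zip(stages, thresholds):
        hypothesis = recognise(speech_portion)
        if confidence_of(hypothesis) >= threshold:
            return hypothesis
    return hypothesis  # no stage was confident enough; return the last attempt
```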
Each confidence measure produced by unit 30 can be a single parameter or can be made up of several parameters; in this latter case, judging whether the acceptability threshold has been met can be complicated, as a good score for one parameter may be considered to compensate for a below-acceptable score in respect of another parameter. The threshold unit 31 can be programmed with appropriate rules for determining whether any particular combination of parameter values is sufficient to render the corresponding hypothesis acceptable.
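One plausible rule scheme for such a multi-parameter measure is sketched below; the parameter names and the particular rule forms are assumptions made for illustration, not terms from the patent.

```python
def acceptable(measure, rules):
    """Generalised threshold unit 31 for a multi-parameter confidence measure.

    measure : dict of parameter name -> value from the confidence-measure unit
    rules   : predicates over the measure; any satisfied rule renders the
              hypothesis acceptable, so a strong score on one parameter can
              compensate for a weaker score on another
    """
    return any(rule(measure) for rule in rules)

# Illustrative rules (parameter names are hypothetical):
example_rules = [
    lambda m: m["acoustic"] >= 0.8,                         # strong acoustics alone suffice
    lambda m: m["acoustic"] >= 0.5 and m["margin"] >= 0.3,  # moderate scores on both
]
```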
It will be appreciated that the functional blocks making up the mobile appliance 20 and remote resource 25 in Figures 2 and 3 will generally be implemented in program code run by a corresponding processor although, of course, equivalent hardware entities can be built.

Claims (29)

CLAIMS
1. A speech recognition method comprising the steps of:
(a) carrying out recognition of a speech input stream using a first speech recognizer to derive respective first recognition hypotheses for successive portions of the input stream;
(b) in carrying out step (a), determining a confidence measure for each first recognition hypothesis;
(c) at least in respect of those portions of the speech input stream for which the confidence measure is below an acceptability threshold, passing the speech input stream to a second speech recognizer to produce corresponding second recognition hypotheses; and
(d) forming an output recognition-hypothesis stream using recognition hypotheses from the first recognition hypotheses and only those second recognition hypotheses corresponding to the first recognition hypotheses that have a confidence measure below said threshold.
2. A method according to claim 1, wherein the output recognition-hypothesis stream comprises the first recognition hypotheses but with at least some of the hypotheses that have a confidence measure below said threshold replaced by the corresponding second hypotheses.
3. A method according to claim 1, wherein the output recognition-hypothesis stream comprises all said first recognition hypotheses and at least some of the second recognition hypotheses corresponding to the first recognition hypotheses that have a confidence measure below said threshold.
4. A method according to any one of the preceding claims, wherein the first speech recognizer is local to a user and the second speech recognizer is remote from the user, step (c) involving passing speech input portions to the second speech recognizer over a communications infrastructure.
5. A method according to claim 4, wherein the second speech recognizer is part of a remote resource further including a speech application to which said output recognition-hypothesis stream is supplied, step (d) being carried out at the remote resource with at least the first recognition hypotheses that have corresponding confidence measures which reach said acceptability threshold being passed to the remote resource.
6. A method according to any one of claims 1 to 3, wherein the first and second speech recognizers are included in respective items of mobile personal equipment, step (c) involving passing speech input portions to the second speech recognizer over a short-range communications link.
7. A method according to claim 6, wherein the item of equipment including the first speech recognizer further includes a speech application to which said output recognition-hypothesis stream is supplied.
8. A method according to any one of the preceding claims, comprising the further steps of:
(i) determining a confidence measure for each second recognition hypothesis;
(ii) at least in respect of those portions of the speech input stream for which the confidence measures of the corresponding second recognition hypotheses are below a second acceptability threshold, passing the speech input stream to a third speech recognizer to produce corresponding third recognition hypotheses;
the forming of the output recognition-hypothesis stream in step (d) using at least some of the third recognition hypotheses for which the corresponding first and second recognition hypotheses have associated confidence measures below their respective acceptability thresholds.
9. A method according to any one of the preceding claims, wherein the first speech recognizer is trained to a user's voice and the second speech recognizer is intended to recognize a specific domain or application vocabulary spoken by different users without being trained to their voices.
10. A method according to any one of the preceding claims, wherein in step (c) only those portions of the speech input stream for which the corresponding first recognition hypotheses have confidence measures below the acceptability threshold are passed to the second speech recognizer.
11. A method according to claim 1, wherein in step (c) only those portions of the speech input stream for which the corresponding first recognition hypotheses have confidence measures below the acceptability threshold are passed to the second speech recognizer, and confidence measures are produced for the resultant second recognition hypotheses; step (d) involving including all the first and second recognition hypotheses in the output recognition-hypothesis stream together with confidence measures at least for the second recognition hypotheses and the corresponding first recognition hypotheses.
12. A method according to claim 1, wherein in step (c) only those portions of the speech input stream for which the corresponding first recognition hypotheses have confidence measures below the acceptability threshold are passed to the second speech recognizer, and confidence measures are produced for the resultant second recognition hypotheses; step (d) involving replacing a first recognition hypothesis with the corresponding second recognition hypothesis only when the confidence measures associated with the two hypotheses indicate at least a degree more confidence in the second recognition hypothesis as compared to the corresponding first recognition hypothesis.
13. A method according to claim 1, wherein in step (c) all portions of the speech input stream are passed to the second speech recognizer, and in step (d) all those first recognition hypotheses that have confidence measures below said acceptability threshold are replaced by the corresponding second hypotheses in the output recognition-hypothesis stream.
14. A method according to claim 1, wherein in step (c) all portions of the speech input stream are passed to the second speech recognizer and confidence measures are produced for the second recognition hypotheses, step (d) involving replacing a first recognition hypothesis with a corresponding second recognition hypothesis in the output recognition-hypothesis stream only when the confidence measures associated with the two hypotheses indicate at least a degree more confidence in the second recognition hypothesis as compared to the corresponding first recognition hypothesis.
15. A method according to claim 1, wherein in step (c) all portions of the speech input stream are passed to the second speech recognizer and confidence measures are produced for the second recognition hypotheses, step (d) involving including in the output recognition-hypothesis stream:
- all the first recognition hypotheses,
- the second recognition hypotheses for which the confidence measures of the corresponding first recognition hypotheses are below their acceptability threshold, and
- the confidence measures at least for the included second recognition hypotheses and the corresponding first recognition hypotheses.
16. A speech recognition system comprising:
- a first speech recognizer for carrying out recognition of a speech input stream to derive respective first recognition hypotheses for successive portions of the input stream;
- an acceptability-determination subsystem for deriving a confidence measure for each first recognition hypothesis and comparing this measure with an acceptability threshold to determine the acceptability of the recognition hypothesis;
- a second speech recognizer for producing second recognition hypotheses for portions of the input stream;
- a transfer arrangement for passing to the second speech recognizer at least those portions of the speech input stream for which the confidence measure is below said acceptability threshold; and
- a control arrangement for forming an output recognition-hypothesis stream using recognition hypotheses from the first recognition hypotheses and only those second recognition hypotheses corresponding to the first recognition hypotheses that have a confidence measure below said threshold.
17. A system according to claim 16, wherein the control arrangement is operative to form the output recognition-hypothesis stream by using the first recognition hypotheses but with at least some of the hypotheses that have a confidence measure below said threshold replaced by the corresponding second hypotheses.
18. A system according to claim 16, wherein the control arrangement is operative to form the output recognition-hypothesis stream by including all said first recognition hypotheses and at least some of the second recognition hypotheses corresponding to the first recognition hypotheses that have a confidence measure below said threshold.
19. A system according to any one of claims 16 to 18, wherein the first speech recognizer is local to a user and the second speech recognizer is remote from the user, the transfer arrangement being operative to pass speech input portions to the second speech recognizer over a communications infrastructure.
20. A system according to claim 19, further comprising a remote resource comprising said second speech recognizer and a speech application to which said output recognition-hypothesis stream is supplied, the transfer arrangement being operative to pass to the remote speech-based resource at least the first recognition hypotheses that have corresponding confidence scores which reach said acceptability threshold, and the control arrangement comprising means for forming the output recognition-hypothesis stream at the remote resource.
21. A system according to any one of claims 16 to 18, further comprising first and second items of personal mobile equipment respectively including said first and second recognizers, the said first and second items of equipment each further including a short-range communication subsystem by which speech input portions can be passed from the first to the second item of equipment.
22. A system according to claim 21, wherein the first item of equipment further includes a speech application, the control arrangement being operative to pass said output recognition-hypothesis stream to the speech application.
23. A system according to any one of claims 16 to 22, wherein the first speech recognizer is trainable to a user's voice and the second speech recognizer is intended to recognize a specific domain or application vocabulary spoken by different users without being trained to their voices.
24. A system according to any one of claims 16 to 23, wherein the transfer arrangement is operative to pass to the second speech recognizer only those portions of the speech input stream for which the confidence measure is below the acceptability threshold.
25. A system according to claim 16, wherein the transfer arrangement is operative to pass to the second speech recognizer only those portions of the speech input stream for which the confidence measure is below the acceptability threshold, the system further comprising a further acceptability-determination subsystem for determining a confidence measure for each second recognition hypothesis, and the control arrangement being operative to form the output recognition-hypothesis stream by including all the first and second recognition hypotheses together with confidence measures at least for the second recognition hypotheses and the corresponding first recognition hypotheses.
26. A system according to claim 16, wherein the transfer arrangement is operative to pass to the second speech recognizer only those portions of the speech input stream for which the confidence measure is below the acceptability threshold, the system further comprising a further acceptability-determination subsystem for determining a confidence measure for each second recognition hypothesis, and the control arrangement being operative to form the output recognition-hypothesis stream by taking the first recognition hypotheses and replacing a first recognition hypothesis with the corresponding second recognition hypothesis only when the confidence measures associated with the two hypotheses indicate at least a degree more confidence in the second recognition hypothesis as compared to the corresponding first recognition hypothesis.
27. A system according to claim 16, wherein the transfer arrangement is operative to pass all portions of the speech input stream to the second speech recognizer, the control arrangement being operative to form the output recognition-hypothesis stream by replacing all those first recognition hypotheses that have confidence measures below said acceptability threshold by the corresponding second hypotheses.
28. A system according to claim 16, further comprising a further acceptability-determination subsystem for determining a confidence measure for each second recognition hypothesis, the transfer arrangement being operative to pass all portions of the speech input stream to the second speech recognizer, and the control arrangement being operative to form the output recognition-hypothesis stream by replacing a first recognition hypothesis with a corresponding second recognition hypothesis only when the confidence measures associated with the two hypotheses indicate at least a degree more confidence in the second recognition hypothesis as compared to the corresponding first recognition hypothesis.
29. A system according to claim 16, further comprising a further acceptability-determination subsystem for determining a confidence measure for each second recognition hypothesis, the transfer arrangement being operative to pass all portions of the speech input stream to the second speech recognizer, and the control arrangement being operative to form the output recognition-hypothesis stream by including:
- all the first recognition hypotheses,
- the second recognition hypotheses for which the confidence measures of the corresponding first recognition hypotheses are below their acceptability threshold, and
- the confidence measures at least for the included second recognition hypotheses and the corresponding first recognition hypotheses.
GB0130464A 2001-12-20 2001-12-20 Speech recognition system and method Expired - Fee Related GB2383459B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0130464A GB2383459B (en) 2001-12-20 2001-12-20 Speech recognition system and method
US10/322,623 US20030120486A1 (en) 2001-12-20 2002-12-19 Speech recognition system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0130464A GB2383459B (en) 2001-12-20 2001-12-20 Speech recognition system and method

Publications (3)

Publication Number Publication Date
GB0130464D0 GB0130464D0 (en) 2002-02-06
GB2383459A true GB2383459A (en) 2003-06-25
GB2383459B GB2383459B (en) 2005-05-18

Family

ID=9928013

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0130464A Expired - Fee Related GB2383459B (en) 2001-12-20 2001-12-20 Speech recognition system and method

Country Status (2)

Country Link
US (1) US20030120486A1 (en)
GB (1) GB2383459B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005119642A2 (en) 2004-06-02 2005-12-15 America Online, Incorporated Multimodal disambiguation of speech recognition
EP2329491A2 (en) * 2008-08-29 2011-06-08 Multimodal Technologies, Inc. Hybrid speech recognition
WO2012116110A1 (en) * 2011-02-22 2012-08-30 Speak With Me, Inc. Hybridized client-server speech recognition
EP2522012A1 (en) * 2010-05-27 2012-11-14 Nuance Communications, Inc. Efficient exploitation of model complementariness by low confidence re-scoring in automatic speech recognition
US20140358537A1 (en) * 2010-09-30 2014-12-04 At&T Intellectual Property I, L.P. System and Method for Combining Speech Recognition Outputs From a Plurality of Domain-Specific Speech Recognizers Via Machine Learning
US9626355B2 (en) 1998-12-04 2017-04-18 Nuance Communications, Inc. Contextual prediction of user words and user actions
US9786273B2 (en) 2004-06-02 2017-10-10 Nuance Communications, Inc. Multimodal disambiguation of speech recognition

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003463B1 (en) 1998-10-02 2006-02-21 International Business Machines Corporation System and method for providing network coordinated conversational services
US7712053B2 (en) 1998-12-04 2010-05-04 Tegic Communications, Inc. Explicit character filtering of ambiguous text entry
US7720682B2 (en) 1998-12-04 2010-05-18 Tegic Communications, Inc. Method and apparatus utilizing voice input to resolve ambiguous manually entered text input
US7679534B2 (en) 1998-12-04 2010-03-16 Tegic Communications, Inc. Contextual prediction of user words and user actions
US8583440B2 (en) 2002-06-20 2013-11-12 Tegic Communications, Inc. Apparatus and method for providing visual indication of character ambiguity during text entry
DE10341305A1 (en) * 2003-09-05 2005-03-31 Daimlerchrysler Ag Intelligent user adaptation in dialog systems
US8589156B2 (en) * 2004-07-12 2013-11-19 Hewlett-Packard Development Company, L.P. Allocation of speech recognition tasks and combination of results thereof
US9224394B2 (en) * 2009-03-24 2015-12-29 Sirius Xm Connected Vehicle Services Inc Service oriented speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same
US20060095266A1 (en) * 2004-11-01 2006-05-04 Mca Nulty Megan Roaming user profiles for speech recognition
US7827032B2 (en) * 2005-02-04 2010-11-02 Vocollect, Inc. Methods and systems for adapting a model for a speech recognition system
US8200495B2 (en) 2005-02-04 2012-06-12 Vocollect, Inc. Methods and systems for considering information about an expected response when performing speech recognition
US7865362B2 (en) * 2005-02-04 2011-01-04 Vocollect, Inc. Method and system for considering information about an expected response when performing speech recognition
US7949533B2 (en) * 2005-02-04 2011-05-24 Vococollect, Inc. Methods and systems for assessing and improving the performance of a speech recognition system
US7895039B2 (en) * 2005-02-04 2011-02-22 Vocollect, Inc. Methods and systems for optimizing model adaptation for a speech recognition system
US8041570B2 (en) * 2005-05-31 2011-10-18 Robert Bosch Corporation Dialogue management using scripts
US20070294122A1 (en) * 2006-06-14 2007-12-20 At&T Corp. System and method for interacting in a multimodal environment
US8214208B2 (en) * 2006-09-28 2012-07-03 Reqall, Inc. Method and system for sharing portable voice profiles
US8626152B2 (en) 2008-01-31 2014-01-07 Agero Connected Sevices, Inc. Flexible telematics system and method for providing telematics to a vehicle
EP2216775B1 (en) * 2009-02-05 2012-11-21 Nuance Communications, Inc. Speaker recognition
US8346549B2 (en) * 2009-12-04 2013-01-01 At&T Intellectual Property I, L.P. System and method for supplemental speech recognition by identified idle resources
US9070360B2 (en) * 2009-12-10 2015-06-30 Microsoft Technology Licensing, Llc Confidence calibration in automatic speech recognition systems
US8983845B1 (en) * 2010-03-26 2015-03-17 Google Inc. Third-party audio subsystem enhancement
US10032455B2 (en) 2011-01-07 2018-07-24 Nuance Communications, Inc. Configurable speech recognition system using a pronunciation alignment between multiple recognizers
US8914290B2 (en) 2011-05-20 2014-12-16 Vocollect, Inc. Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
US20130080172A1 (en) * 2011-09-22 2013-03-28 General Motors Llc Objective evaluation of synthesized speech attributes
US9620122B2 (en) * 2011-12-08 2017-04-11 Lenovo (Singapore) Pte. Ltd Hybrid speech recognition
KR101961139B1 (en) 2012-06-28 2019-03-25 엘지전자 주식회사 Mobile terminal and method for recognizing voice thereof
US9715879B2 (en) * 2012-07-02 2017-07-25 Salesforce.Com, Inc. Computer implemented methods and apparatus for selectively interacting with a server to build a local database for speech recognition at a device
KR20150063423A (en) 2012-10-04 2015-06-09 뉘앙스 커뮤니케이션즈, 인코포레이티드 Improved hybrid controller for asr
JP5677650B2 (en) * 2012-11-05 2015-02-25 三菱電機株式会社 Voice recognition device
JP5921756B2 (en) * 2013-02-25 2016-05-24 三菱電機株式会社 Speech recognition system and speech recognition device
US11393461B2 (en) 2013-03-12 2022-07-19 Cerence Operating Company Methods and apparatus for detecting a voice command
US9978395B2 (en) 2013-03-15 2018-05-22 Vocollect, Inc. Method and system for mitigating delay in receiving audio stream during production of sound from audio stream
DE102014200570A1 (en) * 2014-01-15 2015-07-16 Bayerische Motoren Werke Aktiengesellschaft Method and system for generating a control command
US9552817B2 (en) * 2014-03-19 2017-01-24 Microsoft Technology Licensing, Llc Incremental utterance decoder combination for efficient and accurate decoding
JP5996152B2 (en) * 2014-07-08 2016-09-21 三菱電機株式会社 Speech recognition system and speech recognition method
US9911410B2 (en) * 2015-08-19 2018-03-06 International Business Machines Corporation Adaptation of speech recognition
US10044798B2 (en) 2016-02-05 2018-08-07 International Business Machines Corporation Context-aware task offloading among multiple devices
US10484484B2 (en) 2016-02-05 2019-11-19 International Business Machines Corporation Context-aware task processing for multiple devices
WO2017138934A1 (en) 2016-02-10 2017-08-17 Nuance Communications, Inc. Techniques for spatially selective wake-up word recognition and related systems and methods
EP3754653A1 (en) 2016-06-15 2020-12-23 Cerence Operating Company Techniques for wake-up word recognition and related systems and methods
US10714121B2 (en) 2016-07-27 2020-07-14 Vocollect, Inc. Distinguishing user speech from background speech in speech-dense environments
CN111971742B (en) 2016-11-10 2024-08-20 赛轮思软件技术(北京)有限公司 Language independent wake word detection
US10971157B2 (en) 2017-01-11 2021-04-06 Nuance Communications, Inc. Methods and apparatus for hybrid speech recognition processing
US10607606B2 (en) 2017-06-19 2020-03-31 Lenovo (Singapore) Pte. Ltd. Systems and methods for execution of digital assistant
US20190043487A1 (en) * 2017-08-02 2019-02-07 Veritone, Inc. Methods and systems for optimizing engine selection using machine learning modeling
US11087766B2 (en) * 2018-01-05 2021-08-10 Uniphore Software Systems System and method for dynamic speech recognition selection based on speech rate or business domain
US11322148B2 (en) * 2019-04-30 2022-05-03 Microsoft Technology Licensing, Llc Speaker attributed transcript generation
US11443734B2 (en) * 2019-08-26 2022-09-13 Nice Ltd. System and method for combining phonetic and automatic speech recognition search

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2348035A (en) * 1999-03-19 2000-09-20 Ibm Speech recognition system
EP1158491A2 (en) * 2000-05-23 2001-11-28 Vocalis Limited Personal data spoken input and retrieval

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2457A * 1842-02-12 Machine for cutting shingles
US5144672A (en) * 1989-10-05 1992-09-01 Ricoh Company, Ltd. Speech recognition apparatus including speaker-independent dictionary and speaker-dependent
AU5803394A (en) * 1992-12-17 1994-07-04 Bell Atlantic Network Services, Inc. Mechanized directory assistance
TW323364B (en) * 1993-11-24 1997-12-21 At & T Corp
ZA948426B (en) * 1993-12-22 1995-06-30 Qualcomm Inc Distributed voice recognition system
US5666400A (en) * 1994-07-07 1997-09-09 Bell Atlantic Network Services, Inc. Intelligent recognition
US5687287A (en) * 1995-05-22 1997-11-11 Lucent Technologies Inc. Speaker verification method and apparatus using mixture decomposition discrimination
US5719921A (en) * 1996-02-29 1998-02-17 Nynex Science & Technology Methods and apparatus for activating telephone services in response to speech
US5915001A (en) * 1996-11-14 1999-06-22 Vois Corporation System and method for providing and using universally accessible voice and speech data files
US6122613A (en) * 1997-01-30 2000-09-19 Dragon Systems, Inc. Speech recognition using multiple recognizers (selectively) applied to the same input sample
US5966691A (en) * 1997-04-29 1999-10-12 Matsushita Electric Industrial Co., Ltd. Message assembler using pseudo randomly chosen words in finite state slots
US7058573B1 (en) * 1999-04-20 2006-06-06 Nuance Communications Inc. Speech recognition system to selectively utilize different speech recognition techniques over multiple speech recognition passes
US6789061B1 (en) * 1999-08-25 2004-09-07 International Business Machines Corporation Method and system for generating squeezed acoustic models for specialized speech recognizer
US7016835B2 (en) * 1999-10-29 2006-03-21 International Business Machines Corporation Speech and signal digitization by using recognition metrics to select from multiple techniques
US6836758B2 (en) * 2001-01-09 2004-12-28 Qualcomm Incorporated System and method for hybrid voice recognition
US6898567B2 (en) * 2001-12-29 2005-05-24 Motorola, Inc. Method and apparatus for multi-level distributed speech recognition

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2348035A (en) * 1999-03-19 2000-09-20 Ibm Speech recognition system
EP1158491A2 (en) * 2000-05-23 2001-11-28 Vocalis Limited Personal data spoken input and retrieval

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9626355B2 (en) 1998-12-04 2017-04-18 Nuance Communications, Inc. Contextual prediction of user words and user actions
WO2005119642A2 (en) 2004-06-02 2005-12-15 America Online, Incorporated Multimodal disambiguation of speech recognition
EP1751737A2 (en) * 2004-06-02 2007-02-14 America Online, Inc. Multimodal disambiguation of speech recognition
EP1751737A4 (en) * 2004-06-02 2008-10-29 America Online Inc Multimodal disambiguation of speech recognition
US9786273B2 (en) 2004-06-02 2017-10-10 Nuance Communications, Inc. Multimodal disambiguation of speech recognition
EP2329491A4 (en) * 2008-08-29 2012-11-28 Multimodal Technologies Llc Hybrid speech recognition
EP2329491A2 (en) * 2008-08-29 2011-06-08 Multimodal Technologies, Inc. Hybrid speech recognition
EP2522012A1 (en) * 2010-05-27 2012-11-14 Nuance Communications, Inc. Efficient exploitation of model complementariness by low confidence re-scoring in automatic speech recognition
US20140358537A1 (en) * 2010-09-30 2014-12-04 At&T Intellectual Property I, L.P. System and Method for Combining Speech Recognition Outputs From a Plurality of Domain-Specific Speech Recognizers Via Machine Learning
WO2012116110A1 (en) * 2011-02-22 2012-08-30 Speak With Me, Inc. Hybridized client-server speech recognition
US9674328B2 (en) 2011-02-22 2017-06-06 Speak With Me, Inc. Hybridized client-server speech recognition
US20170229122A1 (en) * 2011-02-22 2017-08-10 Speak With Me, Inc. Hybridized client-server speech recognition
US10217463B2 (en) * 2011-02-22 2019-02-26 Speak With Me, Inc. Hybridized client-server speech recognition

Also Published As

Publication number Publication date
GB0130464D0 (en) 2002-02-06
US20030120486A1 (en) 2003-06-26
GB2383459B (en) 2005-05-18

Similar Documents

Publication Publication Date Title
GB2383459A (en) Speech recognition system with confidence assessment
US10600414B1 (en) Voice control of remote device
US20200251107A1 (en) Voice control of remote device
KR101859708B1 (en) Individualized hotword detection models
US10074363B2 (en) Method and apparatus for keyword speech recognition
US11990133B2 (en) Automated calling system
US7437291B1 (en) Using partial information to improve dialog in automatic speech recognition systems
JP4838351B2 (en) Keyword extractor
CN101548313B (en) Voice activity detection system and method
US20190355352A1 (en) Voice and conversation recognition system
US8862468B2 (en) Leveraging back-off grammars for authoring context-free grammars
CN112581938B (en) Speech breakpoint detection method, device and equipment based on artificial intelligence
CN103377651A (en) Device and method for automatic voice synthesis
JPWO2009104332A1 (en) Utterance division system, utterance division method, and utterance division program
Fujie et al. Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system.
Fügen et al. Tight coupling of speech recognition and dialog management-dialog-context dependent grammar weighting for speech recognition.
CN107886940A (en) Voiced translation processing method and processing device
US7853451B1 (en) System and method of exploiting human-human data for spoken language understanding systems
KR20110065916A (en) Interpretation system for error correction and auto scheduling
KR20180127020A (en) Natural Speech Recognition Method and Apparatus
KR20210000802A (en) Artificial intelligence voice recognition processing method and system
US11563708B1 (en) Message grouping
WO2023107244A1 (en) Multiple wakeword detection
US12001260B1 (en) Preventing inadvertent wake in a speech-controlled device
US11741989B2 (en) Non-verbal utterance detection apparatus, non-verbal utterance detection method, and program

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20111220