EP2005416A2 - Methods and systems for assessing and improving the performance of a speech recognition system - Google Patents

Methods and systems for assessing and improving the performance of a speech recognition system

Info

Publication number
EP2005416A2
Authority
EP
European Patent Office
Prior art keywords
word
performance
recognition
user
utterance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP07759805A
Other languages
English (en)
French (fr)
Inventor
Keith Braho
Jeffrey Pike
Amro El-Jaroudi
Lori Pike
Michael Laughery
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vocollect Inc
Original Assignee
Vocollect Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/539,456 external-priority patent/US7827032B2/en
Priority claimed from US11/688,920 external-priority patent/US7895039B2/en
Priority claimed from US11/688,916 external-priority patent/US7949533B2/en
Application filed by Vocollect Inc filed Critical Vocollect Inc
Priority to EP20130187267 priority Critical patent/EP2685451A3/de
Priority to EP13187263.2A priority patent/EP2711923B1/de
Priority to EP19203259.7A priority patent/EP3627497A1/de
Publication of EP2005416A2 publication Critical patent/EP2005416A2/de
Ceased legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/01 - Assessment or evaluation of speech recognition systems
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 - Adaptation
    • G10L15/07 - Adaptation to the speaker
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/28 - Constructional details of speech recognition systems

Definitions

  • the invention relates to speech recognition and more particularly to assessing and improving the performance of a speech recognition system.
  • Speech recognition systems have simplified many tasks particularly for a user in the workplace by permitting the user to perform hands-free communication with a computer as a convenient alternative to communication via conventional peripheral input/output devices.
  • a warehouse or inventory worker could wear a wireless wearable terminal having a speech recognition system that permits communication between the user and a central computer system so that the user can receive work assignments and instructions from the central computer system.
  • the user could also communicate to the central computer system information such as data entries, questions, work progress reports and work condition reports.
  • a user can be directed (through an instruction from the central computer system or visually by means of a display) to a particular work area that is labeled with a multiple-digit number (check-digit) such as "1-2-3" and asked to speak the check-digit. The user would then respond with the expected response "1-2-3". (Note that a "check-digit" can be any word or sequence of words, and is not limited to digits.)
  • Other such examples of communication between a user and a speech recognition system are described in U.S. Patent Application No. 2003/0154075 and include environments where a wearable or portable terminal is not required such as in an automobile or a telephone system; environments that are not in a warehouse such as in a managed care home, nursing home, pharmacy, retail store, and office; voice-controlled information processing systems that process, for example, credit card numbers, bank account numbers, social security numbers and personal identification numbers; other applications such as command and control, dictation, data entry and information retrieval applications; and speech recognition system features such as user verification, password verification, quantity verification, and repeat/acknowledge messages.
  • the inventions presented here can be used in those applications. In using a speech recognition system, manual data entry is eliminated or at the least reduced, and users can perform their tasks faster, more accurately and more productively.

Example Speech Recognition Errors
  • Errors can be made by a speech recognition system, however, due to, for example, background noise or a user's unfamiliarity with or misuse of the system.
  • the errors made by a system can be classified into various types.
  • a metric, an error rate (which can be defined as the percentage or ratio of observations with speech recognition errors over the number of observations of the system, and which can be determined over a window of time and/or data and per user), is often used to evaluate the number and types of errors made by a speech recognition system and is thus useful in evaluating the performance of the system.
  • An observation can be defined as any speech unit by which speech recognition may be measured.
  • An observation may be a syllable, a phoneme, a single word or multiple words (such as in a phrase, utterance or sentence).
  • the observations input to the system may be counted or the observations output by the system may be counted.
  • an accuracy rate (which can be defined as the percentage or ratio of correct observations of the system over the number of observations of the system and which can be determined over a window of time and/or data and per user) can be used to evaluate the performance of the system.
  • Recognition rates can be defined in a variety of other ways, such as a count of observations with errors divided by a length of time, a count of correct observations divided by a period of time, a count of observations with errors divided by a number of transactions, a count of correct observations divided by a number of transactions, a count of observations with errors after an event has occurred (such as apparatus being powered on or a user starting a task), or a count of correct observations after an event has occurred, to name a few. Therefore, a recognition rate (which can be an error rate, an accuracy rate, a rate based upon the identification or counting of observations with errors or correct observations, or other type of recognition rate known to those skilled in the art) is useful in evaluating the performance of the system.
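As a small illustration of these definitions, the sketch below (Python, not from the patent) computes an error rate, an accuracy rate, and a count-per-denominator variant from a window of per-observation results; the data shapes are assumptions.

```python
# Illustrative sketch of recognition-rate variants. An "observation" here
# is simply flagged correct (True) or erroneous (False).
def error_rate(correct_flags):
    """Ratio of observations with errors over the number of observations."""
    return sum(not c for c in correct_flags) / len(correct_flags)

def accuracy_rate(correct_flags):
    """Ratio of correct observations over the number of observations."""
    return sum(bool(c) for c in correct_flags) / len(correct_flags)

def errors_per_unit(correct_flags, denominator):
    """Count of observations with errors divided by, e.g., a length of time
    in hours or a number of transactions."""
    return sum(not c for c in correct_flags) / denominator

# Example: ten observations, two of them recognition errors.
window = [True, True, False, True, True, True, True, False, True, True]
print(error_rate(window))            # 0.2
print(accuracy_rate(window))         # 0.8
print(errors_per_unit(window, 0.5))  # 4.0 errors per hour, over a half-hour
```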
  • a recognition rate can be determined for a word or for various words among a set of words, or for a user or multiple users. Identification of a system's errors can be done by comparing a reference transcription of a user's input speech to the hypothesis generated by the system (the system's interpretation of the user's input speech). Furthermore, as known to those skilled in the art, the comparison can be time-aligned or text-aligned.
  • One type of speech recognition error is a substitution, in which the speech recognition system's hypothesis replaces a word that is in the reference transcription with an incorrect word. For example, if the system recognizes "1-5-3" in response to the user's input speech "1-2-3", the system made one substitution: substituting the '5' for the '2'.
  • Another type of speech recognition error is a deletion, in which the speech recognition system's hypothesis lacks a word that is in the reference transcription. For example, if the system recognizes "1-3" in response to the user's input speech "1-2-3", the system deleted one word, the '2'.
  • One variation of the deletion error is a deletion due to recognizing garbage, in which the system erroneously recognizes a garbage model instead of recognizing an actual word.
  • Another variation of the deletion error is a deletion due to a speech misdetection, where the system fails to detect that the audio input to the system contains speech and as a result does not submit features of the audio input to the system's search algorithm.
  • Another type of deletion occurs when the system rejects a correct observation due to a low confidence score.
  • Yet another variation of the deletion error is a deletion due to a rejected substitution, where a search algorithm of the speech recognition system generates a substitution, which is later rejected by an acceptance algorithm of the system.
  • In a merge, the speech recognition system recognizes two spoken words as one. For example, the user says "four-two" and the system outputs "forty".
  • a garbage model refers to the general class of models for sounds that do not convey information. Examples may include, for example, models of breath noises, "um", "uh", sniffles, wind noise, the sound of a pallet dropping, the sound of a car door slamming, or another general model such as a wildcard that is intended to match the input audio for any audio that doesn't match a model in the library of models.
  • Another type of speech recognition error is an insertion, in which the speech recognition system's hypothesis includes a word that is not in the reference transcription; insertion errors are also common when noise is mistakenly recognized as speech.
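For the reference-transcription comparison mentioned above, a text-aligned comparison can be sketched as follows. This is ordinary minimum-edit-distance alignment, offered as an illustration rather than the patent's own algorithm; the function name is hypothetical.

```python
# Classify substitutions, deletions and insertions by aligning a reference
# transcription against the system's hypothesis with minimum edit distance.
def classify_errors(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit cost aligning ref[:i] with hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = i
    for j in range(1, len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Walk back through the table to count each error type.
    counts = {"substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["deletions"] += 1    # word in reference missing from hypothesis
            i -= 1
        else:
            counts["insertions"] += 1   # extra word in hypothesis
            j -= 1
    return counts

print(classify_errors("1 2 3", "1 5 3"))  # one substitution: '5' for '2'
print(classify_errors("1 2 3", "1 3"))    # one deletion: the '2'
```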
  • In contrast to determining that an actual error or correct observation occurred by comparing a system's hypothesis to a reference transcript, an error or correct observation can be estimated or deemed to have occurred based on system behavior and user behavior.
  • This application describes methods for determining a recognition rate, wherein the recognition rate is an estimate based on estimated errors or estimated correct observations deemed to have occurred after evaluating system and user behavior. Accordingly, one can estimate or evaluate the performance level of the speech recognition system by detecting in this manner the various errors committed by or correct observations of the system.
  • One way to detect a speech recognition error is based on feedback a user provides to the speech recognition system. Feedback can be requested by the speech recognition system.
  • the system could ask the user to confirm the system's hypothesis by asking the user, for example, "Did you say 1-5-3?", and if the user responds "no", it indicates that the system made an error recognizing "1-5-3".
  • Another type of feedback is based on a user's emotion detected by speech recognition. For example, if the system recognizes in the user's input speech that the user is sighing or saying words indicating aggravation, it may indicate that an error occurred.
  • Yet another type of feedback is based on a user's correction command to the system, such as the user speaking "back-up” or "erase”, or the user identifying what word was spoken (which could be from a list of possible words displayed by the system).
  • a performance assessment does not only provide helpful information to a user or a supervisor; a performance assessment can be used to improve the adaptation of a speech recognition system.
  • a speech recognition system can improve its performance over time, as more speech samples are processed by a system, by improving its acoustic models through training or other learning or adaptation algorithms. At the same time, it is useful to prevent the system from adapting in an undesirable way, thereby resulting in a system that performs worse than it did prior to adaptation or a system that degrades over time.
  • Adapting models can use significant computational, storage, and/or power resources to create the adapted models and radio transmission energy to transmit the new models to a server.
  • Example embodiments of the invention disclosed herein can control the adaptation of a speech recognition system to avoid inefficient use of resources and to avoid adapting away from well-performing models, by controlling or adjusting adaptation based on a performance assessment of the system.
  • FIG. 1 illustrates a view of multiple portable terminals, each used by a user and each being monitored by a management console, according to an example embodiment of the invention
  • FIG. 2 illustrates a schematic view of a speech recognition system, according to an example embodiment of the invention
  • FIG. 3 is a flowchart illustrating a method for assessing and improving the performance of a speech recognition system, according to an example embodiment of the invention
  • FIG. 4 illustrates a display on a portable terminal, according to an example embodiment of the invention
  • FIG. 5 illustrates a display on a management console, according to an example embodiment of the invention.
  • FIG. 6 is a flowchart illustrating a method for controlling model adaptation based on a performance assessment, according to an example embodiment of the invention
  • FIG. 7 is a flowchart illustrating a method for model adaptation, according to an example embodiment of the invention.
  • FIGs. 8-10 are flowcharts illustrating methods for estimating an error rate, according to example embodiments of the invention.
  • FIG. 1 illustrates an example embodiment of the invention, for an inventory or warehouse environment, including multiple portable terminals 115 (each having processing circuitry and/or software to implement one or more speech recognition methods disclosed herein) used by users 105 having headsets 120.
  • the speech recognition system is located in headset 120, eliminating the need for terminal 115.
  • a user can speak, in a spoken language, through a microphone in the headset 120, and the audio information is converted by the terminal 115 to a usable digital format to be transferred back to a management console 125.
  • Terminal 115, using an RF communication card, can communicate with console 125, through a wireless connection 130, employing for example an IEEE 802.11 standard.
  • Console 125 has a display for monitoring the speech recognition systems of the portable terminals 115 by someone such as a supervisor or a professional services support person.
  • U.S. Patent Application Serial No. 10/671,142 entitled "Apparatus and Method for Detecting User Speech", incorporated herein by reference, provides further details for implementing such a system.
  • FIG. 1 illustrates the benefits of communicating one or more performance assessments of use of speech recognition system(s) by individual or multiple people.
  • terminals 115 include a display so that a user in the workplace can view an individual performance assessment and if the assessment is poor (based on a predetermined standard), view instructions for taking corrective action.
  • terminals 115 (with or without a display) produce audio responses to the user to report the performance assessment and instructions for possible corrective action or actions.
  • console 125, networked to terminals 115, provides a location to view performance assessment(s) of one or more speech recognition systems and users' use of the systems in the workplace.
  • terminals 115 are connected to a larger network (such as an intranet) that includes PCs with web browsers, so that performance assessments (of use of a system by individual or multiple users) can be viewed at any PC or any terminal 115 connected to the network.
  • Performance assessments can be aggregated, consolidated or otherwise organized on console 125, at a location other than where the speech recognition systems and users are located, so that another person (such as a supervisor or professional services support person) can evaluate the performance of the speech recognition systems and the users using the systems as a whole.
  • multiple performance assessments displayed on console 125 allow a supervisor to compare a particular speech recognition system and user's use of the system against other systems and users using the other systems.
  • a method for assessing a performance of a speech recognition system includes determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor.
  • the approach may be implemented as an apparatus, including a processor adapted to determine a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor.
  • a method for model adaptation for a speech recognition system may include determining a performance assessment of the system, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words.
  • the method may further include adjusting an adaptation, of a model for the word or various models for the various words, based on the performance assessment.
  • the approach may be implemented as an apparatus, which may include all or a subset of the following: a processor adapted to determine a performance assessment of the system, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words.
  • the apparatus may further include a controller adapted to adjust an adaptation of the model for the word or various models for the various words, based on the performance assessment.
  • a method for improving performance of a speech recognition system includes determining a performance of the system, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, and determining a corrective action based on the performance, to improve the performance.
  • the method may further include communicating the corrective action to the user or performing the corrective action.
  • the approach may be implemented as an apparatus, including a processor adapted to determine a performance of the system, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, and adapted to determine a corrective action based on the performance, to improve the performance.
  • the processor may further be adapted to communicate the corrective action to the user or to perform the corrective action.
  • a method for assessing a performance of a speech recognition system includes determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, and the grade indicates a level of the performance of the system and the grade is based on a count of observations with errors or a count of correct observations and at least one recognition factor.
  • the approach may be implemented as an apparatus, including a processor that determines a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, and the grade indicates a level of the performance of the system and the grade is based on a count of observations with errors or a count of correct observations and at least one recognition factor.
  • FIG. 2 illustrates a schematic view of a speech recognition system, according to an example embodiment of the invention.
  • a speech signal such as from a system user or from a data storage device, may be captured by a speech input device 202 in a variety of conventional ways.
  • a microphone or other electro-acoustical device senses speech input from a user and converts it into an analog voltage signal 203 that then is forwarded to a signal processor 204.
  • Signal processor 204 converts the analog speech input 203 into a digitized stream of data 205 that can be separated into separate units for analysis. Alternatively, this audio data from device 202 can be retrieved from a data storage device.
  • Signal processor 204 also generates a speech-to-noise ratio value.
  • the signal processor 204 divides the digital stream of data that is created into a sequence of time-slices, or frames 205, each of which is then processed by a feature generator 206, thereby producing features 207 (a vector, matrix, or otherwise organized set of numbers representing the acoustic features of the frames), generated for example by Linear Predictive Coding (LPC).
  • a speech recognition search algorithm function 208, realized by an appropriate circuit and/or software in the system 200, analyzes the features 207, using probabilistic models provided through 222 from a library of suitable models 210, in an attempt to determine what hypothesis to assign to the speech input captured by input device 202.
  • the search algorithm 208 compares the features 207 generated in the generator 206 with reference representations of speech, or speech models, in library 210 in order to determine the word or words that best match the speech input from device 202. Part of this recognition process is to assign a confidence factor for the speech to indicate how closely the sequence of features 207 used in the search algorithm 208 matches the closest or best-matching models in library 210. As such, a hypothesis consisting of one or more vocabulary items and associated confidence factors 211 is directed to an acceptance algorithm 212. If the confidence factor is above a predetermined acceptance threshold, then the acceptance algorithm 212 makes a decision 218 to accept the hypothesis as recognized speech.
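A minimal sketch of the acceptance decision just described, assuming a per-word confidence representation and an illustrative threshold value (the patent specifies neither):

```python
# Accept a hypothesis only if its confidence factor(s) clear a threshold,
# mirroring acceptance algorithm 212 and decision 218 in FIG. 2.
from typing import List, Tuple

ACCEPTANCE_THRESHOLD = 0.6   # hypothetical value; tuned per application

def accept(hypothesis: List[Tuple[str, float]]) -> bool:
    """hypothesis: (vocabulary item, confidence factor) pairs from the search."""
    return all(conf >= ACCEPTANCE_THRESHOLD for _, conf in hypothesis)

print(accept([("1", 0.9), ("2", 0.7), ("3", 0.8)]))  # True  -> accepted as recognized speech
print(accept([("1", 0.9), ("5", 0.4), ("3", 0.8)]))  # False -> rejected
```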
  • Performance assessment module 224 (which may be implemented in a processor) determines or estimates a performance assessment.
  • the performance assessment may be a recognition rate, a grade, or any other type of performance assessment of the speech recognition system.
  • a recognition rate may be an error rate, which can be defined as the percentage or ratio of observations with speech recognition errors over the number of observations of the system and the error rate can be determined over a window of time (e.g. predetermined length of time) and/or data (e.g. predetermined number of utterances input to the system).
  • An observation can be defined as any speech unit by which speech recognition may be measured.
  • An observation may be a syllable, a phoneme, a single word or multiple words (such as in a phrase, utterance or sentence).
  • the recognition rate can be a word error rate, the percentage or ratio of speech recognition errors over the number of words input into the system.
  • the recognition rate may also be an accuracy rate, which can be defined as the percentage or ratio of correct observations by the system over the number of observations of the system, and the accuracy rate can be determined over a window of time (e.g. predetermined length of time) and/or data (e.g. predetermined number of utterances input to the system).
  • An utterance is a spoken phrase of at least one word, such as "1" or "1-2-3".
  • the recognition rate may be a count of observations with errors divided by a length of time, a count of correct observations divided by a length of time, a count of observations with errors divided by a number of transactions, a count of correct observations divided by a number of transactions, a count of observations with errors after an event has occurred (such as apparatus being powered on or a user starting a task), or a count of correct observations after an event has occurred. Therefore, a recognition rate can be an error rate, an accuracy rate, a rate based upon the identification or counting of observations with errors or correct observations, or other type of recognition rate known to those skilled in the art.
  • the recognition rate can be determined or estimated in the following ways: per user; over a number of users; per word; over a set of words; or per a group of consecutively spoken words, such as an utterance, phrase or sentence.
  • the recognition rate determined by module 224 can be based on actual errors, correct observations and observations as determined from comparing the system's hypothesis to the reference transcript, or based on estimates of these deemed to have occurred after evaluating system and user behavior, as discussed later in this application. Therefore, the recognition rate determination can be a recognition rate estimation.
  • Inputs to module 224 needed to calculate a recognition rate are those needed for a recognition rate calculation used for a particular application.
  • inputs include a hypothesis and confidence factor 211, with its associated timing information, and expected response 214. (U.S. Patent Application Serial No. 11/051,825 and the BACKGROUND section of this present application describe scenarios in which an expected response from a user is processed by a speech recognition system.)
  • the performance assessment by the performance assessment module 224 may also be a grade, which can be defined as an assessment of the performance of the speech recognition system when used by a particular user.
  • Inputs to module 224 needed to determine or estimate the grade depend on the particular application in which the system is being used.
  • inputs include a speech-to-noise ratio 219 and the number of words in an utterance input to the speech recognition system.
Example Embodiments of a Performance Report Generator
  • Performance assessment module 224 outputs performance assessments 223 to performance report generator 225.
  • Performance report generator 225 outputs a report of the performance assessment and suggestions to a user for improving the performance of the speech recognition system.
  • performance assessment module 224 also outputs performance assessments 223 to model adaptation and control module 217.
  • Model adaptation and control module 217 (which may be implemented as a hardware or software controller or control mechanism) controls or adjusts the adaptation of models. Inputs to module 217 are those needed for the particular control of model adaptation desired for a particular application; in an example embodiment, the inputs are a hypothesis 211 and features 207.
  • Module 217 determines when to adapt a certain model or models (including when to adapt or withhold adaptation) and which utterances to use to adapt the models.
  • module 217 adapts models by using the transcription (generated by the speech recognition system) of the utterance and the features 207 observed by the recognition system corresponding to the utterance. In controlling or adjusting adaptation, module 217 determines the criteria to be met before adaptation is ordered. Furthermore, once adaptation is to proceed, module 217 may determine whether the existing models are replaced with new models created with the new features only, or whether the existing models are just adapted using information from both the new features and the existing features of the existing models. Module 217 outputs adapted models 221 to the library 210 of models.
  • model adaptation and control module 217 uses the performance assessments 223 from performance assessment module 224 to control the adaptation of models.
  • the speech recognition system prevents adaptation from causing recognition accuracy to get worse when it's at an acceptable level and avoids inefficient use of computational, storage and/or power resources.
  • FIGs. 3, 6-10 are flow charts illustrating methods according to example embodiments of the invention.
  • the techniques illustrated in these flow charts may be performed sequentially, in parallel or in an order other than that which is described. It should be appreciated that not all of the techniques described are required to be performed, that additional techniques may be added, and that some of the illustrated techniques may be substituted with other techniques.
Example Embodiments of Performance Assessment and Report Generation

  • FIG. 3 is a flowchart illustrating a method for assessing and improving the performance of a speech recognition system for the recognition of a word, according to example embodiments of the invention.
  • this method can also be used to assess the performance of multiple systems and/or for recognition of at least a subset of the words in a vocabulary of a system (such as recognition of the digits in the vocabulary).
  • the method can be performed by a performance assessment module (such as 224 of FIG. 2) and a performance report generator (such as 225 of FIG. 2).
  • a recognition rate is determined.
  • the recognition rate is an error rate.
  • the recognition rate is an accuracy rate.
  • the recognition rate can be determined or estimated in the following ways: over a window of time; over a window of data observed by the system; per user; over a number of users; per word; over a set of words; or per a group of consecutively spoken words, such as an utterance, phrase or sentence. In the following discussion, the recognition rate corresponds to recognition of instances of a single word (such as the digit '1').
  • the recognition rate may be a combined recognition rate, corresponding to recognition of the instances of various words (such as the words '1', '2' and '3', for all digits, or for all words in the vocabulary of the system).
  • a score is calculated for recognition of the word.
  • an accuracy rate is used for the recognition rate and the score is calculated using the equation:

    score = (100 - 500 * (1 - accuracy rate)) + 5 * (2 - uttlen) + (25 - SNR)   (2)

    where uttlen is an average number of words in a multi-word utterance, and SNR is an average speech-to-noise ratio during the multi-word utterances (which is limited to the range of 21-28 dB in an example embodiment).
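A direct transcription of equation (2) into Python, assuming (as the example embodiment suggests) that SNR is clamped to the 21-28 dB range before the calculation:

```python
# Score calculation per equation (2) above.
def score(accuracy_rate: float, uttlen: float, snr: float) -> float:
    snr = min(max(snr, 21.0), 28.0)  # limit SNR to the 21-28 dB range
    return (100 - 500 * (1 - accuracy_rate)) + 5 * (2 - uttlen) + (25 - snr)

# Example: 98% accuracy, 3-word utterances on average, 24 dB speech-to-noise.
print(score(0.98, uttlen=3.0, snr=24.0))  # 90 - 5 + 1 = 86.0
```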
  • other recognition rates can be used, such as a recognition rate based on a count of observations with errors or a count of correct observations.
  • An example embodiment score calculation considers one or more of the following recognition factors: recognition rate, error rate, accuracy rate, the average number of words in a multi-word utterance (uttlen), the speech-to-noise ratio (SNR) and any other recognition factors as would be known to those skilled in the art.
  • recognition rate can depend on the number of words in an utterance.
  • a recognition rate that is an utterance error rate typically increases with the number of words in an utterance and a recognition rate that is an utterance accuracy rate typically decreases with the number of words in an utterance.
  • One reasoning behind considering the speech-to-noise ratio is that recognition errors typically increase in a high-noise environment and so the calculation allows the score to be adjusted in view of this.
  • Other example embodiment score calculations can consider other recognition factors such as a background noise level, the number of words in the vocabulary of a speech recognition system, perplexity, grammar complexity or confusability, or any other measure of difficulty of performing a speech recognition task.
  • a grade is assigned to the score as follows:
  • the grades need not be letters, but can be other indications of a rating, such as numbers (e.g. '1', '2', '3' and '4'), symbols, colors or bars. Examples of calculated scores and assigned grades using (1) and (3) respectively are shown in Table 1.
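Since the grade boundaries and Table 1 are not reproduced above, the sketch below uses purely hypothetical score cut-offs to illustrate the mapping from a calculated score to a letter grade:

```python
# Map a score to a grade; the boundaries here are illustrative assumptions,
# not the patent's actual assignment rule.
def grade(score: float) -> str:
    if score >= 90:
        return "A"   # hypothetical boundary
    if score >= 80:
        return "B"   # hypothetical boundary
    if score >= 70:
        return "C"   # hypothetical boundary
    return "D"

for s in (95.0, 86.0, 71.5, 40.0):
    print(s, grade(s))   # A, B, C, D
```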
  • grade calculations can consider other recognition factors, such as those identified above for the score calculation, and a measure or measures of performance of a system or systems used by one or more users.
  • the system automatically generates corrective action suggestions (if any) for the user, based on one or more scores or grades.
  • the system can generate the suggestions for example by using a predefined standard, table, formula or algorithm that considers the score and/or grade and other factors (such as the recognition factor, an environmental factor or corresponding scores and/or performance assessments for systems used by other users in a similar environment) to yield a suggested corrective action. For example, if the grade for a word is less than the grades for recognition of words of systems used by other users in a similar environment, the generated corrective action suggestion could be to instruct the user that he or she should perform an action that causes the model or set of models (for the word or words associated with the low grade) to be modified.
  • the user may retrain, adapt, or otherwise modify the model to improve performance.
  • Other examples of corrective actions include instructing the user to: wait until the system is done speaking before starting to speak, replace a microphone, speak louder, adjust the position of the microphone relative to the user's mouth, move to an environment that is quieter than the current environment, and/or replace or remove the windscreen from the microphone.
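One way to realize the predefined-table variant of suggestion generation described above; the table entries and function are illustrative assumptions, not the patent's actual mapping:

```python
# Hedged sketch of step 325: look up a corrective-action suggestion from a
# grade via a predefined table. Real systems could instead use formulas or
# algorithms weighing recognition and environmental factors.
SUGGESTIONS = {
    "C": "Retrain the word.",
    "D": "Retrain the word and check microphone placement.",
}

def suggest_corrective_action(word: str, word_grade: str) -> str:
    action = SUGGESTIONS.get(word_grade)
    return f"Word {word}: {action}" if action else f"Word {word}: no action needed"

print(suggest_corrective_action("5", "D"))
print(suggest_corrective_action("1", "A"))
```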
  • alternatively, 325 is not performed and instead, upon receiving score and/or grade information for a user or multiple users, a supervisor or professional services support person considers the information and other factors (such as environmental factors or corresponding scores and/or grades for systems used by other users in a similar environment) to personally provide a suggested corrective action to a user or users.
  • a report of the performance assessment of the system is generated.
  • An example report for a user, showing grades for the particular user and the number of times each word has been observed ("count"), is as follows: Individual Report for User 1
  • grades were calculated and reported for each of the words '0', '1', '2', '3', '4', '5', '6', '7', '8', and '9' for the user, "User 1". Also reported is an automatically generated suggested corrective action to this user to "Retrain Word 5".
  • the systems used by the multiple users may be similar systems and/or may be operated in similar environments.
  • grades were calculated and reported for each of the words '0', '1', '2', '3', '4', '5', '6', '7', '8', and '9' for the users "User 1" and "User 2". Also reported is an automatically generated suggested corrective action for User 1 to "Retrain Word 5"; no corrective action is suggested for User 2.
  • FIG. 4 illustrates portable terminal 400, having an example embodiment display 410 showing grades for recognition of various words by a speech recognition system used by a particular user.
  • FIG. 5 illustrates a management console 500, having an example embodiment display 510 showing grades for recognition of various words by systems used by multiple users.
  • either a computer station or a portable terminal displays either type of report.
  • a display like 510 is shown on a web browser of a PC that is connected to a larger network (such as an intranet) in which various users' speech recognition systems are networked, so that performance assessments of the various systems can be viewed at the PC.
  • corrective action can be automatically initiated or the user can be instructed to take corrective action.
  • An example of automated initiation of corrective action is the initiation of a retraining session, upon the calculation of a poor grade for the recognition of a particular word.
  • a user can be instructed to take corrective action through an alert mechanism of the portable terminal or through a supervisor or professional services support person.
  • Example alert mechanisms of the portable terminal are physical, visual or sound indicators such as a light on the portable terminal (such as 405 in FIG. 4), a vibrating portable terminal, a displayed message, or a spoken instruction from the portable terminal.
  • FIG. 6 is a flow chart illustrating a method 600 for controlling or adjusting model adaptation, according to an example embodiment of the invention. It can be executed by components of a speech recognition system, such as the modules illustrated in FIG. 2. At 605, input speech is received by the speech recognition system.
  • initial speech processing is performed (such as processing of the input speech performed by the signal processor 204, feature generator 206 and speech recognition search algorithm 208 of FIG. 2) for at least one word.
  • a performance assessment corresponding to either recognition of instances of a word or for recognition of the instances of various words is determined (by for example performance assessment module 224 of FIG. 2).
  • the performance assessment can be based on recognition errors for the word '1', for the words '1', '2' and '3', for all digits, or for all words in the vocabulary of the system.
  • the performance assessment can be updated based on instances previously and currently input to the system.
  • a determination is made whether to adapt (by, for example, the model adaptation and control module 217 of FIG. 2) a model for the word or various models for the various words, based on the performance assessment. For example, a determination can be made to adapt the model for the word '1' based on a performance assessment for the word '1'. In another example, a determination can be made to adapt all words that are digits, based on a combined performance assessment for all of the digits. If it was determined that the model(s) should not be adapted, next is 605. If the model(s) should be adapted, they are adapted in 625. After 625 is executed, control returns to 605.
  • Model adaptation in 625 can be performed in the background with control returning to 605 immediately.
  • the speech recognition system can continue to receive and process speech while the models are being adapted.
  • a performance assessment is compared to a performance assessment threshold to control model adaptation. In other words, an example embodiment makes a comparison of a performance assessment to a performance assessment threshold and adapts at least one model or withholds adapting the model based on the comparison. For example, if the assessment threshold is 'C' and an assessment is 'D', a model associated with the assessment is determined to be adapted (by, for example, model adaptation and control module 217 of FIG. 2).
  • the performance assessment threshold can be a predetermined value, settable by a user, a dynamic value, or it can be adjusted upwardly or downwardly.
  • the assessment threshold can be based on factors that affect the achievable performance level of the speech recognition system and those that determine an acceptable performance level for the application in which the system is used.
  • the assessment threshold can be based on a performance assessment of a set of users of like systems, a number of words in an utterance input to the speech recognition system, based on environmental factors (such as background noise level, speech-to-noise ratio, or a measurement of the user's speech level), based on the perplexity of the grammar of a speech recognition system, based on the confusability of the words in the vocabulary or based on a number of words in the vocabulary of a speech recognition system.
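A minimal sketch of the threshold comparison described above, assuming letter grades ordered so that 'D' is worse than the example threshold 'C'; the ordering scheme itself is an assumption for illustration:

```python
# Adapt a word's model only when its performance assessment falls below the
# assessment threshold, matching the 'C'/'D' example in the text.
GRADE_ORDER = {"A": 4, "B": 3, "C": 2, "D": 1}

def should_adapt(assessment: str, threshold: str = "C") -> bool:
    """True when the assessment is worse than the threshold (e.g. 'D' < 'C')."""
    return GRADE_ORDER[assessment] < GRADE_ORDER[threshold]

print(should_adapt("D"))  # True  -> adapt the model
print(should_adapt("B"))  # False -> withhold adaptation; the model performs well
```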
  • FIG. 7 is a flow chart illustrating a method 700 for model adaptation, according to an example embodiment of the invention. It can be executed by a component of a speech recognition system, such as the model adaptation and control module 217 of FIG. 2, after a decision has been made to adapt.
  • the features observed by a speech recognition system corresponding to an input utterance are aligned with the states in the models for the words of the utterance (by for example using the Baum-Welch re-estimation algorithm).
  • the statistics (for example, means and variances) of the models are updated, and these values are mixed into the models with an appropriate weighting to maintain an appropriate balance between previous training data and new features.
  • new models are created using the observed features of an input utterance and the existing features of the original models, and the statistics associated with each are used to create the new models.
  • new statistics might be weighted in various fashions to tailor their effect on the original statistics in the model.
  • only the new observed features, and information therefrom, are utilized to create the new model.
  • the adaptation could be performed using data from a single user or multiple users. For example, only speech data from an individual user might be used to perform the adaptation, generating a model that is adapted and performs well for that user.
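One common way to realize the weighted mixing described above is to interpolate existing and newly observed statistics; this is an illustration of the general idea, not necessarily the patent's scheme, and the weight value is an assumption:

```python
# Mix new observed feature statistics into an existing model's means with a
# weight that balances previous training data against the new features.
def mix_means(old_means, new_means, new_weight=0.2):
    """Interpolate per-dimension means; new_weight=1.0 would replace the
    model outright, small values favor the existing training data."""
    return [(1.0 - new_weight) * o + new_weight * n
            for o, n in zip(old_means, new_means)]

existing = [1.00, 2.00, 3.00]   # means from the original model's states
observed = [1.40, 1.90, 3.20]   # means collected from newly aligned features
print(mix_means(existing, observed))  # approximately [1.08, 1.98, 3.04]
```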
  • the error rate can be based on any one or combination of the various speech recognition errors discussed in this present application, such as those in the BACKGROUND section of this present application and those discussed below.
  • the error rate can be the ratio of insertion errors over words input to the system.
  • the error rate can be the ratio of insertion, substitution and deletion errors over the words input to the system.
  • the error rate can be the combination of the low confidence rate and the substitution rates discussed below.
  • error rates discussed below are based on estimated errors, which are deemed to have occurred based on evaluating system behavior, the expected response and/or user behavior. Thus, these estimated error rates provide the advantage of not requiring a reference transcript of the words input to the system or a comparison of the system's hypotheses against such a transcript.
  • an identification or count of occurrences of possible errors made by a speech recognition system can be used to determine an estimate of a low confidence rate or an estimate of an error rate.
  • FIG. 8 is a flow chart illustrating a method 800 for identifying errors, which can be executed by components of a speech recognition system, such as the performance assessment module 224 of FIG. 2.
  • the low confidence rate is the rate at which a word is recognized with a confidence factor within a certain range corresponding to low confidence that the system recognized the word correctly.
  • the low confidence rate is the frequency at which a word was recognized by the speech recognition system with a confidence factor that is relatively low depending on the recognizer and application in which the speech recognition system is used.
  • a low confidence rate does not necessarily measure errors by the speech recognition system, but the low confidence rate (or a fraction of its value) can be used in addition to or in place of error rate estimates where error rates (or error rate estimates) are used.
  • the confidence factor for a hypothesized word is determined. (This confidence factor can be generated by search algorithm 208 of FIG. 2 and supplied to the performance assessment module 224 of FIG. 2.)
  • the confidence factor is compared with a range of values corresponding to low confidence that the system recognized the word correctly for the application in which the system is used. If at 810 it is determined that the confidence factor is outside of the low confidence range, control is returned to 805; otherwise, the count of low-confidence recognitions is incremented and control then returns to 805.
  • an example embodiment which uses a low confidence rate also considers, in counting errors for an error rate estimation, when a word is from a hypothesis generated by the system that matches an expected response.
  • An expected response can be defined as a response that the system expects to receive from the user, as a result of the application in which the system is used.
  • a matching algorithm of the system normally requires that the system's hypothesis is accepted only if a confidence factor for the hypothesis exceeds an acceptance threshold. However, when the system's most likely hypothesis matches an expected response, the hypothesis is more favorably treated so that the hypothesis may be accepted by the system. The reasoning behind the favorable treatment despite the relatively low confidence factor is that a hypothesis matching an expected response usually indicates a high probability of correct recognition.
  • in an example embodiment where the error rate is a low confidence rate, responses that match the expected response and have a relatively low confidence factor for the application in which the system is used are counted as errors for an error rate estimation.
  • even though a recognition error may not have actually occurred (because the system's hypothesis was correctly accepted due to the hypothesis matching the expected response, as described in referenced U.S. Patent Application Serial No. 11/051,825), a word with a relatively low confidence factor is counted as an error for an error rate estimation due to the relatively low confidence factor.
  • the range of confidence factors for which a word is counted as a low confidence error could be, for example, between the adjusted acceptance threshold and the original, unadjusted acceptance threshold. More generally, the confidence factor thresholds or range for counting low confidence errors do not need to match the acceptance threshold and adjusted acceptance threshold in the referenced patent application. The range could be between two other thresholds, including a high confidence threshold, which is higher than the acceptance threshold and indicates the boundary between low and high confidence. In this example embodiment, the range of confidence factors used for the low confidence rate is determined based on the application in which the speech recognition system is used.
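A sketch of the low-confidence counting in method 800, assuming hypothetical range boundaries between an acceptance threshold and a high-confidence threshold:

```python
# Count hypothesized words whose confidence factor falls inside a
# low-confidence range; both boundary values are application-dependent
# assumptions for illustration.
LOW, HIGH = 0.6, 0.8

def low_confidence_rate(word_confidences):
    """Fraction of recognized words falling in the low-confidence range."""
    if not word_confidences:
        return 0.0
    low = sum(1 for c in word_confidences if LOW <= c < HIGH)
    return low / len(word_confidences)

print(low_confidence_rate([0.95, 0.62, 0.85, 0.70, 0.91]))  # 0.4
```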
Substitution Rate

  • an identification or count of occurrences of possible substitution errors made by a speech recognition system can be used to determine an estimate of a substitution error rate or an estimate of an error rate.
  • the substitution rate is the rate at which substitution errors (such as the substitution errors defined in the BACKGROUND section of this present application) are made by a system.
  • a hypothesis generated by the speech recognition system is compared to an expected response, and a substitution error occurs if the system replaces a word in the expected response with an incorrect word in the hypothesis.
  • the error rate is based on a recognition error made by the speech recognition system that is identified after comparing the speech recognition system's decision on its hypothesis of at least two consecutive or proximate utterances.
  • the decision can occur after the speech recognition system has processed the incoming utterances (such as at 218 of FIG. 2, after the acceptance algorithm in 212 of FIG. 2 is executed).
  • the recognition error can be, for example, to reject the system's hypothesis of an incoming utterance, after which the user repeats the utterance, in response to the system's response or lack of one.
  • the recognition error can be to substitute a word that the speech recognition system is unable to recognize with another word or "garbage" word in the speech recognition system's output.
  • FIGs. 9-10 illustrate example embodiment methods to estimate these types of error rates.

Reject and Repeat
  • FIG. 9 is a flow chart illustrating a method 900 of an example embodiment for identifying possible occurrences of errors made by a speech recognition system.
  • the count of the possible occurrences of errors can be used to determine an estimate of an error rate.
  • Method 900 can be executed by a component of a speech recognition system, such as error rate calculation module 210 of FIG. 2.
  • the determination of whether the speech recognition system made an error is made when the speech recognition system receives at least two consecutive or proximate utterances.
  • the system behavior and user behavior is as follows: the system rejects its hypothesis of the first utterance; the user repeats the first utterance in the second utterance; and the system accepts its hypothesis of the second utterance.
  • the first and second hypotheses generated by the system substantially match.
  • the hypotheses match word-for-word, but a hypothesis may or may not also include a recognized model that is considered to be negligible for this particular error determination.
  • a hypothesis could include a recognized model indicating a user's breath or sigh and these recognized models may or may not be considered negligible for this particular error determination.
  • the determination of whether a recognized model is negligible depends upon the particular speech recognition system and the application in which it is used.
  • An example is as follows: a user speaks a first utterance "1-2-3"; the system correctly recognizes it (i.e. generates a hypothesis "1-2-3") but rejects its hypothesis due to a low confidence factor; the user repeats "1-2-3" in a second utterance; and the system accepts its hypothesis of the second utterance.
  • the words in the first and second hypotheses are compared word-for-word to find if they match. For example, if the first hypothesis is "one-two-three" and the second hypothesis is "one-three-three", there is a mismatch. If the hypotheses match word-for-word, there is a high probability that an incorrect rejection error has occurred, with the reasoning that the user repeated himself and the speech recognizer recognized the second utterance correctly. If the hypotheses match word-for-word, next is 925. Otherwise, control returns to 905. At 925, the error count is incremented and then control returns to 905. The error count in 925 may then be combined with counts of other error types to generate an overall error rate.
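A hedged sketch of the reject-and-repeat check (900); the data shapes are assumptions, and a real system would also apply the proximity and negligible-model considerations discussed above:

```python
# Count a likely incorrect rejection when the system rejected its hypothesis
# of a first utterance, the user repeated it, the system accepted the second
# hypothesis, and the two hypotheses match word-for-word.
def is_reject_and_repeat_error(first, second):
    """first/second: dicts with 'words' (list of str) and 'accepted' (bool)."""
    return (not first["accepted"]
            and second["accepted"]
            and first["words"] == second["words"])   # word-for-word match

first = {"words": ["one", "two", "three"], "accepted": False}
second = {"words": ["one", "two", "three"], "accepted": True}
error_count = 0
if is_reject_and_repeat_error(first, second):
    error_count += 1   # combined later with counts of other error types
print(error_count)  # 1
```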
Substitute and Repeat

  • FIG. 10 is a flow chart illustrating a method 1000 of an example embodiment for identifying possible occurrences of errors made by a speech recognition system.
  • the count of the possible occurrences of errors can be used to determine an estimate of an error rate or an estimate for part of an error rate.
  • Method 1000 can be executed by a component of a speech recognition system, such as error rate module 210 of FIG. 2. In this embodiment, the determination of whether the speech recognition system made an error is made when the speech recognition system receives at least two consecutive or proximate utterances, and the system substitutes a word in its hypothesis of the first utterance and recognizes and accepts all of the words in its hypothesis of the second utterance.
  • An example is as follows: a user speaks a first utterance "1-2-3"; the system misrecognizes it (e.g. generates a hypothesis "1-5-3") and accepts its hypothesis; the user repeats "1-2-3" in a second utterance within a proximity of the first utterance; the system correctly recognizes it (i.e. generates a hypothesis "1-2-3") and accepts its hypothesis.
  • a rationale behind this method of detecting errors is that if the two utterances are spoken consecutively or within a proximity of each other, and if the system accepts its hypothesis of the second utterance, then the system likely made a substitution in its hypothesis of the first utterance.
  • the heuristics include checking for one or more of the following possible conditions: there were no intervening utterances indicating that the first utterance was correctly recognized by the system; the two utterances being compared represent the same piece of information being entered into the system (for example, the two utterances occurred at the same position in the dialogue between the user and the recognition system, or in response to the same prompt); and the two utterances were spoken within a predetermined amount of time or, in other words, the time between the two utterances being compared was short enough to suggest that the user was repeating the initial utterance.
  • These verifications improve the accuracy of the estimate of the substitution error rate and can include one or more of the following: verifying that the utterances were spoken consecutively or within a proximity of each other; verifying that the system's hypotheses of the utterances contain multiple words; verifying that the system's hypotheses of the utterances contain all accepted words; verifying that the user was prompted for the same information by the system both times; verifying that the first hypothesis does not match the expected response (if there is one); verifying that the second hypothesis does match the expected response (if there is one); and checking for a condition indicating a substitution error occurred (such as those described above).
  • the words in the system's hypotheses of the first and second utterances are compared word-for-word to see if they match. If the hypotheses do not match word-for-word, next is 1020. Otherwise, control returns to 1005. At 1020, if the verifications pass, next is 1025. Otherwise, control returns to 1005.
  • the words in the system's hypotheses of the first and second utterances are compared word-for-word to find how closely they match. For example, if the first hypothesis is "1-2-3" and the second hypothesis is "1-5-3", there is a mismatch of one word. In this case, the '5' was substituted for the '2'.
  • if the hypotheses do not match word-for-word but do mostly match (e.g. the hypotheses match except for one word), it is a reasonable conclusion that a word substitution error has occurred, with the reasoning that the system performed verifications such as checking for at least one condition indicating a substitution error occurred, the user repeated the same utterance, the system recognized the second utterance correctly, and the system incorrectly substituted in its hypothesis of the first utterance.
  • the definition of "mostly match" depends upon the application.
  • an application that uses five-word hypotheses or utterances may define "mostly match" as matching word-for-word except for two words. If the hypotheses mostly match word-for-word, next is 1030, where the error count is incremented, followed by control returning to 1005. The error count in 1030 may then be combined with counts of other error types to generate an overall error rate.
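A hedged sketch of the substitute-and-repeat check (1000), treating "mostly match" as a one-word mismatch and including a few of the verifications listed above; the proximity threshold and data shapes are assumptions for illustration:

```python
# Count a likely substitution when two proximate, accepted hypotheses for the
# same prompt "mostly match" (here: differ in exactly one word).
MAX_GAP_SECONDS = 5.0   # hypothetical proximity threshold

def is_substitute_and_repeat_error(first, second):
    if not (first["accepted"] and second["accepted"]):
        return False
    if first["prompt"] != second["prompt"]:            # same piece of information
        return False
    if second["time"] - first["time"] > MAX_GAP_SECONDS:
        return False
    w1, w2 = first["words"], second["words"]
    if len(w1) != len(w2) or len(w1) < 2:              # multi-word hypotheses only
        return False
    mismatches = sum(a != b for a, b in zip(w1, w2))
    return mismatches == 1                             # "mostly match"

first = {"words": ["1", "5", "3"], "accepted": True, "prompt": "check-digit", "time": 0.0}
second = {"words": ["1", "2", "3"], "accepted": True, "prompt": "check-digit", "time": 2.5}
print(is_substitute_and_repeat_error(first, second))   # True: '5' substituted for '2'
```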
  • the same substitute-and-repeat approach can be used to identify deletion due to garbage errors, where a content word is recognized as garbage in a first utterance and then correctly recognized in the next utterance. By comparing the recognition results of the two utterances and using verifications such as those described above, one can detect the error.
  • if the system recognized and rejected the '2' in its hypothesis of the first utterance, the system made a deletion due to a rejected substitution error.
  • the method for detecting this type of error is similar to that described in the discussion of FIG. 10, with the difference that the system's hypothesis of the first utterance does not need to contain all accepted words.

Correction Rate
  • a correction rate at which a user provides feedback to the system can be used as an estimate of an error rate or an estimate for part of an error rate.
  • the reasoning behind using a correction rate to estimate an error rate or estimate part of an error rate is that when a correction is commanded to the system, it may indicate that an error occurred. Examples of user feedback are described in the BACKGROUND section of this present application.
  • the correction rate can include the rate at which the user indicates that the system made a mistake.
  • the user may provide feedback in response to the system requesting it, such as when the system asks the user to confirm a hypothesis it generated or to identify the word that was spoken.
  • the feedback may include a word indicating aggravation by the user, or the feedback may be a correction command to the system, such as "back-up" or "erase" (a sketch of such a correction-rate estimate follows below).
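As an illustration of using feedback as an error-rate proxy, the following sketch counts correction commands among recognized utterances. The command set and the assumption that each utterance arrives as a single string are assumptions of this sketch, not the patent's design.

```python
# Hypothetical correction-rate estimate: the fraction of utterances that
# are correction commands such as "back-up" or "erase".
CORRECTION_COMMANDS = {"back-up", "erase"}

def correction_rate(utterances):
    """utterances: recognized text, one string per utterance."""
    if not utterances:
        return None  # no data yet
    corrections = sum(1 for u in utterances if u.lower() in CORRECTION_COMMANDS)
    return corrections / len(utterances)

print(correction_rate(["1-2-3", "erase", "1-2-3", "4-5-6"]))  # 0.25
```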
  • for the recognition rate (error rate, accuracy rate, or another type of recognition rate), considerations can be made for the amount of time and data needed to determine or estimate a recognition rate that is useful for the application in which the speech recognition system is used.
  • the recognition rate is determined or estimated for speech input to the speech recognition system over a predetermined period of time.
  • the recognition rate is determined or estimated for speech input to the speech recognition system over a predetermined number of utterances, words, or hypotheses.
  • the recognition rate is determined or estimated from hypotheses of utterances collected over a moving or sliding window, or over a collection period that is dynamic in period of time and/or size of data.
  • the recognition rate is determined or estimated over a period when useful data has been collected.
  • a moving or sliding window can cover a collection of data taken from equal periods in a noisy environment and a quiet environment, to offset any favoring by the speech recognition system of one of those environments.
  • other examples of moving or sliding windows (one possible realization is sketched below) are those that collect data only during recent use of the speech recognition system (e.g. the last half-hour), or that collect data for the time spent by a particular user (e.g.
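One way such a window might be realized is sketched below, bounding the collected observations both by count and by age. Both bounds, and the class design, are assumptions for illustration rather than the patented method itself.

```python
from collections import deque
import time

class SlidingWindowRate:
    """Error-rate estimate over a moving window of (timestamp, had_error)
    observations, bounded by a maximum number of items and a maximum age
    in seconds (e.g. 1800 s for 'the last half-hour')."""

    def __init__(self, max_items=100, max_age_s=1800.0):
        self.window = deque()
        self.max_items = max_items
        self.max_age_s = max_age_s

    def observe(self, had_error, now=None):
        now = time.time() if now is None else now
        self.window.append((now, had_error))
        while len(self.window) > self.max_items:
            self.window.popleft()  # enforce the size bound
        while self.window and now - self.window[0][0] > self.max_age_s:
            self.window.popleft()  # enforce the age bound

    def error_rate(self):
        if not self.window:
            return None  # no useful data collected yet
        return sum(1 for _, e in self.window if e) / len(self.window)
```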
  • It can be understood by those skilled in the art that in other example embodiments of the invention, other recognition rates can be used in place of a word recognition rate, such as a syllable recognition rate, a phoneme recognition rate, a phrase recognition rate, an utterance recognition rate, and a sentence recognition rate.
  • for example, a syllable recognition rate can be used in place of a word recognition rate.
  • an utterance recognition rate can be defined as the percentage or ratio of either correctly recognized utterances, or utterances with errors made by the system, over the number of utterances input to the system (illustrated in the sketch below).
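The same ratio applies at any granularity; a trivial sketch follows, with all names illustrative.

```python
def recognition_rate(unit_correct):
    """unit_correct: one boolean per unit (word, syllable, phoneme,
    phrase, utterance, or sentence), True when recognized correctly.
    Returns the fraction recognized correctly, or None without data."""
    return sum(unit_correct) / len(unit_correct) if unit_correct else None

# Utterance recognition rate: 3 of 4 utterances recognized correctly.
print(recognition_rate([True, True, False, True]))  # 0.75
```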
  • the invention, in its various example embodiments, may be implemented directly in the software of a speech recognition system. That is, the improvements are actually part of the speech recognition system.
  • the invention does not have to be built into the speech recognition system. Rather, the invention or parts of the invention may be implemented in a separate program or application, which may be utilized by a speech recognition system to provide the benefits of the invention. In other words, separate applications or software modules may be utilized to handle any of the steps in FIG. 3 in accordance with the principles of the invention.
  • an application may interface with a speech recognition system to determine a performance assessment and/or to control when and how models are adapted; one possible arrangement is sketched below.
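A sketch of that separate-module arrangement follows, reusing the SlidingWindowRate sketch above. The on_hypothesis callback and the threshold-triggered adaptation are assumed interfaces for illustration, not the patent's prescribed design.

```python
class PerformanceMonitor:
    """Separate module that observes a recognizer's hypotheses, keeps a
    performance assessment, and gates model adaptation."""

    def __init__(self, rate_estimator, adapt_threshold=0.05, adapter=None):
        self.rate_estimator = rate_estimator  # e.g. a SlidingWindowRate
        self.adapt_threshold = adapt_threshold
        self.adapter = adapter  # callable that performs model adaptation

    def on_hypothesis(self, hypothesis, had_error):
        """Invoked by (or on behalf of) the speech recognition system."""
        self.rate_estimator.observe(had_error)
        rate = self.rate_estimator.error_rate()
        if rate is not None and rate > self.adapt_threshold and self.adapter:
            self.adapter()  # adapt models only when performance warrants it
```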
  • the instructions on the machine accessible or machine readable medium may be used to program a computer system, such as, for example, a PC, cell phone, industrial mobile computer, PDA, electronic headset, or other electronic device, to perform the methods described herein.
  • the machine-readable medium may include, but is not limited to, nonvolatile memory, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, or other types of media/machine-readable media suitable for storing or transmitting electronic instructions.
  • departures may be made from the application in which the invention is described without departing from the spirit and scope of the invention.
  • the example speech recognition system described herein has focused on wearable terminals.
  • the principles of the invention are applicable to other speech recognition environments as well.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Telephonic Communication Services (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
EP07759805A 2006-04-03 2007-03-30 Verfahren und systeme zur beurteilung und verbesserung der leistung eines spracherkennungssystems Ceased EP2005416A2 (de)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20130187267 EP2685451A3 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
EP13187263.2A EP2711923B1 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
EP19203259.7A EP3627497A1 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur beurteilung und verbesserung der leistung eines spracherkennungssystems

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US78860606P 2006-04-03 2006-04-03
US78862206P 2006-04-03 2006-04-03
US78862106P 2006-04-03 2006-04-03
US11/539,456 US7827032B2 (en) 2005-02-04 2006-10-06 Methods and systems for adapting a model for a speech recognition system
US11/688,920 US7895039B2 (en) 2005-02-04 2007-03-21 Methods and systems for optimizing model adaptation for a speech recognition system
US11/688,916 US7949533B2 (en) 2005-02-04 2007-03-21 Methods and systems for assessing and improving the performance of a speech recognition system
PCT/US2007/065615 WO2007118029A2 (en) 2006-04-03 2007-03-30 Methods and systems for assessing and improving the performance of a speech recognition system

Related Child Applications (3)

Application Number Title Priority Date Filing Date
EP13187263.2A Division EP2711923B1 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
EP19203259.7A Division EP3627497A1 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur beurteilung und verbesserung der leistung eines spracherkennungssystems
EP20130187267 Division EP2685451A3 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems

Publications (1)

Publication Number Publication Date
EP2005416A2 true EP2005416A2 (de) 2008-12-24

Family

ID=38353024

Family Applications (7)

Application Number Title Priority Date Filing Date
EP13187263.2A Active EP2711923B1 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
EP12173408.1A Active EP2541545B1 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur adaption eines Modells für ein Spracherkennungssystem
EP07759840A Active EP2005418B1 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur adaption eines modells für ein spracherkennungssystem
EP19203259.7A Pending EP3627497A1 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur beurteilung und verbesserung der leistung eines spracherkennungssystems
EP07759805A Ceased EP2005416A2 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur beurteilung und verbesserung der leistung eines spracherkennungssystems
EP20130187267 Withdrawn EP2685451A3 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
EP07759818A Ceased EP2005417A2 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur optimierten modellanpassung für ein spracherkennungssystem

Family Applications Before (4)

Application Number Title Priority Date Filing Date
EP13187263.2A Active EP2711923B1 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
EP12173408.1A Active EP2541545B1 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur adaption eines Modells für ein Spracherkennungssystem
EP07759840A Active EP2005418B1 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur adaption eines modells für ein spracherkennungssystem
EP19203259.7A Pending EP3627497A1 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur beurteilung und verbesserung der leistung eines spracherkennungssystems

Family Applications After (2)

Application Number Title Priority Date Filing Date
EP20130187267 Withdrawn EP2685451A3 (de) 2006-04-03 2007-03-30 Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
EP07759818A Ceased EP2005417A2 (de) 2006-04-03 2007-03-30 Verfahren und systeme zur optimierten modellanpassung für ein spracherkennungssystem

Country Status (3)

Country Link
EP (7) EP2711923B1 (de)
JP (4) JP5426363B2 (de)
WO (3) WO2007118029A2 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928829B2 (en) 2005-02-04 2018-03-27 Vocollect, Inc. Methods and systems for identifying errors in a speech recognition system

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8595642B1 (en) 2007-10-04 2013-11-26 Great Northern Research, LLC Multiple shell multi faceted graphical user interface
US8958848B2 (en) * 2008-04-08 2015-02-17 Lg Electronics Inc. Mobile terminal and menu control method thereof
JP2010128015A (ja) * 2008-11-25 2010-06-10 Toyota Central R&D Labs Inc 音声認識の誤認識判定装置及び音声認識の誤認識判定プログラム
EP2246729A1 (de) 2009-04-30 2010-11-03 Essilor International (Compagnie Générale D'Optique) Verfahren zur Beurteilung eines optischen Merkmals eines Brillenglasdesigns
DE102010001788A1 (de) 2010-02-10 2011-08-11 Forschungsverbund Berlin e.V., 12489 Skalierbarer Aufbau für laterale Halbleiterbauelemente mit hoher Stromtragfähigkeit
US10269342B2 (en) 2014-10-29 2019-04-23 Hand Held Products, Inc. Method and system for recognizing speech using wildcards in an expected response
US9984685B2 (en) 2014-11-07 2018-05-29 Hand Held Products, Inc. Concatenated expected responses for speech recognition using expected response boundaries to determine corresponding hypothesis boundaries
CN105336342B (zh) * 2015-11-17 2019-05-28 科大讯飞股份有限公司 语音识别结果评价方法及系统
JP7131362B2 (ja) * 2018-12-20 2022-09-06 トヨタ自動車株式会社 制御装置、音声対話装置及びプログラム
CN111754995B (zh) * 2019-03-29 2024-06-04 株式会社东芝 阈值调整装置、阈值调整方法以及记录介质
KR102547001B1 (ko) 2022-06-28 2023-06-23 주식회사 액션파워 하향식 방식을 이용한 오류 검출 방법
CN117437913B (zh) * 2023-12-18 2024-03-19 深圳昱拓智能有限公司 一种自适应近远场的离线语音命令词识别方法、系统及介质

Family Cites Families (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4882757A (en) 1986-04-25 1989-11-21 Texas Instruments Incorporated Speech recognition system
JPS63179398A (ja) * 1987-01-20 1988-07-23 三洋電機株式会社 音声認識方法
JPS644798A (en) * 1987-06-29 1989-01-09 Nec Corp Voice recognition equipment
JP2817429B2 (ja) * 1991-03-27 1998-10-30 松下電器産業株式会社 音声認識装置
US5182502A (en) 1991-05-06 1993-01-26 Lectron Products, Inc. Automatic headlamp dimmer
US5182505A (en) 1991-06-19 1993-01-26 Honeywell Inc. Aircraft control surface position transducer
FI97919C (fi) * 1992-06-05 1997-03-10 Nokia Mobile Phones Ltd Puheentunnistusmenetelmä ja -järjestelmä puheella ohjattavaa puhelinta varten
JP3710493B2 (ja) * 1992-09-14 2005-10-26 株式会社東芝 音声入力装置及び音声入力方法
JP3083660B2 (ja) * 1992-10-19 2000-09-04 富士通株式会社 音声認識装置
JPH0713591A (ja) * 1993-06-22 1995-01-17 Hitachi Ltd 音声認識装置および音声認識方法
TW323364B (de) * 1993-11-24 1997-12-21 At & T Corp
JP2886117B2 (ja) * 1995-09-11 1999-04-26 株式会社エイ・ティ・アール音声翻訳通信研究所 音声認識装置
US6212498B1 (en) * 1997-03-28 2001-04-03 Dragon Systems, Inc. Enrollment in speech recognition
FR2769118B1 (fr) * 1997-09-29 1999-12-03 Matra Communication Procede de reconnaissance de parole
JPH11175096A (ja) * 1997-12-10 1999-07-02 Nec Corp 音声信号処理装置
US6606598B1 (en) * 1998-09-22 2003-08-12 Speechworks International, Inc. Statistical computing and reporting for interactive speech applications
DE69829187T2 (de) * 1998-12-17 2005-12-29 Sony International (Europe) Gmbh Halbüberwachte Sprecheradaptation
US6922669B2 (en) 1998-12-29 2005-07-26 Koninklijke Philips Electronics N.V. Knowledge-based strategies applied to N-best lists in automatic speech recognition systems
US6507816B2 (en) * 1999-05-04 2003-01-14 International Business Machines Corporation Method and apparatus for evaluating the accuracy of a speech recognition system
JP2001042886A (ja) * 1999-08-03 2001-02-16 Nec Corp 音声入出力システムおよび音声入出力方法
JP3908878B2 (ja) * 1999-09-27 2007-04-25 日本放送協会 連続音声認識装置の音素認識性能測定装置
JP4004716B2 (ja) * 2000-05-31 2007-11-07 三菱電機株式会社 音声パターンモデル学習装置、音声パターンモデル学習方法、および音声パターンモデル学習プログラムを記録したコンピュータ読み取り可能な記録媒体、ならびに音声認識装置、音声認識方法、および音声認識プログラムを記録したコンピュータ読み取り可能な記録媒体
JP2001343994A (ja) * 2000-06-01 2001-12-14 Nippon Hoso Kyokai <Nhk> 音声認識誤り検出装置および記憶媒体
EP1199704A3 (de) * 2000-10-17 2003-10-15 Philips Intellectual Property & Standards GmbH Auswahl der alternativen Wortfolgen für diskriminative Anpassung
DE10119284A1 (de) * 2001-04-20 2002-10-24 Philips Corp Intellectual Pty Verfahren und System zum Training von jeweils genau einer Realisierungsvariante eines Inventarmusters zugeordneten Parametern eines Mustererkennungssystems
JP2002328696A (ja) * 2001-04-26 2002-11-15 Canon Inc 音声認識装置および音声認識装置における処理条件設定方法
GB2375211A (en) * 2001-05-02 2002-11-06 Vox Generation Ltd Adaptive learning in speech recognition
US6941264B2 (en) 2001-08-16 2005-09-06 Sony Electronics Inc. Retraining and updating speech models for speech recognition
JP3876703B2 (ja) * 2001-12-12 2007-02-07 松下電器産業株式会社 音声認識のための話者学習装置及び方法
US7103542B2 (en) * 2001-12-14 2006-09-05 Ben Franklin Patent Holding Llc Automatically improving a voice recognition system
US7386454B2 (en) * 2002-07-31 2008-06-10 International Business Machines Corporation Natural error handling in speech recognition
JP4304952B2 (ja) * 2002-10-07 2009-07-29 三菱電機株式会社 車載制御装置、並びにその操作説明方法をコンピュータに実行させるプログラム
JP2005017603A (ja) * 2003-06-25 2005-01-20 Nippon Telegr & Teleph Corp <Ntt> 音声認識率推定方法及び音声認識率推定プログラム
JP3984207B2 (ja) * 2003-09-04 2007-10-03 株式会社東芝 音声認識評価装置、音声認識評価方法、及び音声認識評価プログラム
TWI225638B (en) * 2003-09-26 2004-12-21 Delta Electronics Inc Speech recognition method
JP2005173157A (ja) * 2003-12-10 2005-06-30 Canon Inc パラメータ設定装置、パラメータ設定方法、プログラムおよび記憶媒体
JP2005283646A (ja) * 2004-03-26 2005-10-13 Matsushita Electric Ind Co Ltd 音声認識率推定装置
JP2005331882A (ja) * 2004-05-21 2005-12-02 Pioneer Electronic Corp 音声認識装置、音声認識方法、および音声認識プログラム
CN1965218A (zh) * 2004-06-04 2007-05-16 皇家飞利浦电子股份有限公司 交互式语音识别系统的性能预测
JP4156563B2 (ja) * 2004-06-07 2008-09-24 株式会社デンソー 単語列認識装置
JP2006058390A (ja) * 2004-08-17 2006-03-02 Nissan Motor Co Ltd 音声認識装置
US7243068B2 (en) * 2004-09-10 2007-07-10 Soliloquy Learning, Inc. Microphone setup and testing in voice recognition software
JP4542974B2 (ja) * 2005-09-27 2010-09-15 株式会社東芝 音声認識装置、音声認識方法および音声認識プログラム

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None *
See also references of WO2007118029A2 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928829B2 (en) 2005-02-04 2018-03-27 Vocollect, Inc. Methods and systems for identifying errors in a speech recognition system

Also Published As

Publication number Publication date
EP2541545A3 (de) 2013-09-04
JP5270532B2 (ja) 2013-08-21
WO2007118030A3 (en) 2008-01-10
JP6121842B2 (ja) 2017-04-26
WO2007118032A3 (en) 2008-02-07
EP2711923B1 (de) 2019-10-16
EP2005418A2 (de) 2008-12-24
EP2005418B1 (de) 2012-06-27
JP2013232017A (ja) 2013-11-14
EP2541545B1 (de) 2018-12-19
EP2005417A2 (de) 2008-12-24
JP2009532743A (ja) 2009-09-10
JP2009532744A (ja) 2009-09-10
JP5426363B2 (ja) 2014-02-26
JP5576113B2 (ja) 2014-08-20
JP2009532742A (ja) 2009-09-10
WO2007118030A2 (en) 2007-10-18
EP2541545A2 (de) 2013-01-02
EP2711923A2 (de) 2014-03-26
EP2711923A3 (de) 2014-04-09
WO2007118029A3 (en) 2007-12-27
EP2685451A2 (de) 2014-01-15
WO2007118029A2 (en) 2007-10-18
EP2685451A3 (de) 2014-03-19
EP3627497A1 (de) 2020-03-25
WO2007118032A2 (en) 2007-10-18

Similar Documents

Publication Publication Date Title
US7949533B2 (en) Methods and systems for assessing and improving the performance of a speech recognition system
EP2711923B1 (de) Verfahren und Systeme zur Beurteilung und Verbesserung der Leistung eines Spracherkennungssystems
US9928829B2 (en) Methods and systems for identifying errors in a speech recognition system
US7895039B2 (en) Methods and systems for optimizing model adaptation for a speech recognition system
US7949523B2 (en) Apparatus, method, and computer program product for processing voice in speech
KR100826875B1 (ko) 온라인 방식에 의한 화자 인식 방법 및 이를 위한 장치
EP2309489B1 (de) Verfahren und System zur Berücksichtigung von Informationen über eine erwartete Antwort bei der Durchführung von Spracherkennung
US20140156276A1 (en) Conversation system and a method for recognizing speech
JP2008233229A (ja) 音声認識システム、および、音声認識プログラム
CN111370030A (zh) 语音情感检测方法与装置、存储介质、电子设备
CN113330513A (zh) 语音信息处理方法及设备
JP2003330491A (ja) 音声認識装置および音声認識方法ならびにプログラム
JP4408665B2 (ja) 音声認識用発話データ収集装置、音声認識用発話データ収集方法、及びコンピュータプログラム
CN111354358B (zh) 控制方法、语音交互装置、语音识别服务器、存储介质和控制系统
WO2018169772A2 (en) Quality feedback on user-recorded keywords for automatic speech recognition systems

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20081031

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

17Q First examination report despatched

Effective date: 20110926

DAX Request for extension of the european patent (deleted)
APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

APBT Appeal procedure closed

Free format text: ORIGINAL CODE: EPIDOSNNOA9E

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20190225