CA2375589A1 - Method and apparatus for determining user satisfaction with automated speech recognition (asr) system and quality control of the asr system - Google Patents

Method and apparatus for determining user satisfaction with automated speech recognition (asr) system and quality control of the asr system

Info

Publication number
CA2375589A1
Authority
CA
Canada
Prior art keywords
asr
user
voice
voice user
behavioural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002375589A
Other languages
French (fr)
Inventor
James Craig
Andrew Osburn
Carter Cockerill
Jeremy Bernard
Mark Boyle
David Burns
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Diaphonics Inc
Original Assignee
Diaphonics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Diaphonics Inc filed Critical Diaphonics Inc
Priority to CA002375589A
Publication of CA2375589A1
Status: Abandoned

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/227 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology

Abstract

An apparatus for determining user satisfaction using automated speech recognition (ASR) systems is disclosed. The apparatus comprises: means for assessing the voice user emotional state based upon the voice characteristics; means for assessing the voice user behavioural pattern based upon the current, and previous, interactions with the ASR application; means for decision-modelling the overall voice user experience based upon the emotional state and behaviour pattern; and a real-time adaptation means of the voice user interface to match the individual based upon the QC in ASR assessment of the voice user experience. A method of determining user satisfaction using automated speech recognition (ASR) systems is also disclosed.

Description

Method and Apparatus for Determining User Satisfaction with Automated Speech Recognition Systems

Field of the Invention
The present invention relates to a method of determining user satisfaction with automated speech recognition systems and an apparatus using the method. The invention is concerned with gauging user satisfaction with, and quality control (QC) of, Automated Speech Recognition (ASR) systems. The method and apparatus of the invention draw upon analysis of speaker emotion, historical user behaviour, and statistical methods in order to estimate the degree of user satisfaction with an ASR system (QC in ASR). The QC in ASR system operates within the Public Switched Telephone Network (PSTN) and integrates components based in telephony services, automated speech recognition, automated assessment of speaker emotion, and automated speaker behaviour profiling.
Background and Summary of the Invention
Despite ever-increasing usage of the Web, companies still receive more than 70% of their orders through the appliance their customers prefer: the telephone. Thanks to the widespread adoption of cellular telephones, voice remains the preferred means of doing business. There will be more than 2 billion telephones by 2003, a figure that dwarfs the projected 200 million Web-enabled computers. For the foreseeable future, voice will continue to be the dominant mode for exchanging business information.
Automated speech recognition (ASR) is a technology that allows a computer to recognize and interpret human speech, much as the computer would recognize a typed command. ASR has been around for several decades, but only relatively recent improvements in software and computing power have made it a compelling business tool.
The key ASR benefits for callers are as follows:
- The ability to conduct transactions and receive information at any time over any phone: Customers do not require any special equipment or Internet access.
- A simple, user-friendly speech interface: Natural-language speech recognition flattens out the frustrating hierarchy or tree structure associated with touch-tone menus, making ASR a more pleasant, efficient and effective system.
- 24/7 availability: The customer is not limited to store or call-center hours.
For companies using ASR, key benefits include:
- Real-time integration with business systems: No re-keying of data associated with manual transaction and customer care processing.
- Reduced requirements for Customer Service Agents: Customers are never put on hold, and agents can focus on other tasks and more complex transactions.
- Lower administrative costs: ASR allows companies to bring in customers over an automated channel.
- Improved customer service levels: Implementing ASR will help eliminate hold times, and the system is available 24/7.
The challenge with ASR is designing and developing applications that will emulate the natural human dialogue process. A good ASR application provides a natural and intuitive dialogue flow that allows the User to interact with the system without question or concern. Achieving this level of ASR application sophistication is very difficult in practice.
One of the most important elements missing from current ASR systems is the ability to understand not only what the user is saying, but also the dialogue context, meaning, and manner in which the speech is conveyed. In order to fully assess the user satisfaction and interaction with the ASR application, a great deal more information is required. The QC in ASR system meets this challenge by providing a full and robust system for gauging the User-ASR experience.
This is accomplished not only by recognising the speech but also by assessing the User emotional state and individual behavioural characteristics. This assessment is then used to adapt the Voice User Interface to better meet the needs of the individual user. Therefore, the ASR application can be tailored in real time to match the individual and thereby begin to emulate a much more human dialogue process.
There are several commercially available quality control and ASR monitoring software tools in the marketplace today. These tools draw upon data in ASR application logs in order to identify problem areas in the applications, such as bottlenecks, poor dialogue flows, etc. They assist in tuning the applications to smooth out or redefine dialogues that may be misunderstood or misleading. However, these tools are very rudimentary in scope and do not conduct any assessment of the Voice User experience.
There are currently no other similar methods or processes in place to assess the Voice User experience with an ASR in an automated fashion.
According to the present invention, the QC in ASR integrates the following areas of technology:
- Automated speech recognition
- Automated analysis and assessment of speaker emotion based upon utterances made by the speaker during the use of an ASR system
- Automated user behaviour profiling based upon ASR usage logs and historical user profile data
- A decision matrix that takes as input all data regarding the User-ASR experience and determines a User satisfaction level, i.e., estimates the ease with which the User interacted with and was satisfied by the ASR system
- Algorithms that automatically and dynamically adapt the ASR voice interface (i.e., the dialogue flow) to match individual User needs based upon the outputs from the ASR level-of-satisfaction decision matrix
- Statistical methods for aggregating User satisfaction and behavioural data

Features of the invention and advantages associated therewith are as follows:
- The QC in ASR does not rely on a single data source but rather combines a number of unique methods to assess the User-ASR experience.
- Three distinct voice and speech components are analysed (Context and Discourse Structure, Prosody, and Paralinguistic Features) in order to assess the User Emotional State.
- The User Emotional State is assessed in real time, i.e., while the User-ASR interaction is ongoing.
- The User behavioural pattern and history is stored, accessed, and updated based upon each interaction of the User with the ASR. This allows the ASR to know in advance the User preferences and abilities and to tailor the Voice User Interface appropriately.
- The data from the User emotional assessment and behavioural pattern are used as inputs to a decision matrix in order to determine an assessment of the overall User experience and satisfaction level.
- The User emotional assessment and behavioural pattern data are used to dynamically adapt the ASR Voice User Interface to conform to the needs of the individual User.
- Statistical methods are employed in order to assess the effectiveness of the ASR Voice User Interface.

Further understanding of other features, aspects, and advantages of the present invention will be realized by reference to the following description, appended claims, and accompanying drawings.
Brief Description of the Drawings
Embodiment(s) of the present invention will be described with reference to the accompanying drawings, wherein:
Fig. 1 schematically illustrates the components of the QC in ASR system in accordance with the present invention;
Fig. 2 is a schematic presentation of the voice user emotion assessment in Fig. 1; and
Fig. 3 is a schematic flow chart showing the QC in ASR process flow in accordance with the present invention.
Detailed Description of the Preferred Embodiment(s)
Fig. 1 schematically illustrates the components of the QC in ASR system in accordance with one embodiment of the invention.
Each component in Fig. 1 will be explained below.
1.0 Automated Speech Recognition Application
This component can include any ASR-based application implemented using contemporary speech recognition engines.
ASR applications have the ability to record the utterances made by the speaker. These utterances, which are saved in standard audio file formats such as .wav, .vox, etc., can then be used as inputs to the Voice User Emotion Assessment 3.0 component as shown in Fig. 1.
The ASR application also builds a log file for every session conducted with a user. The log file contains a great deal of information regarding the user session including such data as invalid responses, re-prompts, time-outs, gender, etc. The log file data are used as inputs to the Voice User Behavioural Assessment 5.0 component.
2.0 ASR Log Files and Utterances
This component represents the data source of ASR Log Files and User utterances. As discussed above, the utterance and log files contain the source data used by both the Voice User Emotion and Behavioural Assessment components 3.0 and 5.0.
3.0 Voice User Emotion Assessment
This component takes as an input the User utterance file and processes the voice data in order to assess the User emotional state. Several distinct voice and speech components are analysed, for example Context and Discourse Structure, Prosody, and Paralinguistic Features. This component is the most complex within the QC in ASR system and is described in further detail below. The Voice User Emotion Assessment data is updated with every User utterance and passed to the Voice User Level of Satisfaction Matrix.
4.0 Voice User Level of Satisfaction Matrix
This component takes as an input the results of the Voice User Emotion Assessment component 3.0 and the Voice User Behavioural Assessment component 5.0. The decision matrix consists of a set of algorithms that determines an estimate of the overall User satisfaction level based upon the emotional and behavioural assessments. As the emotional and behavioural assessments are continually updated throughout the course of the User-ASR interaction, the decision matrix likewise continually updates its determination of the User satisfaction level.
The estimated User level of satisfaction is frequently updated and passed to the ASR Voice User Interface Adaptation Component 6.0.
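By way of illustration only, a decision matrix of this kind can be sketched as a weighted combination of the two assessment scores. The weights, thresholds, score ranges, and function names below are assumptions made for the sketch; the patent does not specify them:

```python
def satisfaction_score(emotion_score: float, behaviour_score: float,
                       w_emotion: float = 0.6, w_behaviour: float = 0.4) -> float:
    """Combine the emotional (3.0) and behavioural (5.0) assessments into
    one satisfaction estimate. Inputs are assumed normalized to [0, 1],
    where 1.0 is fully positive; the weights are illustrative."""
    return w_emotion * emotion_score + w_behaviour * behaviour_score


def satisfaction_level(score: float) -> str:
    """Map the continuous score onto coarse levels that the adaptation
    component (6.0) can act upon. The thresholds are hypothetical."""
    if score >= 0.7:
        return "satisfied"
    if score >= 0.4:
        return "neutral"
    return "dissatisfied"
```

A production matrix would likely re-weigh its inputs per User and per dialogue state, but the shape of the computation is the same: two continually updated scores in, one satisfaction determination out.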
5.0 Voice User Behavioural Assessment
This component takes as an input the ASR log files and processes the contained data in order to assess the Voice User behavioural pattern. The behavioural pattern describes the manner in which the User is able to interact with and navigate the ASR. For example, a novice User who is unfamiliar with the dialogue flow, or a User who has demonstrated difficulty in using the ASR, requires a more directed and robust dialogue. Experienced Users who have demonstrated that they can move quickly through the ASR require a more terse and brief dialogue flow. Therefore, the User behavioural pattern is built over a period of time based upon each interaction of the User with the ASR.
Each time the User accesses the ASR, the behavioural pattern is created and/or updated as appropriate to reflect the User capabilities and, thereby, reflect the User's individual needs.

The Voice User Behavioural Assessment data is updated as Log File data becomes available and then passed to the Voice User Level of Satisfaction Matrix.
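A minimal sketch of this component, assuming the log fields named in section 1.0 (invalid responses, re-prompts, time-outs) and treating the error rate per dialogue turn as the behavioural signal; the field names, scoring rule, and threshold are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class SessionLog:
    # Fields mirror the log data named in the description; names are invented.
    invalid_responses: int
    re_prompts: int
    time_outs: int
    turns: int  # total number of User responses in the session


def behaviour_score(log: SessionLog) -> float:
    """Return 1.0 for a User who navigates cleanly, approaching 0.0 as
    error events dominate the session. Purely illustrative."""
    if log.turns == 0:
        return 0.0
    errors = log.invalid_responses + log.re_prompts + log.time_outs
    return max(0.0, 1.0 - errors / log.turns)


def dialogue_style(score: float) -> str:
    """Struggling or novice Users get directed prompts; experienced Users
    get terse ones, per section 5.0. The threshold is an assumption."""
    return "terse" if score >= 0.75 else "directed"


print(behaviour_score(SessionLog(invalid_responses=0, re_prompts=1,
                                 time_outs=0, turns=8)))   # 0.875 -> "terse"
```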
6.0 ASR Voice User Interface Adaptation Component
This component takes as an input the User Level of Satisfaction data from component 4.0. Based upon the determined level of satisfaction, the Voice User Interface within the ASR is updated dynamically to meet the immediate User needs. In this manner the real-time determination of the User-ASR interaction experience is acted upon in order to conform and tailor the ASR Voice User Interface to meet the individual and immediate User requirements.
7.0 Voice User Historical Behavioural Data Component
This component represents the data source for User behavioural data. A database record is created for an individual the first time they access the ASR. The record contains information regarding the individual's interaction with the ASR and reflects their level of satisfaction and ease of use during each interaction. Each successive time the User accesses the ASR, the historical profile is queried in order to tailor the Voice User Interface to meet the individual needs. Upon termination of the User-ASR interaction, the behavioural profile is amended as required.
The Voice User Emotion Assessment Component 3.0 will be described below in greater detail, in conjunction with Fig. 2.
As noted above, this component 3.0 is very sophisticated within the QC in ASR system. The purpose of the component is to process the User spoken utterance files with the objective of determining the speaker emotional state. The results can be sufficient to indicate whether the User has had a "negative" experience as opposed to a "positive" one.

To achieve this objective there are many features and characteristics of the human voice which can be derived and analysed in order to determine an assessment of the User emotional state. The distinct voice and speech components that are analysed are, for example, Context and Discourse Structure 3.1, Prosody 3.2, and Paralinguistic Features 3.3, as illustrated in Fig. 2.
Each component of the Voice User Emotion Assessment is further detailed as follows:
3.1 Context and Discourse Structure
Context and Discourse Structure give consideration to the overall meaning of a sequence of words rather than looking at specific words in isolation. Different words can mean different things depending upon the context in which they are spoken. Therefore, one has to consider the overall discourse and structure of the dialogue flow in order to fully assess the meaning and emotion contained therein.
Techniques used to derive context and structure consider the rise and fall of voice intonation and compute the probability of a certain word based upon the previous words that have been spoken.
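The word-probability idea is the familiar n-gram language model. A minimal bigram sketch follows; the corpus, function name, and use of a raw maximum-likelihood estimate are assumptions (a deployed system would use a large corpus and smoothing):

```python
from collections import Counter


def bigram_probability(corpus: list, prev_word: str, word: str) -> float:
    """Estimate P(word | prev_word) from a token list: the kind of
    statistic a discourse-context model could use to flag unexpected
    (possibly frustrated) wording. Illustrative only."""
    bigrams = Counter(zip(corpus, corpus[1:]))   # consecutive word pairs
    unigrams = Counter(corpus[:-1])              # words that have a successor
    if unigrams[prev_word] == 0:
        return 0.0
    return bigrams[(prev_word, word)] / unigrams[prev_word]


# Example: in this toy corpus, 'please' always follows 'yes'.
tokens = "yes please yes please no".split()
print(bigram_probability(tokens, "yes", "please"))   # -> 1.0
```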
3.2 Prosody
Prosodic features of voice are reflected in vocal effects such as variations in pitch, volume, duration, and tempo, among others. Of the three voice components, prosody in voice holds the greatest potential for determination of conveyed emotion. Prosodic features are extracted from a voice sample through digital signal processing techniques. The prosodic features are determined and then analysed in order to attempt to classify the user emotion. Often several voice samples are required in order to derive an emotional state.
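As an illustration of such extraction, the sketch below derives two of the prosodic cues named above, volume (as RMS energy) and pitch, from a mono sample buffer. Autocorrelation pitch tracking is only one of many possible DSP techniques; the patent does not name a specific one:

```python
import numpy as np


def prosodic_features(samples: np.ndarray, sample_rate: int = 8000) -> dict:
    """Extract a volume proxy (RMS energy) and a pitch estimate from one
    mono utterance. Illustrative only."""
    rms = float(np.sqrt(np.mean(samples ** 2)))

    # Pitch: lag of the autocorrelation peak within a 50-400 Hz voice band.
    ac = np.correlate(samples, samples, mode="full")[len(samples) - 1:]
    lo, hi = sample_rate // 400, sample_rate // 50
    lag = lo + int(np.argmax(ac[lo:hi]))
    return {"rms_volume": rms, "pitch_hz": sample_rate / lag}


# Example on a synthetic 200 Hz tone (0.25 s at 8 kHz): pitch_hz -> 200.0
t = np.arange(2000) / 8000.0
print(prosodic_features(np.sin(2 * np.pi * 200.0 * t)))
```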
3.3 Paralinguistic Features
Paralinguistic features of voice are separated into two classifications. The first is voice quality, which reflects different voice modes such as whisper, falsetto, and huskiness, among others. The second is voice qualifications, which include non-verbal cues such as laugh, cry, tremor, and jitter.
As with prosody, these voice features can be extracted through digital signal processing techniques. Paralinguistic features are then analysed in order to attempt to classify the user emotion.
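For instance, jitter, one of the non-verbal cues listed above, is conventionally measured as the cycle-to-cycle variation of the pitch period. A minimal sketch, assuming the pitch periods come from a tracker such as the one sketched under 3.2:

```python
import numpy as np


def local_jitter(periods_s: np.ndarray) -> float:
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period -- the usual 'local jitter' measure."""
    diffs = np.abs(np.diff(periods_s))
    return float(np.mean(diffs) / np.mean(periods_s))


# A steady voice (equal periods) has zero jitter; an unsteady one does not.
print(local_jitter(np.array([0.005, 0.005, 0.005])))     # 0.0
print(local_jitter(np.array([0.005, 0.0055, 0.0048])))   # > 0
```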
Fig. 3 is a schematic flow chart showing the QC in ASR process flow in accordance with the present invention.
According to the present invention, the QC in ASR process flow is as follows (a code sketch of the complete loop appears after the list):
1. User calls the ASR application.
2. The ASR, through standard means such as account number, password/PIN, voice biometric characteristics, etc., identifies the caller.
3. The User behavioural profile is retrieved from the User Behavioural database and the ASR Voice User Interface is initially configured based upon the User profile. If the User is accessing the ASR for the first time then a new User Behavioural database record is created and a default Voice User Interface is configured.
4. The ASR interacts with the User and, each time a User response is provided, an utterance file is recorded and a Log File entry is made.
5. The Voice User Emotional Assessment component processes the utterance files and the Voice User Behavioural Assessment component processes the log files.
6. Step 5 is iterative and will be repeated each time a Voice User response is provided.
7. The User Emotional and Behavioural Assessment data are passed to the Voice User Level of Satisfaction Decision Matrix. The data are processed in order to determine the immediate user level of satisfaction.
8. The User level of satisfaction data are passed to the ASR Voice User Adaptation Component. Based on the user satisfaction level, the Voice User Interface can be immediately tailored to match the requirements of the User at that specific time.
9. Upon completion of the User-ASR interaction, the User Behavioural data record is updated.
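The sketch below strings the nine steps into one loop. Every function here is a hypothetical stand-in for the corresponding component, reusing the illustrative weights from the satisfaction-matrix sketch; none of the names or signatures come from the patent:

```python
def assess_emotion(utterance_file: str) -> float:
    """Stand-in for the Voice User Emotion Assessment (3.0); returns [0, 1]."""
    return 0.5


def assess_behaviour(log_entry: dict) -> float:
    """Stand-in for the Voice User Behavioural Assessment (5.0); returns [0, 1]."""
    return 0.5


def configure_vui(score: float) -> None:
    """Stand-in for the VUI Adaptation Component (6.0)."""
    print(f"adapting VUI, satisfaction={score:.2f}")


def qc_in_asr_session(user_id: str, store: dict, turns: list) -> None:
    """Skeleton of steps 1-9; 'turns' stands in for the live call, one
    (utterance_file, log_entry) pair per User response."""
    profile = store.setdefault(user_id, {"sessions": 0, "avg": 0.5})  # steps 1-3
    score = profile["avg"]
    for utterance_file, log_entry in turns:                           # step 4
        emotion = assess_emotion(utterance_file)                      # step 5
        behaviour = assess_behaviour(log_entry)                       # steps 5-6
        score = 0.6 * emotion + 0.4 * behaviour                       # step 7
        configure_vui(score)                                          # step 8
    n = profile["sessions"]                                           # step 9
    profile["avg"] = (profile["avg"] * n + score) / (n + 1)
    profile["sessions"] = n + 1


qc_in_asr_session("user-001", {}, [("utt1.wav", {"re_prompts": 1})])
```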
While the present invention has been described with reference to several specific embodiments, the description is illustrative of the invention and is not to be construed as limiting the invention. Various modifications and variations may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. An apparatus for determining user satisfaction using automated speech recognition (ASR) systems, the apparatus comprising:
(a) means for assessing the voice user emotional state based upon the voice characteristics;
(b) means for assessing the voice user behavioural pattern based upon the current, and previous, interactions with the ASR application;
(c) means for decision-modelling the overall voice user experience based upon the emotional state and behaviour pattern; and
(d) a real-time adaptation means of the voice user interface to match the individual based upon the QC in ASR assessment of the voice user experience.
2. An apparatus according to claim 1, further comprising a database storage of historical voice user behavioural data, and decision modelling algorithms employed to assess and weigh the data elements from the emotional state and behavioural pattern assessments in order to achieve an overall determination regarding the voice user experience.
CA002375589A 2002-03-08 2002-03-08 Method and apparatus for determining user satisfaction with automated speech recognition (asr) system and quality control of the asr system Abandoned CA2375589A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002375589A CA2375589A1 (en) 2002-03-08 2002-03-08 Method and apparatus for determining user satisfaction with automated speech recognition (asr) system and quality control of the asr system

Publications (1)

Publication Number Publication Date
CA2375589A1 (en) 2003-09-08

Family

ID=27810538

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002375589A Abandoned CA2375589A1 (en) 2002-03-08 2002-03-08 Method and apparatus for determining user satisfaction with automated speech recognition (asr) system and quality control of the asr system

Country Status (1)

Country Link
CA (1) CA2375589A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1659572A1 (en) * 2004-11-18 2006-05-24 Deutsche Telekom AG Dialogue control method and system operating according thereto
CN105096943A (en) * 2014-04-24 2015-11-25 杭州华为企业通信技术有限公司 Signal processing method and device
CN105096943B (en) * 2014-04-24 2019-04-19 杭州华为企业通信技术有限公司 The method and apparatus of signal processing
CN113808621A (en) * 2021-09-13 2021-12-17 地平线(上海)人工智能技术有限公司 Method and device for marking voice conversation in man-machine interaction, equipment and medium

Legal Events

Date Code Title Description
EEER Examination request
FZDE Dead