EP0991990A1 - Access-controlled computer system with automatic speech recognition

Info

Publication number: EP0991990A1
Application number: EP98930984A
Authority: EP
Inventors: Dirk Van Compernolle; Scott Garlick
Original assignee: Lernout and Hauspie Speech Products NV
Current assignee: Lernout and Hauspie Speech Products NV
Priority date: 1997-06-27
Filing date: 1998-06-25
Publication date: 2000-04-12
Also published as: CA2288183A1; WO1999000719A1; AU8125198A; JP2002507298A

Abstract

An access control system for a computer system uses text inputs derived from automatic recognition of spoken inputs to provide computing access to authorized users. A plurality of speech-enabled terminals each receive speech from a user. An automatic speech recognizer derives text from speech provided by the specific user at a given one of the speech-enabled terminals. An identification comparator compares a non-keyboard user identification provided by a specific user with data stored in a user identification data base to determine if the specific user is an authorized user. A profile loader is provided to load a user-specific profile pertinent to the specific user into the automatic speech recognizer; and, finally, a system access provider provides computing access to the specific user if determined to be authorized by the identification comparator.

Description

ACCESS-CONTROLLED COMPUTER SYSTEM WITH AUTOMATIC SPEECH RECOGNITION

Technical Field

The present invention relates to access-controlled computer systems, and more particularly to such systems with automatic speech recognition.

Summary of the Invention

In accordance with a preferred embodiment of the present invention, there is provided a computer system for operation based on text inputs derived from automatic recognition of speech of a plurality of users providing spoken inputs. In this embodiment the system has a plurality of speech-enabled terminals, each terminal for receiving speech from a user. An identification comparator compares a non-keyboard user identification provided by a specific user with data stored in a user identification data base to determine if the specific user is an authorized user. Additionally, an automatic speech recognizer derives text from speech provided by the specific user at a given one of the speech-enabled terminals. A profile loader is provided to load a user-specific profile pertinent to the specific user into the automatic speech recognizer; and, finally, a system access provider provides computing access to the specific user if determined to be authorized by the identification comparator.

In further embodiments, the non-keyboard user identification is spoken by the specific user or alternatively contained in a digitally encoded card presented by the specific user. When the user identification is spoken, the user identification database may contain voice prints of authorized users, and the identification comparator compares a voice print derived from the spoken user identification with voice prints in the user identification database. Alternatively, or in addition, the user identification database may contain passwords of authorized users and the spoken user identification is a password; the user identification database may optionally store the passwords as text derived from utterances using automatic speech recognition.

In a further embodiment, each speech-enabled terminal is coupled to a telephone line permitting a user to access the computer system by telephone. Alternatively, or in addition, each speech-enabled terminal is associated with a separate local processor and a local automatic speech recognizer.

Also in a further embodiment, the system may additionally include a user verification comparator for comparing a user verification provided by the specific user with data stored in a user verification data base to verify the identity of the specific user as an authorized user. In this embodiment, the system access provider provides computing access to the specific user only if the identity of the specific user has been verified as an authorized user by the verification comparator. The user verification may be spoken by the specific user, or alternatively may, for example, be entered by means of a keyboard or by other non-keyboard means.

In accordance with another embodiment of the present invention, there is provided a method for providing access to a computer system that has an automatic speech recognizer and operates based on text inputs derived from automatic recognition of speech from a plurality of users provided via a plurality of speech-enabled terminals. In this embodiment, the access is provided to a specific user at a given one of the speech-enabled terminals. The method includes the following steps: a. receiving a non-keyboard user identification provided by the specific user; b. comparing the user identification with data stored in a user identification data base to determine whether the specific user is an authorized user; and c. if the specific user has been determined to be an authorized user, i. loading a user-specific profile into the automatic speech recognizer that is pertinent to the specific user; and ii. providing access to the system to the specific user via the given speech-enabled terminal.

Further and related embodiments include the steps of receiving a user verification provided by the specific user and comparing the user verification with data stored in a user verification data base to verify the identity of the specific user as an authorized user. In this embodiment, the step of providing access to the system to the specific user is performed only if the identity of the specific user has been verified as an authorized user. The user identification and the user verification may be implemented as discussed above.

In another embodiment of the present invention, there is provided a method for providing access by a plurality of users to a computer system having automatic recognition of speech provided at a plurality of speech-enabled terminals. The method includes the steps of a. generating a prompt for a spoken user identification over one of the plurality of speech-enabled terminals; b. comparing the spoken user identification with data stored in a user identification data base to determine if the user is an authorized user; c. providing an automatic speech recognizer for deriving text from speech provided at a speech-enabled terminal; d. loading user-specific profiles pertinent to a user who has been determined to be authorized into the automatic speech recognizer; and e. providing access to the system to a user who has been determined to be authorized.

Brief Description of the Drawings

The foregoing aspects of the invention will be more readily understood by reference to the following detailed description taken with the accompanying drawings in which:

Fig. 1 is a block diagram of a system to which the present invention is applicable;

Fig. 2 is a block diagram of a method in accordance with a preferred embodiment of the invention;

Fig. 3 is a block diagram of a method, similar to that of Fig. 2, but also providing user verification; and

Fig. 4 is a block diagram of a preferred embodiment of a system in accordance with a preferred embodiment of the invention corresponding generally to the method of Fig. 3.

Detailed Description of Specific Embodiments

Fig. 1 is a block diagram of a system to which the present invention is applicable. A computer system 11 has a plurality of speech-enabled terminals 131, 132, ... 133 to accommodate up to an arbitrary number n of users. The computer system includes an automatic speech recognizer 12. The speech recognizer may be hard wired or it may be implemented as a process running in the computer system 11. The computer system may include a general purpose computer in which the process is defined by a computer program loaded into the computer. Each of the speech-enabled terminals 131, 132, ... 133 may be associated with a separate local processor; in such a configuration, each local processor may be associated with a local automatic speech recognizer. The local automatic speech recognizer may, but need not necessarily, be implemented as a process running in the local processor. Each of the speech-enabled terminals 131, 132, ... 133 may also (or alternatively) be coupled to a telephone line permitting a user to access the computer system by telephone.

Fig. 2 is a block diagram of a method, in accordance with a preferred embodiment of the invention, for providing access by a plurality of users to a computer system having automatic recognition of speech provided at a plurality of speech-enabled terminals. In accordance with step 21, a non-keyboard user identification is provided over one of the speech-enabled terminals 131, 132, ... 133 of Fig. 1. The user identification may, for example, be spoken via microphone or be contained in a digitally encoded card presented by the user to a card reader. In accordance with step 22, the user identification is compared with data stored in a user identification data base to determine if the user is an authorized user. In one embodiment, the user identification database contains voice prints of authorized users, and in step 22, the comparison is between a voice print derived from the spoken user identification and voice prints in the user identification database. Alternatively, or in addition, the user identification database contains passwords of authorized users and the spoken user identification is a password; in such a case, the spoken input may be processed initially by a speech recognizer associated with the system. If in step 23, the comparison shows that there is not a match, and if in step 24 the match has not failed three times, the method permits the receipt of a new user identification in step 21, and the comparison of step 22 is reinitiated. If in step 24 the match has failed three times, the method terminates in step 27. If there is a match, then in step 25, a user-specific profile is loaded for automatic speech recognition of the user who has been determined to be authorized. In step 26, access to the system is provided to the user who has been determined to be authorized.

It will be appreciated that the level of access to the system may be controlled on a user-by-user basis in accordance with standard techniques known in the art for computer networks; thus the level of access by a user to the system may be subject to control, and the user will be given the level of access that the system has been configured to provide to such user.
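
The flow of Fig. 2 can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `Recognizer`, `grant_access`, `id_database` and `profiles` are hypothetical, and the identification is assumed to have already been reduced to a text token (for example, recognized password text or a card code) that can be looked up directly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

MAX_ATTEMPTS = 3  # the description terminates after three failed matches


@dataclass
class Recognizer:
    """Hypothetical stand-in for the automatic speech recognizer 12."""
    loaded_profile: Optional[dict] = None

    def load_profile(self, profile: dict) -> None:
        self.loaded_profile = profile


def grant_access(
    identifications: Iterable[str],
    id_database: Dict[str, str],
    profiles: Dict[str, dict],
    recognizer: Recognizer,
) -> Optional[str]:
    """Walk steps 21-27; return the authorized user's name, or None."""
    failures = 0
    for supplied in identifications:          # step 21: receive identification
        user = id_database.get(supplied)      # step 22: compare with data base
        if user is not None:                  # step 23: match found
            recognizer.load_profile(profiles[user])  # step 25: load profile
            return user                       # step 26: access is provided
        failures += 1
        if failures >= MAX_ATTEMPTS:          # step 24: failed three times
            return None                       # step 27: terminate
    return None  # assumed behavior when input ends before any match
```

In this sketch the profile is loaded only after a successful match, so a caller that receives `None` can rely on the recognizer still holding whatever profile it had before.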

Fig. 3 is a block diagram of a method, similar to that of Fig. 2, but also providing user verification. Steps 31, 32, 33, 34, 35, 36 and 37 of Fig. 3 are comparable to steps 21, 22, 23, 24, 25, 26, and 27 of Fig. 2 previously discussed. However, if a user identification has been matched as a result of the comparison step 32, the method next proceeds with a user verification procedure. In step 381, the user verification is received. Such a verification may be spoken, or alternatively may be entered by keyboard, or may be provided by other non-keyboard arrangements. In step 382, the user verification is compared with data in a user verification data base. If there is a match in step 383, then the user-specific profile for automatic speech recognition is loaded into the recognizer (step 35), and computing access is provided to the user at the level for which the user is authorized (step 36). If there is no match, in step 384, the method tests to determine whether there have been three consecutive failed matches, and if so, the method terminates; otherwise processing is repeated back at step 31.
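
A short Python sketch of the Fig. 3 flow follows. It is illustrative only: `grant_verified_access` and its parameters are hypothetical names, and the sketch assumes that the identification failures (step 34) and the verification failures (step 384) are counted separately, which the description does not state explicitly.

```python
from typing import Callable, Dict, Optional

MAX_FAILURES = 3  # three failed matches end the procedure


def grant_verified_access(
    read_identification: Callable[[], str],
    read_verification: Callable[[], str],
    id_database: Dict[str, str],
    verification_database: Dict[str, str],
) -> Optional[str]:
    """Return the verified user's name, or None if the method terminates."""
    id_failures = 0
    verification_failures = 0
    while True:
        user = id_database.get(read_identification())    # steps 31-32
        if user is None:                                 # step 33: no match
            id_failures += 1
            if id_failures >= MAX_FAILURES:              # step 34
                return None                              # step 37
            continue
        id_failures = 0  # assumption: only consecutive failures are counted
        supplied = read_verification()                   # step 381
        if verification_database.get(user) == supplied:  # steps 382-383
            return user  # the caller then performs steps 35 and 36
        verification_failures += 1
        if verification_failures >= MAX_FAILURES:        # step 384
            return None
        # otherwise processing repeats from step 31
```

The verification prompt is only issued after the identification has matched, mirroring the order of steps 32 and 381.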

Fig. 4 is a block diagram of a preferred embodiment of a system in accordance with a preferred embodiment of the invention corresponding generally to the method of Fig. 3. The system includes speech-enabled terminals 41 from which may be received a spoken user identification. An automatic speech recognizer 46 has two modes of operation. In a first mode, the recognizer can operate in speaker-independent fashion to recognize a spoken user identification, which is fed to the identification comparator 42, and a user verification, which is fed to verification comparator 43. In a second mode, the recognizer can operate in speaker-dependent fashion, utilizing additional information in the form of a user-specific profile for a user that has been identified by the identification comparator 42. (Alternatively, an applicable one of the user-specific profiles can be loaded after an unverified user identification has been made by the identification comparator 42, and the recognizer can operate in a speaker-dependent mode when feeding text to the verification comparator.) The comparators 42 and 43 may in fact be implemented in an integrated arrangement. For example, a user speaking his name (or a password uniquely identifying him) can provide a basis for both identification as well as verification using speaker-dependent templates. Further material on speech recognizer design may be found, for example, in Rabiner and Juang, Fundamentals of Speech Recognition, 1993, which is hereby incorporated herein by reference.

In operation, a user identification is received by the identification comparator 42. As described above in connection with previous figures, the identification can take many forms, including speech (which is here converted to text by the recognizer 46 to furnish an input to the comparator 42) or a digitally encoded card that is read by a card reader. A user identification that has been received by the identification comparator is stored in storage region 421. Thereafter identification comparator 42 compares the user identification that is stored in storage region 421 with user identification data in the identification data base that is stored in storage region 422. If the comparator 42 makes a successful match, then operation of the verification comparator 43 is triggered. Here the verification comparator receives a text input from the recognizer 46, and the input is stored in region 431 and compared with data in the verification data base that is stored in region 432. If the verification comparator 43 determines that a match exists, then the profile loader 44 is caused to access and load one of the user-specific profiles (corresponding to the user who was matched by the comparators 42 and 43) stored in region 441 into the recognizer 46. In this fashion there is provided enhanced recognition of speech of the user who has been determined to be authorized. Finally, system access provider 45 is caused to provide access to the system to a user who has been determined to be authorized in accordance with the user authorization levels stored in region 451.

It is contemplated that the identification comparator 42, the verification comparator 43, the automatic speech recognition user-specific profile loader 44, and the system access provider 45 may (but need not necessarily) be implemented as processes running in a general purpose computer. Indeed, the processes need not be running on the same computer. For example, some or all of the system access provider 45 may be implemented on a server handling system access for all users of the system.

The present invention is also applicable to any speech recognition system, capable of speaker adaptation, that has more than one user. Each user of such a system has a potentially unique user-specific profile resulting from speaker adaptation. In a multi-user environment, such a profile is subject to the risk of undesirable modification as a result of adaptation to the speech of a person other than the user. An embodiment of the present invention prevents undesirable modification of a user-specific profile for speech recognition by requiring user identification before at least one of modifying a user-specific profile (for example, at the beginning of an adaptation session) or saving a modification of a user-specific profile (for example, at the end of an adaptation session).

As shown in Fig. 4, such an embodiment has a speech profile adaptor 442 in communication with the automatic speech recognizer 46, for modifying the user-specific profiles 441. An adaptation controller 443 responsive to the identification comparator 42 controls the speech profile adaptor 442. When an authorized user has been identified by the identification comparator 42, the adaptation controller 443 enables operation of the speech profile adaptor 442 to permit updating of the user-specific profile. In a further embodiment, both user identification and verification are required to enable the speech profile adaptor 442. The identification may be by the keyboard, or, as in the embodiments described previously, the identification may be a non-keyboard identification, as by speech or digitally encoded card.
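
The gating role of the adaptation controller can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the class `AdaptationController`, its methods, and the representation of a profile as a dictionary of numeric parameters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AdaptationController:
    """Sketch of adaptation controller 443 gating speech profile adaptor 442."""
    profiles: Dict[str, Dict[str, float]] = field(default_factory=dict)
    identified_user: Optional[str] = None  # reported by the identification comparator

    def on_identified(self, user: Optional[str]) -> None:
        """Record the outcome of the identification comparator (None = no match)."""
        self.identified_user = user

    def adapt(self, user: str, updates: Dict[str, float]) -> bool:
        """Apply updates only to the profile of the currently identified user."""
        if user != self.identified_user:
            return False  # adaptation inhibited; the stored profile is untouched
        self.profiles.setdefault(user, {}).update(updates)
        return True
```

Keeping the check inside `adapt` means every modification path passes through the same guard, which is one way to ensure a profile cannot be altered by speech from someone other than its identified owner.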

The embodiments previously discussed for both identification and verification assume explicit actions taken by the user. More specifically, in the case of speech, the access control system uses voice print matching of user provided speech. In practice, such voice print matching can employ either text dependent verification, or text independent verification. Text dependent verification is the use of passwords and associated voice prints already described. Text independent verification operates with freely selected user speech input during normal use of the automatic speech recognition system. In a preferred embodiment, this implicit verification is implemented as a safeguard mechanism in combination with the explicit text dependent password and voice print technique.

With respect to Fig. 4, the text independent verification (i.e., implicit verification) occurs in the background during system use as spoken text input progresses. Thus, after the user access system has recognized an authorized user and enabled system access as previously described, the identification comparator 42 and verification comparator 43 continue to monitor the operation of the automatic speech recognizer 46. As the recognizer 46 recognizes input speech as text within its recognition vocabulary, the identification comparator 42 and/or verification comparator 43 compares the incoming speech signal associated with the recognized text to stored user identification data 422 and/or stored user verification data 432. If the characteristics of the incoming speech fail to match the stored characteristics for the recognized user within an acceptability threshold, the identification comparator 42 and/or verification comparator 43 then disables the system access provider 45. In addition (or as a consequence of disabling the system access provider 45), the speech profile adaptor 442 is prevented from modifying the user-specific speech profiles 441. Then, the automatic speech recognizer 46 operates only to perform the user verification and access method previously described.

This embodiment is especially useful in a situation where an authorized user has accessed the system and momentarily walks away. Text independent verification running in the application background prevents another, possibly unfriendly, user from invoking or altering the authorized user's speech profile. This effectively prevents use of the system by such an unauthorized person.
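
The background check described above can be sketched in Python. The specification does not say how speech characteristics are represented or compared, so this sketch assumes, purely for illustration, that they are numeric feature vectors scored by cosine similarity against an acceptability threshold; `Session`, `monitor` and the threshold value of 0.8 are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Illustrative similarity measure; not specified by the description."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Session:
    """State after an authorized user has been granted access."""
    stored_characteristics: Sequence[float]  # assumed form of stored data 422/432
    threshold: float = 0.8                   # assumed acceptability threshold
    access_enabled: bool = True              # system access provider 45
    adaptation_enabled: bool = True          # speech profile adaptor 442

    def monitor(self, incoming: Sequence[float]) -> bool:
        """Check one recognized utterance; disable access on a mismatch."""
        score = cosine_similarity(incoming, self.stored_characteristics)
        if score < self.threshold:
            self.access_enabled = False
            self.adaptation_enabled = False
        return self.access_enabled
```

Once access has been disabled, later matching utterances do not re-enable it in this sketch; the user would have to repeat the explicit identification and verification procedure.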

Claims

What is claimed is:

1. A computer system, for operation based on text inputs derived from automatic recognition of speech of a plurality of users providing spoken inputs, the system comprising: a. a plurality of speech-enabled terminals, each having an input for receiving speech from a user; b. an identification comparator for comparing a non-keyboard user identification provided by a specific user with data stored in a user identification data base to determine if the specific user is an authorized user; c. an automatic speech recognizer for deriving text from speech provided by the specific user at a given one of the speech-enabled terminals; d. a profile loader for loading a user-specific profile pertinent to the specific user into the automatic speech recognizer; and e. a system access provider for providing computing access, via the automatic speech recognizer, to the specific user if determined to be authorized by the identification comparator.

2. A computer system according to claim 1, wherein the non-keyboard user identification is spoken by the specific user.

3. A computer system according to claim 1, wherein the non-keyboard user identification is contained in a digitally encoded card presented by the specific user.

4. A computer system according to claim 2, wherein the user identification database contains voice prints of authorized users, and the identification comparator compares a voice print derived from the spoken user identification with voice prints in the user identification database.

5. A computer system according to claim 2, wherein the user identification database contains passwords of authorized users and the spoken user identification is a password.

6. A computer system according to claim 5, wherein the user identification database stores the passwords as text derived from utterances using automatic speech recognition.

7. A computer system according to claim 1, further comprising a user verification comparator for comparing a user verification provided by the specific user with data stored in a user verification data base to verify the identity of the specific user as an authorized user, wherein the system access provider provides computing access to the specific user only if the identity of the specific user has been verified as an authorized user by the verification comparator.

8. A computer system according to claim 2, further comprising a user verification comparator for comparing a user verification provided by the specific user with data stored in a user verification data base to verify the identity of the specific user as an authorized user, wherein the system access provider provides computing access to the specific user only if the identity of the specific user has been verified as an authorized user by the verification comparator.

9. A computer system according to claim 7, wherein the user verification is spoken by the specific user.

10. A computer system according to claim 1, wherein each speech-enabled terminal is coupled to a telephone line permitting a user to access the computer system by telephone.

11. A computer system according to claim 1, wherein each speech-enabled terminal is associated with a separate local processor and a local automatic speech recognizer.

12. A computer system according to claim 11, wherein each speech-enabled terminal is coupled to a telephone line permitting a user to access the computer system by telephone.

13. A computer system according to claim 1, wherein: the identification comparator further includes a text-independent comparator coupled to the automatic speech recognizer and to the system access provider for determining whether characteristics of the speech provided by the specific user fail to match characteristics of the data stored in the user identification database for the specific user within an acceptability threshold, and in the event of a determination of acceptable match, causing the system access provider to preclude computing access to the specific user.

14. A computer system according to claim 7, wherein: the user verification comparator further includes a text-independent comparator coupled to the automatic speech recognizer and to the system access provider for determining whether characteristics of the speech provided by the specific user fail to match characteristics of the data stored in the user identification database for the specific user within an acceptability threshold, and in the event of a determination of acceptable match, causing the system access provider to preclude computing access to the specific user.

15. A method for providing access to a computer system that has an automatic speech recognizer and operates based on text inputs derived from automatic recognition of speech from a plurality of users provided via a plurality of speech-enabled terminals, the access provided to a specific user at a given one of the speech-enabled terminals, the method comprising: a. receiving a non-keyboard user identification provided by the specific user; b. comparing the user identification with data stored in a user identification data base to determine whether the specific user is an authorized user; and c. if the specific user has been determined to be an authorized user, i. loading a user-specific profile into the automatic speech recognizer that is pertinent to the specific user; and ii. providing access to the system to the specific user via the given speech-enabled terminal.

16. A method according to claim 15, further comprising before substep (ii) of step (c), i. receiving a user verification provided by the specific user and ii. comparing the user verification with data stored in a user verification data base to verify the identity of the specific user as an authorized user; and wherein substep (ii) of step (c) is performed only if the identity of the specific user has been verified as an authorized user.

17. A method according to claim 15, wherein the non-keyboard user identification is spoken by the specific user.

18. A method according to claim 16, wherein the non-keyboard user identification is spoken by the specific user.

19. A method according to claim 15, wherein the non-keyboard user identification is contained in a digitally encoded card presented by the specific user.

20. A method according to claim 16, wherein the non-keyboard user identification is contained in a digitally encoded card presented by the specific user.

21. A method according to claim 16, wherein the user verification is spoken by the specific user.

22. A method according to claim 15, further comprising: d. monitoring operation of the automatic speech recognizer; and e. if characteristics of the speech provided by the specific user fail to match characteristics of the data stored in the user identification database for the specific user within an acceptability threshold, disabling the automatic speech recognizer.

23. A computer system, for operation based on text inputs derived from automatic recognition of speech of a plurality of users providing spoken inputs, the system comprising: a. a speech-enabled terminal, having an input for receiving speech from a user; b. an identification comparator for comparing a non-keyboard user identification provided by a specific user with data stored in a user identification data base to determine if the specific user is an authorized user; c. an automatic speech recognizer for deriving text from speech provided by the specific user at the speech-enabled terminal; d. a speaker adaptor for modifying a user-specific profile, pertinent to the specific user, that is used with the automatic speech recognizer; and e. an adaptation inhibitor for preventing at least one of modifying the user-specific profile or saving a modification of the user-specific profile unless the specific user is determined to be authorized by the identification comparator.

24. A computer system according to claim 23, wherein the non-keyboard user identification is spoken by the specific user.

25. A computer system according to claim 23, wherein the non-keyboard user identification is contained in a digitally encoded card presented by the specific user.

26. A computer system according to claim 24, wherein the user identification database contains voice prints of authorized users, and the identification comparator compares a voice print derived from the spoken user identification with voice prints in the user identification database.

27. A computer system according to claim 24, wherein the user identification database contains passwords of authorized users and the spoken user identification is a password.

28. A computer system according to claim 27, wherein the user identification database stores the passwords as text derived from utterances using automatic speech recognition.

29. A computer system according to claim 23, further comprising a user verification comparator for comparing a user verification provided by the specific user with data stored in a user verification data base to verify the identity of the specific user as an authorized user, wherein the adaptation inhibitor prevents at least one of modifying the user-specific profile or saving a modification of the user-specific profile unless the identity of the specific user has been verified as an authorized user by the verification comparator.

30. A computer system according to claim 23, wherein: the identification comparator further includes a text-independent comparator coupled to the automatic speech recognizer and to the system access provider for determining whether characteristics of the speech provided by the specific user fail to match characteristics of the data stored in the user identification database for the specific user within an acceptability threshold, and in the event of a determination of acceptable match, performing at least one of causing the system access provider to preclude computing access to the specific user and preventing the speaker adaptor from modifying the user-specific profile.

31. A computer system according to claim 29, wherein: the user verification comparator further includes a text-independent comparator coupled to the automatic speech recognizer and to the system access provider for determining whether characteristics of the speech provided by the specific user fail to match characteristics of the data stored in the user identification database for the specific user within an acceptability threshold, and in the event of a determination of acceptable match, performing at least one of causing the system access provider to preclude computing access to the specific user and preventing the speaker adaptor from modifying the user-specific profile.

32. A method, for providing access by a plurality of users to a computer system having automatic recognition of speech provided at a plurality of spoken inputs, the method comprising: a. generating a prompt for a spoken user identification over one of the plurality of spoken inputs; b. comparing the spoken user identification with data stored in a user identification data base to determine if the user is an authorized user; c. providing an automatic speech recognizer for deriving text from speech provided at a speech-enabled terminal; d. loading user-specific profiles pertinent to a user who has been determined to be authorized into the automatic speech recognizer; and e. providing access to the system to a user who has been determined to be authorized.