EP1147513A1

EP1147513A1 - Security and user convenience through voice commands

Info

Publication number: EP1147513A1
Application number: EP99967596A
Authority: EP
Inventors: John Schier
Original assignee: Alcatel USA Sourcing Inc
Current assignee: Alcatel USA Sourcing Inc
Priority date: 1998-12-29
Filing date: 1999-12-27
Publication date: 2001-10-24
Also published as: AU2385700A; WO2000039789A1

Abstract

A system of providing access identifies a purported authorized user from inherent information supplied by the user at the time of access. The user simply issues a verbal request (52), which is compared with previously stored templates to both identify the desired command (54) using speech recognition techniques and to confirm the identity of the user (56) using speaker recognition techniques.

Description

SECURITY AND USER CONVENIENCE THROUGH VOICE COMMANDS

BACKGROUND OF THE INVENTION

1. TECHNICAL FIELD

This invention relates in general to security systems and, more particularly, to using voice commands to improve security and user convenience.

2. DESCRIPTION OF THE RELATED ART

Security is a major issue facing many businesses today, in one form or another. In many cases, security of the physical plant is of utmost importance. To protect the physical plant, many different types of locks may be employed. The most common lock, of course, is a key based lock. Keys are easily lost, stolen and copied and, therefore, do not provide a complete solution. Similarly, electronic locks which open responsive to a magnetic strip or token provide the advantage that the locks can be easily reprogrammed in the case where a card is stolen. Other types of electronic locks use a keypad in which the user enters a unique combination. These are also easily reprogrammed. In all such cases, however, unauthorized users can generally obtain entry quite easily, either through theft, observation, or guesswork.

Just as important is the matter of information security. Many businesses now store almost all company data on computers. The sensitive data which may be stored on a computer includes technical data, business data and personnel data.

Much of this data is accessible to select groups of employees. To access the data, an employee may enter an employee number which is public and a password which is private. Once verification is complete, the user is allowed to view the data. This method has several shortcomings. First, if the user is not particular about logging off prior to leaving for any reason, others could gain access to modify the data. Second, passwords can be gained through observation or guesswork by others. Third, infrequently used passwords are often forgotten, thus creating administration difficulties.

To enhance security, many companies now use speaker verification techniques. Using speaker verification, a previously stored "voice print" provided by an authorized user at the time of enrollment is compared with a voice print obtained at the time access is requested. A correlation between the stored voice print and the currently spoken voice print provides a score indicating similarities between the speech patterns of the authorized user and the purported authorized user. If the score exceeds a predetermined threshold, it is assumed that the purported authorized user is, in fact, the authorized user. Tests have shown that speaker verification is very accurate. However, current methods of using speaker verification, described hereinbelow, have rendered it less than convenient in many situations.
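By way of illustration only, the threshold comparison described above can be sketched as follows. The sketch assumes that a voice print can be reduced to a short numeric feature vector and that cosine similarity serves as the correlation score; neither assumption comes from the specification, and the names `similarity` and `verify_speaker`, the sample vectors and the threshold value are hypothetical.

```python
import math


def similarity(stored_print, current_print):
    """Cosine similarity between two equal-length feature vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(stored_print, current_print))
    norm_a = math.sqrt(sum(a * a for a in stored_print))
    norm_b = math.sqrt(sum(b * b for b in current_print))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information to compare
    return dot / (norm_a * norm_b)


def verify_speaker(stored_print, current_print, threshold=0.9):
    """Accept the purported user only if the score exceeds the threshold."""
    return similarity(stored_print, current_print) > threshold


# Hypothetical feature vectors standing in for real voice prints.
enrolled = [0.2, 0.8, 0.5]        # stored at the time of enrollment
same_speaker = [0.21, 0.79, 0.52] # obtained when access is requested
other_speaker = [0.9, 0.1, 0.3]   # a different person making the request
```

With these sample values, `verify_speaker(enrolled, same_speaker)` accepts the request and `verify_speaker(enrolled, other_speaker)` rejects it. A practical system would derive the score from a real speaker verification model rather than from a three-element vector.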

Other means of verification, such as electronic fingerprint analysis or retinal scan analysis, have also been employed for highly secure access. While the cost of such sophisticated verification apparatus may be justified for a single point of access (such as a main door to an office complex), these methods cannot be used in a cost effective manner where there are many points of access, such as for each computer on a network or each telephone in a system.

Therefore, a need has arisen for a method and apparatus for providing security, without inconveniencing the user.

BRIEF SUMMARY OF THE INVENTION

In the present invention, a set of one or more authorized users is identified by inherent criteria in response to an action by a purported authorized user. A verbal utterance is received from the purported authorized user indicating a desired action and speaker verification is performed on the verbal utterance to confirm the identity of the purported authorized user as one of the set of authorized users. Speech recognition is also performed on the verbal utterance to identify the desired action. If the purported authorized user is identified as one of the set of authorized users, the action is performed.

The present invention provides significant advantages over the prior art.

First, the invention reduces the inconvenience to the user by eliminating the step of explicit identification as provided by the prior art. Second, the identity confirmation and identification of the desired action are combined into a single step. Hence, the system is more convenient for users, without sacrificing security or efficiency.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

Figure 1 illustrates a flow chart describing a prior art system for speaker verification and voice activated commands;

Figure 2 illustrates a block diagram showing a preferred embodiment of the present invention;

Figure 3 illustrates a first method of operation using the system of Figure 2; and

Figure 4 illustrates a second method of operation using the system of Figure 2.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is best understood in relation to Figures 1-4 of the drawings, like numerals being used for like elements of the various drawings.

Figure 1 illustrates a prior art system used by a long distance telephone company for verifying authorization to use long distance services. In block 10, the caller explicitly speaks an identifying code into the phone, typically an account number. The account number is decoded using speaker independent voice recognition (since the caller is not yet known) in block 12, using well known techniques. If the decoded information relates to a valid user (i.e., if the decoded account number is a valid account number) in decision block 14, the spoken identification is also compared to verification information (a voice print of the authorized user) previously stored in association with the account number (or other unique identification code) to verify that the speaker is, in fact, the person authorized to charge to the account number in block 16. Otherwise, if the decoded information in block 12 did not identify an authorized user, the caller is returned to block 10 and asked to repeat the identification code (the system may terminate after a predetermined number of failed attempts to log on).

The speaker verification process of block 16 will render a score determining whether there is a high probability that the purported authorized user is in fact the claimed authorized user. If, in decision block 18, the system determines that the speaker is the person associated with the identification code of block 12, the caller is allowed to use the system to issue commands to use a service in block 20. In the case of a long distance telephone company, the caller could make the verbal command "call John Doe." This verbal command is identified using speaker dependent (sometimes in conjunction with speaker independent) voice recognition in block 22. The command would then be executed in block 24. In the present example, the system would identify the command "call" (possibly using speaker independent voice recognition techniques) as a request for a telephone connection and would compare the utterance "John Doe" with a number of templates stored for the identified user (using speaker dependent voice recognition techniques). When the best match is found, the system will typically confirm the action by providing audio asking the user "would you like to call John Doe?" where the phrase "John Doe" is a recording made in the identified speaker's own voice. If the user says yes, the telephone number associated with the template is used to make a connection in block 24.

A system of the type described above can be frustrating for users since it requires that the user memorize an identification code and because it uses multiple steps to verify the authority of the user prior to allowing access to services.

Figure 2 illustrates a block diagram of a system which provides a more efficient manner of verifying authority and responding to requests for access to services. This system could be used in a variety of services such as physical access (door locks), electronic services (such as voice activated dialing) or access to data. The system 30 includes a microphone 32 for receiving utterances from a user 34. The microphone 32 could be, for example, the receiver of a telephone handset, a microphone connected to a computer, or a microphone in the housing of a locking mechanism. The microphone 32 is connected to gatekeeper 36, which includes speech recognition logic and speaker verification logic, both of which can be implemented using well known available techniques. The speech recognition and speaker verification processing can be either remote or local depending upon the needs of the application. Gatekeeper 36 is coupled to data bank 38 which stores information on a plurality of "verbal icons" 40. Each authorized user may have information relating to one or more verbal icons stored in data bank 38. Gatekeeper 36 also outputs service commands to other subsystems; these service commands would vary depending upon the environment in which system 30 is used. For example, for use in connection with a lock, the gatekeeper 36 may issue a command to open or close the lock. In a telephone system, the gatekeeper may issue a command to provide a connection to a specified telephone number. In a computer system or network, the gatekeeper 36 may issue a command to provide access to a database, or to allow the user to execute a program.

Each verbal icon 40 contains the information necessary to perform speech recognition and speaker verification and to initiate a service. For example, when a user enrolls a new verbal icon such as "call John Doe", a template is generated in conjunction with well known voice recognition techniques. This template is stored as part of the verbal icon. This template can be used in conjunction with speech recognition techniques (to identify the command) and speaker verification techniques (to confirm the identity of the user). Further, the service information, indicating an action and/or data such as a telephone number or a database filename, is also stored as part of the verbal icon.
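As a non-limiting sketch, a verbal icon of the kind described above could be represented by a record that holds the template together with the service information. The field names, the `icons_for` helper, the user names and every sample value below are hypothetical; the specification does not prescribe a storage layout, and the short tuple merely stands in for a real speech template.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VerbalIcon:
    """One enrolled verbal icon (element 40), as a hypothetical record."""
    owner: str                   # authorized user who enrolled the icon
    phrase: str                  # human-readable label, e.g. "call John Doe"
    template: Tuple[float, ...]  # stand-in for the enrolled speech template
    action: str                  # service information: the action to initiate
    data: str                    # service information: e.g. a telephone number


def icons_for(owner: str, data_bank: List[VerbalIcon]) -> List[VerbalIcon]:
    """Return the verbal icons enrolled by one authorized user."""
    return [icon for icon in data_bank if icon.owner == owner]


# Hypothetical contents of data bank 38.
data_bank = [
    VerbalIcon("alice", "call John Doe", (1.0, 0.0, 0.2), "dial", "555-0100"),
    VerbalIcon("alice", "get my calendar", (0.0, 1.0, 0.2), "open_file", "calendar.dat"),
    VerbalIcon("bob", "retrieve voice mail", (0.5, 0.5, 0.9), "voice_mail", ""),
]
```

Keeping the template and the service information in one record reflects the statement that both are stored as part of the verbal icon, so that identifying the icon is enough to know which service to initiate.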

Operation of system 30 is best understood in connection with the flow charts of Figures 3 and 4. Figure 3 illustrates a flow chart where a user is uniquely identified by inherent criteria. In block 50, the user accesses the gatekeeper and, by doing so, inherently provides identification. For example, a gatekeeper 36 could be accessed at a unique telephone number for a long distance service. In this case, the identification process would be transparent to the user. Similarly, a request for access to a database coming from a computer on a network could identify the user through identification of the computer's network address.

Without taking any intermediary verification steps, the user provides a verbal request in block 52 by speaking a verbal icon (for which a counterpart verbal icon 40 has already been entered into the data bank 38). In a long distance service, for example, this may entail stating "call John Doe" into the telephone receiver. In a computer network, the user may say "open the personnel database" or "open the accounts receivable database" into the microphone coupled to the computer.

Templates from the data bank 38 are used in conjunction with the speaker dependent voice recognition system to identify which, if any, verbal icon was spoken in block 54. The same utterance which is used to identify the verbal icon is also used in conjunction with the template of the same spoken icon to perform speaker verification in block 56.

If the speaker verification logic of the gatekeeper 36 determines that the user is an imposter in decision block 58, the request is rejected in block 60. Otherwise the gatekeeper passes information to initiate the service in block 62.
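The flow of blocks 50 through 62 can be sketched, purely by way of example, as a single function. The sketch assumes that utterances and templates are short feature vectors and that one distance-based score serves both recognition (choosing the best-matching icon) and verification (comparing that score with a threshold). A practical system would use separate speech recognition and speaker verification techniques and could also decide that no icon was spoken at all. All names, telephone numbers and values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Each icon: (phrase, template, service information). All values are invented.
Icon = Tuple[str, Tuple[float, ...], str]

DATA_BANK: Dict[str, List[Icon]] = {
    "alice": [
        ("call John Doe", (1.0, 0.0, 0.2), "dial 555-0100"),
        ("get my calendar", (0.0, 1.0, 0.2), "open calendar.dat"),
    ],
}

# Block 50: the number that was dialed implies the claimed user.
ACCESS_NUMBER_TO_USER = {"800-555-0001": "alice"}


def score(template, utterance) -> float:
    """Toy similarity in (0, 1]; 1.0 means the feature vectors are identical."""
    return 1.0 / (1.0 + math.dist(template, utterance))


def handle_request(access_number, utterance, verify_threshold=0.8) -> Optional[str]:
    """Return service information if the request is verified, otherwise None."""
    claimed_user = ACCESS_NUMBER_TO_USER.get(access_number)
    icons = DATA_BANK.get(claimed_user, []) if claimed_user else []
    if not icons:
        return None
    # Block 54: recognition picks the claimed user's best-matching icon.
    _phrase, template, service = max(icons, key=lambda icon: score(icon[1], utterance))
    # Blocks 56-58: the same utterance is scored against that icon's template.
    if score(template, utterance) <= verify_threshold:
        return None      # block 60: request rejected
    return service       # block 62: information passed on to initiate the service
```

For example, `handle_request("800-555-0001", (0.95, 0.05, 0.2))` returns the service information for "call John Doe", whereas an utterance that is far from every enrolled template, such as `(0.8, 0.1, 0.9)`, is rejected. Note that the caller supplies only the utterance; the claimed identity comes from the access number.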

A long distance service could benefit from this system. As an example, a caller could place a call to a personal toll or toll free destination telephone number which was uniquely identified with the caller. This number would serve to connect him or her to the gatekeeper system 36. Since the phone number is associated with the caller, it also provides the gatekeeper 36 with the caller's claimed identity. The caller then speaks a pre-established verbal icon to request a service, such as "call John Blake" (a voice activated dialing command which serves to place a call to the telephone number associated with the command). The gatekeeper 36 compares the utterance with verbal icons 40 enrolled in conjunction with the claimed user and determines which icon was spoken by the purported authorized user. Once the verbal icon is identified, speaker verification techniques compare the utterance with the template for the identified spoken icon to determine whether the purported authorized user is in fact the authorized user. Verbal icons which may be used to initiate other services in the method of Figure 3 could include "get my calendar" or "show portfolio" to allow access to data stored on a computer, "retrieve voice mail" or "get dial tone."

In certain instances it is not possible to associate a unique identifying code with an access by a user. For example, a computer in the accounts receivable department of a company may be used by a number of individuals, each of whom may have different rights to use various databases. Figure 4 illustrates a method using the system of Figure 2 wherein voice activated commands and verification are provided without resort to cumbersome security procedures on the part of the user. In block 70, a user accesses the gatekeeper, and the act of accessing the gatekeeper identifies the user as one of a predetermined group of users, each of whom has pre-registered verbal icons. In the example above, the use of the computer in the accounts receivable department could identify the user as one of the group of people authorized to use that computer. In another example, a group of people could be assigned a number through which long distance telephone service was available. In yet another example, a group of people could be authorized to use a locked conference room.

In block 72, the user speaks a verbal icon pre-established in the system 30 and stored in the data bank 38. Using speech recognition techniques, the current voice print would be compared with the verbal icons 40 stored for each of the group of people determined to be authorized in block 70. Once the matching verbal icon is found in block 74, it identifies the user uniquely. The current utterance could then be used to perform speaker verification in block 76 along with the previously stored template associated with the matching verbal icon. If the user is not verified in decision block 78, the request associated with the verbal icon is rejected in block 80; otherwise the request is performed in block 82.
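The group-based flow of blocks 70 through 82 can likewise be sketched under stated assumptions. As a simplification, each template is a short feature vector that encodes both the phrase and the voice of the person who enrolled it, and a single toy score is used for matching and for verification. The access point name, user names, templates and service strings are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Each icon: (owner, phrase, template, service information). All values invented.
Icon = Tuple[str, str, Tuple[float, ...], str]

DATA_BANK: List[Icon] = [
    ("alice", "open accounts receivable", (1.0, 0.0, 0.1), "open ar.db read-write"),
    ("bob", "open accounts receivable", (0.0, 1.0, 0.1), "open ar.db read-only"),
    ("carol", "open personnel database", (0.3, 0.3, 1.0), "open personnel.db"),
]

# Block 70: the access point implies a group of users, not a single user.
ACCESS_POINT_TO_GROUP: Dict[str, List[str]] = {"ar-dept-pc": ["alice", "bob"]}


def score(template, utterance) -> float:
    """Toy similarity in (0, 1]; 1.0 means the feature vectors are identical."""
    return 1.0 / (1.0 + math.dist(template, utterance))


def handle_group_request(access_point, utterance,
                         verify_threshold=0.8) -> Optional[Tuple[str, str]]:
    """Return (identified user, service information), or None if rejected."""
    group = ACCESS_POINT_TO_GROUP.get(access_point, [])
    candidates = [icon for icon in DATA_BANK if icon[0] in group]
    if not candidates:
        return None
    # Blocks 72-74: the best match over the whole group identifies the user.
    owner, _phrase, template, service = max(
        candidates, key=lambda icon: score(icon[2], utterance))
    # Blocks 76-78: verify the utterance against the matching icon's template.
    if score(template, utterance) <= verify_threshold:
        return None            # block 80: request rejected
    return owner, service      # block 82: request performed
```

In this sketch two members of the group enroll the same phrase with different rights, and the matching icon determines both who is speaking and which service information applies. An utterance from someone outside the group finds no sufficiently close template among the group's icons and is rejected.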

The limit on the number of people in a group using this method depends upon a number of factors which affect performance of the system. In general, the time to match a verbal icon will depend upon the number of possible verbal icons which are compared. Therefore, if each potential user in a group uses a small number of verbal icons, then the group can have a relatively larger number of members. Another factor is the processing capabilities of the system; the faster the system, the more verbal icons can be compared in a given time interval. Thus, a faster system can handle groups with greater numbers of members or greater numbers of verbal icons per member relative to a slower system.
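The trade-off described above can be made concrete with a rough calculation. The sketch assumes that every enrolled icon of every group member is compared once per request and that each comparison takes a fixed time; the function names and all figures are hypothetical and are not taken from the specification.

```python
def comparisons_needed(members: int, icons_per_member: int) -> int:
    """Number of template comparisons for one request (assumed exhaustive search)."""
    return members * icons_per_member


def max_group_size(time_budget_ms: int, ms_per_comparison: int,
                   icons_per_member: int) -> int:
    """Largest group that can be searched within the response-time budget."""
    comparisons_allowed = time_budget_ms // ms_per_comparison
    return comparisons_allowed // icons_per_member


# Hypothetical figures: a 1000 ms budget and 5 icons per member.
slower_system = max_group_size(1000, 10, 5)   # 10 ms per comparison
faster_system = max_group_size(1000, 5, 5)    # 5 ms per comparison
fewer_icons = max_group_size(1000, 10, 2)     # slower system, 2 icons per member
```

Under these assumed figures, halving the time per comparison doubles the supportable group size, and reducing the number of icons per member enlarges the group in the same proportion.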

Many variations could be used with the system described above. For example, some databases may be more sensitive than others. Thus, the verification threshold employed for highly sensitive databases (such as financial information) may be narrower (higher ratio of rejection) than the verification threshold for less sensitive information.
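One way to picture this variation is a table of per-service verification thresholds. The service names, threshold values and the `is_verified` helper are hypothetical, and the score is assumed to lie between 0 and 1 with higher values indicating a closer match.

```python
# Hypothetical thresholds: stricter for more sensitive services.
SERVICE_THRESHOLDS = {
    "financial database": 0.95,
    "calendar": 0.80,
}
DEFAULT_THRESHOLD = 0.90  # assumed fallback for services not listed


def is_verified(score: float, service: str) -> bool:
    """Apply the verification threshold that belongs to the requested service."""
    return score > SERVICE_THRESHOLDS.get(service, DEFAULT_THRESHOLD)
```

A verification score of 0.85 would then suffice for the calendar but not for the financial database, which corresponds to the higher ratio of rejection for highly sensitive information.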

The present invention provides significant advantages over the prior art. First, identification of a user's putative identity is completely transparent to the user, and the user does not need to perform a separate step to inform the gatekeeper of his or her identity. Second, the system uses the user's spoken request for service to transparently provide verification data to the system. Again, the caller does not knowingly provide verification information as the system is being used, nor is there a separate step to provide the information. These two differences make the system more convenient and faster to use, and avoid the irritation of extra steps.

Although the Detailed Description of the invention has been directed to certain exemplary embodiments, various modifications of these embodiments, as well as alternative embodiments, will be suggested to those skilled in the art. The invention encompasses any modifications or alternative embodiments that fall within the scope of the Claims.

Claims

1. A method of providing security, comprising the steps of: identifying a set of one or more authorized users by inherent criteria in response to an action by a purported authorized user; receiving a verbal utterance from the purported authorized user indicating a desired action; performing speech recognition on said verbal utterance with stored templates associated with said set of authorized users to identify a speech template associated with the desired action; performing speaker verification on said verbal utterance and said identified speech template to confirm the identity of the purported authorized user as one of said set of authorized users; if said purported authorized user is identified as one of said set of authorized users, performing the desired action.

2. The method of claim 1 wherein said identifying step comprises the step of dialing a telephone number associated with an authorized user.

3. The method of claim 1 wherein said identifying step comprises the step of dialing a telephone number associated with a group of authorized users.

4. The method of claim 1 wherein said identifying step comprises the step of speaking into a microphone connected to a uniquely identified computer in a network.

5. The method of claim 1 wherein said step of performing speech recognition comprises the step of performing speaker dependent speech recognition.

6. The method of claim 1 wherein said step of performing the desired action comprises the step of retrieving service information associated with said identified speech template.

7. The method of claim 6 wherein said step of retrieving service information comprises the step of retrieving information identifying an action.

8. The method of claim 6 wherein said step of retrieving service information comprises the step of retrieving information identifying data associated with an action.

9. A system for providing secure access comprising: a data bank for storing verbal icons each related to an authorized user, said verbal icons including a speech template and service information associated with said template; circuitry for identifying a set of one or more authorized users by inherent criteria in response to an action by a purported authorized user; circuitry for receiving a verbal utterance from the purported authorized user indicating a desired action; circuitry for performing speech recognition on said verbal utterance with respect to templates associated with said set of authorized users to identify a speech template associated with the desired action; circuitry for performing speaker verification on said verbal utterance and said identified speech template to confirm the identity of the purported authorized user as one of said set of authorized users; circuitry for initiating the desired action if said purported authorized user is identified as one of said set of authorized users.

10. The system of claim 9 wherein said identifying circuitry comprises circuitry for identifying a user responsive to receiving a telephone connection at a telephone number associated with an authorized user.

11. The system of claim 9 wherein said identifying circuitry comprises circuitry for identifying a user responsive to receiving a telephone connection at a telephone number associated with a group of authorized users.

12. The system of claim 9 wherein said identifying circuitry comprises circuitry for identifying a user associated with a computer in a network.

13. The system of claim 9 wherein said speech recognition circuitry comprises circuitry for performing speaker dependent speech recognition.

14. The system of claim 9 wherein said circuitry for performing the desired action comprises circuitry for retrieving service information associated with said identified speech template.

15. The system of claim 14 wherein said circuitry for retrieving service information comprises circuitry for retrieving information identifying an action.

16. The system of claim 14 wherein said circuitry for retrieving service information comprises circuitry for retrieving information identifying data associated with an action.