WO2007111170A1 - Speaker recognition system and computer program - Google Patents

Speaker recognition system and computer program

Info

Publication number
WO2007111170A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker recognition
recognition
speaker
recognition system
user
Prior art date
Application number
PCT/JP2007/055434
Other languages
French (fr)
Japanese (ja)
Inventor
Yoshihiro Kawazoe
Soichi Toyama
Mitsuya Komamura
Original Assignee
Pioneer Corporation
Techexperts Incorporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pioneer Corporation and Techexperts Incorporation
Publication of WO2007111170A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221 Announcement of recognition results

Definitions

  • The present invention relates to the technical field of speaker recognition systems that perform speaker recognition and are provided in various computer devices, such as car navigation devices, net banking devices, auto-lock devices, and computer recognition devices, and in various electronic and electric devices.
  • It also relates to computer programs that cause a computer to function as such a speaker recognition system.
  • Speaker recognition of this kind is either of a text-fixed or text-dependent type, in which the uttered text used for recognition is registered in advance, or of a type in which no such registration is necessary and any text is recognized.
  • As a typical speaker recognition configuration, a technique including a speaker registration operation and a speaker recognition operation using an HMM (hidden Markov model) has been shown (see Patent Document 1). If recognition fails during the recognition operation, the speaker is, for example, rejected as another person (see Patent Document 2).
  • Patent Document 1: Technical Report of IEICE, SP95-111 (1996-01), pp. 17-24
  • Patent Document 2: Japanese Patent Laid-Open No. 2002-236666
  • However, Patent Document 1 does not particularly mention the measures to be taken when recognition fails, and Patent Document 2 only rejects the impersonator as another person and does not release the lock. With such measures alone, the fact that someone attempted to be recognized by impersonation is simply left unaddressed. In this case, there is a technical problem that the impersonator may try to impersonate again, because no measures are taken to make the user aware of that fact.
  • The present invention has been made in view of, for example, the above-mentioned problems. Its object is to provide a speaker recognition system that can effectively prevent impersonation or misrepresentation in speaker recognition, and a computer program that causes a computer to function as such a speaker recognition system.
  • A first speaker recognition system of the present invention includes: recognition means for performing speaker recognition; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed continuously for a predetermined number of times; and reporting means for reporting, to the one user, failure information indicating that the speaker recognition has failed when the detection means detects that the speaker recognition has failed continuously for the predetermined number of times.
  • The first speaker recognition system operates as follows at the speaker recognition stage.
  • First, speaker recognition is performed by recognition means having, for example, a microphone, a camera, a processor, a memory, and the like.
  • Here, “speaker recognition” means determining whether the speaker who wishes to be recognized is the registered speaker (hereinafter also referred to as “one user”) or an impersonator, that is, whether speaker recognition succeeds or fails.
  • Such speaker recognition is typically based on the utterance of the speaker, but may be performed on the basis of fingerprints, irises, faces, etc. in addition to or instead of the utterance.
  • Next, the detection means detects whether or not the recognition means has failed in speaker recognition continuously for a predetermined number of times.
  • the “predetermined number of times” is the number of times that the speaker can be presumed to be an impersonator.
  • This number of times is determined comprehensively, by experiment or simulation, as a number of failures that is extremely unlikely to occur when the one user himself or herself is operating the system.
  • “Continuous failure” means failing multiple times without an intervening success. A certain amount of time may or may not elapse between two consecutive failures.
  • Of course, this also includes the case where failures occur continuously at the same place, on the same occasion, or during a series of operations.
  • When such continuous failure is detected, failure information indicating that speaker recognition has failed is reported to the one user by reporting means comprising, for example, a display. Here, the mode of “reporting” may be display on the display of the terminal on which the speaker recognition system is installed, or any other mode, such as a preset e-mail or telephone call, as long as the one user can recognize the failure. If the administrator of the speaker recognition system is notified in addition to the one user, a more accurate and quick response can be achieved.
  • As a result, impersonation or misrepresentation in speaker recognition can be suitably avoided or prevented. Even if an impersonator attempts fraudulent recognition by impersonation, the failure information indicating that failures have occurred continuously is reported to the one user or the administrator without being discarded. The one user can then take measures such as making changes, and the administrator can prompt all users, including the one user, to take measures. In this way, countermeasures against impersonators can be taken quickly and accurately, so that if an impersonator tries to impersonate again, the probability of successful recognition is further reduced, which is very advantageous in practice.
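The detection and reporting steps of the first system can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name, the `report` callback, and the limit of three failures are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ConsecutiveFailureDetector:
    """Counts consecutive recognition failures per user and reports at a limit."""
    max_failures: int                    # the "predetermined number of times" (assumed value)
    report: Callable[[str, int], None]   # stand-in for the reporting means (hypothetical)
    _streaks: Dict[str, int] = field(default_factory=dict)

    def record(self, user: str, success: bool) -> bool:
        """Record one recognition result; return True if a report was issued."""
        if success:
            self._streaks[user] = 0      # a success breaks the streak
            return False
        self._streaks[user] = self._streaks.get(user, 0) + 1
        if self._streaks[user] >= self.max_failures:
            self.report(user, self._streaks[user])
            return True
        return False


reports = []
detector = ConsecutiveFailureDetector(
    max_failures=3, report=lambda user, count: reports.append((user, count))
)
outcomes = (False, False, True, False, False, False)
results = [detector.record("speaker_a", ok) for ok in outcomes]
```

Because the streak is kept per user rather than per session, failures spread over different occasions still count as continuous, which matches the definition of continuous failure given above.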
  • A second speaker recognition system of the present invention includes: recognition means for performing speaker recognition; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed; history storage means for storing history information including failure information indicating that the speaker recognition has failed when the detection means detects that the speaker recognition has failed; and reporting means for reporting the history information to the one user.
  • The second speaker recognition system operates as follows at the speaker recognition stage.
  • First, speaker recognition is performed by recognition means having, for example, a microphone, a camera, a processor, a memory, and the like.
  • Next, detection means including, for example, a processor, a memory, and the like detects whether or not the recognition means has failed in speaker recognition for one user.
  • Here, the number of consecutive failures is not limited. In other words, a failure may be detected even if it occurs only once. In reality, such a failure may be caused by an operation of the one user himself or herself, but priority is given to the practical benefit of detecting a cautious impersonator without omission.
  • When a failure is detected, history storage means having, for example, a processor, memory, database, etc. stores history information including failure information indicating that the speaker recognition has failed.
  • The “history information” is information in which an operation history including failure information for one user is recorded, and is typically accumulated in time series.
  • The history information is reported to the one user by the reporting means, either immediately without delay or afterward, and either regularly or irregularly.
  • As a result, impersonation or misrepresentation in speaker recognition can be suitably avoided or prevented. Even if a cautious impersonator attempts fraudulent recognition by impersonation, the failure information indicating that a failure has occurred even once is reported to the one user or the administrator without being discarded. The administrator can then prompt all users, including the one user, to take countermeasures. In this way, countermeasures against impersonators can be taken quickly and accurately, so that when an impersonator tries to impersonate again, the probability of successful recognition is further reduced, which is very advantageous in practice.
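The history storage described for the second system can be illustrated with a small in-memory sketch. The record fields, class names, and report format below are assumptions made for illustration; the source only requires that failure information be accumulated, typically in time series, and reported to the one user.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    """One recognition attempt; field names are illustrative assumptions."""
    timestamp: datetime
    user: str
    success: bool
    terminal: Optional[str] = None


class HistoryStore:
    """Accumulates recognition attempts (in-memory sketch, no database)."""

    def __init__(self) -> None:
        self._entries: List[HistoryEntry] = []

    def add(self, entry: HistoryEntry) -> None:
        self._entries.append(entry)

    def failures_for(self, user: str) -> List[HistoryEntry]:
        """Return the user's failed attempts, oldest first."""
        failed = [e for e in self._entries if e.user == user and not e.success]
        return sorted(failed, key=lambda e: e.timestamp)

    def report_lines(self, user: str) -> List[str]:
        """Format the failure history as text lines for the reporting means."""
        return [
            f"{e.timestamp.isoformat()} failed on {e.terminal or 'unknown terminal'}"
            for e in self.failures_for(user)
        ]


store = HistoryStore()
store.add(HistoryEntry(datetime(2007, 3, 16, 9, 30), "speaker_a", False, "atm-02"))
store.add(HistoryEntry(datetime(2007, 3, 16, 9, 0), "speaker_a", False))
store.add(HistoryEntry(datetime(2007, 3, 16, 10, 0), "speaker_a", True, "atm-01"))
lines = store.report_lines("speaker_a")
```

Here even a single failure is recorded, in line with the second system's goal of catching a cautious impersonator who stops before any consecutive-failure limit is reached.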
  • In one aspect of the second speaker recognition system, the recognition means performs the speaker recognition via voice input means, the system further comprises voice recording means for recording the voice input to the voice input means when the speaker recognition is performed, and the history storage means further stores the recorded voice as the history information.
  • According to this aspect, the recognition means performs speaker recognition via the voice input means.
  • The input voice is recorded by voice recording means having, for example, a processor, a memory, a database, etc., and stored as history information by the history storage means. Therefore, the stored voice can be used as strong evidence for identifying the impersonator, and the recognition performance of the speaker recognition system can be improved by learning the impersonator's voice.
  • In this aspect, the reporting means may report the history information and also reproduce the recorded voice for the one user.
  • With this configuration, the history information is reported to the one user by the reporting means, and the recorded voice is played back. Therefore, the one user can confirm the unauthorized use reliably and quickly based on the reproduced voice. As a result, measures such as changing the password can be taken more quickly.
  • In another aspect of the second speaker recognition system, the reporting means may report the history information to the one user when the number of failures detected continuously by the detection means exceeds a predetermined number of times.
  • According to this aspect, the reporting means reports the history information to the one user when that number is exceeded.
  • The speaker recognition system can be controlled more flexibly by changing the predetermined number of times according to, for example, the history information.
  • In this aspect, the recognition means may perform the speaker recognition via access from a terminal connected to the recognition means via communication means.
  • The history storage means may then store the terminal name of the terminal on which the speaker recognition is performed as part of the history information, and if the stored terminal name differs from the terminal name normally used by the one user, the predetermined number of times may be made smaller than when it is the same.
  • With this configuration, if the terminal name differs from the terminal name that is normally used, there is a high possibility of unauthorized use by an impersonator, so relatively strict speaker recognition is performed.
  • Conversely, if the terminal name is the one that is usually used, there is a high possibility that the speaker is the one user himself or herself, so relatively permissive speaker recognition is performed.
  • If the predetermined number of times is suitably changed in this way based on information other than speech, the performance of the speaker recognition system is complemented, which is very useful in practice.
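The terminal-dependent limit can be expressed as a one-line rule. The function name and the concrete limits (3 and 1) are hypothetical values chosen only to make the sketch concrete; the source does not specify them.

```python
from typing import Set


def allowed_failures(terminal: str, usual_terminals: Set[str],
                     base_limit: int = 3, reduced_limit: int = 1) -> int:
    """Return the consecutive-failure limit for this terminal.

    A terminal the user does not normally use gets the smaller limit,
    so a report is triggered after fewer failures (assumed values).
    """
    return base_limit if terminal in usual_terminals else reduced_limit


usual = {"home-pc", "atm-01"}
limit_known = allowed_failures("atm-01", usual)
limit_unknown = allowed_failures("atm-99", usual)
```

The returned value would be used as the predetermined number of times when counting consecutive failures for that access.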
  • In another aspect, the recognition means performs the speaker recognition via access from a terminal connected to the recognition means via communication means. The history storage means stores, as part of the history information, the date and time at which the speaker recognition most recently succeeded and the position of the terminal at that time. The detection means may detect that the speaker recognition has failed when the ratio of the distance between the terminal position in the current speaker recognition and the stored terminal position to the time difference between the date and time of the current speaker recognition and the stored date and time exceeds a predetermined speed threshold.
  • According to this aspect, the following determination is made based on the history information stored by the history storage means. The distance between the position of the terminal in the current speaker recognition and the stored terminal position is divided by the time difference between the date and time of the current speaker recognition and the stored date and time. If the result, that is, the moving speed, exceeds a predetermined speed threshold, the detection means detects that the speaker recognition has failed.
  • The “predetermined speed threshold” is a speed at which it is realistically or physically difficult or impossible to move. For example, a speed calculated for this purpose using a shortest-route search algorithm may be set in advance. Alternatively, it may be a value set by the one user based on his or her own experience.
  • The threshold to be used varies with the development of transportation technology, and may be updated as appropriate.
  • the suspected unauthorized use is estimated based on the moving speed between the terminals. In other words, it is possible to complement speaker recognition from the viewpoint of mobility.
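The moving-speed check can be sketched as below. The great-circle (haversine) distance and the 900 km/h threshold are assumptions of this sketch; the source leaves both the distance calculation and the threshold value open.

```python
import math
from datetime import datetime
from typing import Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_implausible_move(prev_pos: Tuple[float, float], prev_time: datetime,
                        cur_pos: Tuple[float, float], cur_time: datetime,
                        max_speed_kmh: float = 900.0) -> bool:
    """True if the implied moving speed exceeds the speed threshold (assumed value)."""
    distance = haversine_km(prev_pos, cur_pos)
    hours = (cur_time - prev_time).total_seconds() / 3600.0
    if hours <= 0:
        return distance > 0.0   # no elapsed time: any displacement is implausible
    return distance / hours > max_speed_kmh


tokyo, osaka = (35.68, 139.77), (34.69, 135.50)
last_success = datetime(2007, 3, 16, 9, 0)
too_fast = is_implausible_move(tokyo, last_success, osaka, datetime(2007, 3, 16, 9, 10))
plausible = is_implausible_move(tokyo, last_success, osaka, datetime(2007, 3, 16, 12, 0))
```

A straight-line distance underestimates real travel distance, so this check errs on the permissive side; a route-based distance would be stricter.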
  • In another aspect, the reporting means reports to the one user without delay via communication means.
  • According to this aspect, when the detection means detects that the speaker recognition has failed continuously for a predetermined number of times, or detects that the speaker recognition has failed, the failure information or history information is reported to the one user without delay by the reporting means via the communication means.
  • The “communication means” here specifically includes means capable of contacting the one user relatively quickly, such as e-mail, landline phone, and mobile phone. Therefore, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be quickly suppressed.
  • In another aspect, the reporting means may report to the one user when the one user performs speaker recognition on the next occasion following the failed speaker recognition.
  • According to this aspect, when the detection means detects that speaker recognition has failed continuously for a predetermined number of times, or detects that speaker recognition has failed, the failure information or history information is reported to the one user by the reporting means when the one user performs speaker recognition on the next occasion after the failure. Therefore, even if there is no special communication means, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed. Because the report is made to the one user who is currently using the speaker recognition system, other measures can be prompted at the same time as the report, and the one user can take measures on the spot.
  • In another aspect, the recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance, and the reporting means, when reporting, also reports that the password should be changed.
  • According to this aspect, speaker recognition is performed by recognition means including, for example, a microphone, a camera, a processor, a memory, and the like, based on the voice corresponding to the password registered in advance.
  • When the speaker recognition failure information or history information is reported to the one user by the reporting means, a notification that the password should be changed is also made.
  • The reason is that, when an impersonator attempts impersonation, there is a high possibility that the password has been leaked. Therefore, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be accurately suppressed.
  • In another aspect, the recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance, and the system further comprises change processing means for performing a process of changing the password when a failure is detected by the detection means.
  • According to this aspect, speaker recognition by the recognition means is performed based on the voice corresponding to the password registered in advance. If a failure in speaker recognition is detected by the detection means, change processing means having, for example, a processor, a memory, etc. performs a process for changing the password. For example, if the detection means detects that the speaker recognition has failed continuously for a predetermined number of times, it is assumed that fraudulent speaker recognition by impersonation is being attempted, and the password is automatically changed to a temporary password. As a result, it becomes more difficult to attempt impersonation with the same password. If the changed password is notified to the one user by the reporting means with due regard to security, there will be no problem when the one user next uses the speaker recognition system. Therefore, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed very quickly.
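The automatic change to a temporary password might look like the sketch below. Since the password is spoken, the sketch draws words from a small vocabulary; the word list, the function name, and the plain dictionary used as a password store are all hypothetical and not taken from the source.

```python
import secrets
from typing import Dict, List

# Hypothetical vocabulary of easily spoken words; not part of the source.
WORDS: List[str] = ["apple", "river", "stone", "cloud", "tiger", "maple", "ocean", "ember"]


def issue_temporary_password(passwords: Dict[str, str], user: str, n_words: int = 3) -> str:
    """Replace the user's registered password with a random temporary phrase."""
    temporary = " ".join(secrets.choice(WORDS) for _ in range(n_words))
    passwords[user] = temporary   # the old password stops working immediately
    return temporary              # to be sent to the user by the reporting means


registered = {"speaker_a": "open sesame"}
temporary = issue_temporary_password(registered, "speaker_a")
```

A real system would also need to register the user's voice for the new phrase before it can be used for recognition, a step the sketch leaves out.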
  • In another aspect, the recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance, and the system further includes invalidation processing means for invalidating the password for a predetermined period when a failure is detected by the detection means.
  • According to this aspect, speaker recognition by the recognition means is performed based on the voice corresponding to the password registered in advance. When a failure is detected, the password is invalidated for a predetermined period by invalidation processing means having, for example, a processor and a memory.
  • The predetermined period is determined in advance as a period after which an impersonator can be expected to abandon consecutive attempts, or a period sufficient for the legitimate user to take measures. It may be made changeable by the user. Therefore, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be quickly suppressed.
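Invalidating the password for a predetermined period can be sketched as a simple time-based lock. The class name and the 30-minute period are assumptions for illustration; the source only says the period is set in advance and may be user-changeable.

```python
from datetime import datetime, timedelta
from typing import Dict


class PasswordInvalidator:
    """Keeps a per-user time until which the password is treated as invalid."""

    def __init__(self, period: timedelta) -> None:
        self.period = period                      # the "predetermined period" (assumed value)
        self._invalid_until: Dict[str, datetime] = {}

    def invalidate(self, user: str, now: datetime) -> None:
        """Called when the detection means detects a failure."""
        self._invalid_until[user] = now + self.period

    def is_valid(self, user: str, now: datetime) -> bool:
        """True if the user's password may currently be used for recognition."""
        until = self._invalid_until.get(user)
        return until is None or now >= until


lock = PasswordInvalidator(period=timedelta(minutes=30))
detected_at = datetime(2007, 3, 16, 9, 0)
lock.invalidate("speaker_a", detected_at)
during = lock.is_valid("speaker_a", detected_at + timedelta(minutes=10))
after = lock.is_valid("speaker_a", detected_at + timedelta(minutes=30))
```

Passing `now` explicitly rather than reading the clock inside the class keeps the behaviour easy to check.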
  • A third speaker recognition system of the present invention includes: recognition means for performing speaker recognition via voice input means; voice recording means for recording the voice input to the voice input means when the speaker recognition is performed; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed; and history storage means for storing, when the detection means detects that the speaker recognition has failed, history information including the voice recorded in connection with the failed speaker recognition.
  • According to the third speaker recognition system, speaker recognition is first performed by the recognition means via the voice input means. The voice input to the voice input means when the speaker recognition is performed is recorded by voice recording means having, for example, a processor, a memory, a database, and the like. Simultaneously or in succession, the detection means detects whether or not the recognition means has failed in speaker recognition for one user.
  • When a failure is detected, the history storage means stores the history information including the voice recorded in connection with the failed speaker recognition.
  • Because the history information also includes the voice, the voice can be analyzed and, for example, compared with the voice of an impersonator who tried to impersonate another user, or checked by authorized users to see whether it is familiar. In this way, a profile of the impersonator can be established and used to prevent impersonation.
  • As a result, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed more efficiently.
  • In one aspect of the third speaker recognition system, the history storage means further includes at least one of the date and time, position information, and terminal name of the speaker recognition in the history information and stores it.
  • According to this aspect, at least one of the date and time of the speaker recognition, the position information, and the terminal name is further stored in the history information by the history storage means. Because the date and time, position information, and the like are recorded in addition to the failure information and voice data, the impersonator can be identified more quickly and accurately and the behavior pattern of the impersonator can be grasped, so that the recurrence of impersonation or misrepresentation in speaker recognition can be suppressed more accurately.
  • In another aspect, when a failure is detected by the detection means, the recognition means changes a parameter so that the speaker recognition fails more easily.
  • According to this aspect, when a failure is detected by the detection means, the recognition means changes the parameter so that the speaker recognition fails more easily. Therefore, the impersonator becomes harder to recognize as the failures continue, and the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed more accurately.
  • The parameter to be changed may be a threshold of the similarity between the voice of the one user registered in advance and the voice input at the time of speaker recognition, which serves as the reference for determining whether or not the speaker recognition has failed.
  • In this case, the recognition means raises the similarity threshold. Therefore, when an impersonator attempts the next speaker recognition, the speaker recognition is more likely to fail.
  • The degree to which the similarity threshold is raised may be set, for example, in view of the impersonator's ability to learn and in view of the fluctuation of the one user's voice due to physical condition and the like.
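Raising the similarity threshold as failures accumulate can be sketched with a capped linear rule. The step size, the ceiling, and the assumption that similarity lies between 0 and 1 are choices of this sketch; the source only says the threshold is raised by a suitably chosen degree.

```python
def raised_threshold(base: float, consecutive_failures: int,
                     step: float = 0.02, ceiling: float = 0.95) -> float:
    """Return the similarity threshold after the given number of consecutive failures.

    The ceiling keeps the genuine user, whose voice varies with physical
    condition, from being locked out entirely (assumed design choice).
    """
    return min(ceiling, base + step * consecutive_failures)


no_failures = raised_threshold(0.80, 0)
three_failures = raised_threshold(0.80, 3)
many_failures = raised_threshold(0.80, 10)
```

The step models how quickly an impersonator might improve between attempts, and the ceiling models the natural variation of the genuine user's voice, the two considerations named above.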
  • In another aspect, the recognition means performs the speaker recognition via access from a terminal connected to the recognition means via communication means.
  • In addition to or instead of the case where the detection means detects that the speaker recognition has failed continuously for the predetermined number of times, the reporting means reports to the one user if a predetermined condition on at least one of the temporal position and the spatial position of the terminal involved in the failed speaker recognition is not satisfied.
  • speaker recognition is performed by the recognition means via access from a terminal connected to the recognition means via the communication means.
  • The terminals here include, for example, ATMs (automated teller machines) installed at bank branches or convenience stores and connected by dedicated lines, and mobile phones equipped with a GPS (Global Positioning System) function that can be used for mobile banking.
  • For speaker recognition by such recognition means, in addition to or instead of the case where the detection means detects that the speaker recognition has failed continuously for a predetermined number of times, that is, even if only a single failure is detected, the reporting means reports to the one user if the predetermined condition on at least one of the temporal position and the spatial position of the terminal involved in the failed speaker recognition is not satisfied.
  • For example, taking into account the time difference between the last device usage time and the current device usage time and the distance between the two terminals, if it is judged physically impossible to move that distance within that time difference, a report is made that the possibility of an impersonator is relatively high. Therefore, in addition to or instead of the utterance, situations that are implausible for use by the one user can be accurately grasped, and the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed more accurately.
  • A computer program of the present invention causes a computer provided in a speaker recognition system to function as the first, second, or third speaker recognition system of the present invention described above (including various aspects thereof).
  • If the computer program of the present invention is read from a recording medium storing the computer program, such as a CD-ROM or DVD-ROM, into a computer provided in the speaker recognition system and executed, or if the computer program is downloaded via communication means and then executed, the speaker recognition system of the present invention described above can be constructed relatively easily. As a result, as in the case of the speaker recognition system of the present invention described above, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed.
  • A computer program product in a computer-readable medium tangibly embodies program instructions executable by a computer, and causes the computer to function as the first, second, or third speaker recognition system of the present invention described above.
  • According to the computer program product of the present invention, if the computer program product is read into a computer from a recording medium storing the computer program product, such as a ROM, CD-ROM, DVD-ROM, or hard disk, or if the computer program product, which is for example a transmission wave, is downloaded to a computer via communication means, the speaker recognition system of the present invention described above can be implemented relatively easily.
  • More specifically, the computer program product may be composed of computer-readable code (or computer-readable instructions) that causes a computer to function as the first, second, or third speaker recognition system of the present invention described above.
  • As described above, according to the speaker recognition system of the present invention, since the recognition means, the detection means, and the reporting means are provided, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed. Furthermore, according to the computer program of the present invention, since the computer is made to function as the recognition means, the detection means, and the reporting means, the above-described speaker recognition system of the present invention can be constructed relatively easily.
  • FIG. 1 is a block diagram schematically showing the basic configuration and basic operation of a speaker recognition system according to a first example of the present invention.
  • FIG. 2 is a block diagram conceptually showing the basic structure and basic operation of a recognition unit provided in the speaker recognition system in the first example.
  • FIG. 3 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the second example.
  • FIG. 4 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the third example.
  • FIG. 5 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the fourth example.
  • FIG. 6 is a flowchart showing an operation process of the speaker recognition system according to the fifth embodiment.
  • FIG. 7 is a flowchart showing an operation process of the speaker recognition system according to the sixth embodiment.
  • FIG. 8 is a flowchart showing an operation process of the speaker recognition system according to the seventh embodiment.
  • FIG. 9 is a flowchart showing an operation process of the speaker recognition system according to the eighth embodiment.
  • FIG. 10 is a flowchart showing an operation process of the speaker recognition system according to the ninth embodiment.
  • Here, FIG. 1 is a block diagram conceptually showing the basic configuration and basic operation of the speaker recognition system according to the first embodiment of the present invention, and FIG. 2 is a block diagram conceptually showing the basic configuration and basic operation of the recognition unit provided in the speaker recognition system according to the first embodiment.
  • A speaker recognition system 1 includes a microphone 132 and a recognition unit 14 as examples of the “recognition means” according to the present invention, a display screen 52, a detection unit 60 as an example of the “detection means” according to the present invention, and a reporting unit 70 as an example of the “reporting means” according to the present invention. With the following configuration, it performs speaker recognition of the speaker A121 or the spoofer 122.
  • The microphone 132 is a device that, when the speaker A121 or the spoofer 122 utters a keyword, converts the utterance into an electrical signal and inputs it to the speaker recognition system 1.
  • The recognition unit 14 is logically constructed according to a program in a computer including, for example, a processor, a memory, and the like. At the time of speaker recognition, it compares the utterance of an arbitrary speaker (the speaker A121 or the spoofer 122) with the registered speaker model and recognizes whether or not that speaker is the speaker A121 of the registered speaker model.
  • The recognition unit 14 includes a voice part extraction unit 142, a feature amount calculation unit 201, a similarity calculation unit 15, a speaker model database 45, and a collation unit 30.
  • The voice part extraction unit 142 is logically constructed in accordance with a program in a computer having, for example, a processor, a memory, and the like. It is an arithmetic unit that cuts out the uttered voice portion, in which the keyword is uttered, from the electrical signal of the input utterance (that is, the utterance data) by a typical voice segment detection method or the like that uses the power difference between background noise and the voice utterance section.
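A power-based voice segment detector of the kind mentioned above can be sketched in a few lines. This is one simple variant under stated assumptions: the frame length, the power ratio, and the rule of estimating background noise from the first few frames are all choices made here, not details given in the source.

```python
from typing import List, Optional, Tuple


def frame_powers(samples: List[float], frame_len: int) -> List[float]:
    """Mean power of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_voice_segment(samples: List[float], frame_len: int = 160,
                         ratio: float = 4.0, noise_frames: int = 5
                         ) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the voiced span, or None.

    Background noise power is estimated from the first `noise_frames` frames
    (assumed to contain no speech); frames whose power exceeds `ratio` times
    that estimate are treated as speech.
    """
    powers = frame_powers(samples, frame_len)
    if len(powers) <= noise_frames:
        return None
    noise = sum(powers[:noise_frames]) / noise_frames
    threshold = max(noise * ratio, 1e-12)
    active = [i for i, p in enumerate(powers) if p > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: quiet background, a louder "utterance", then quiet again.
signal = [0.01] * 800 + [0.5] * 480 + [0.01] * 320
segment = detect_voice_segment(signal)
```

The returned span runs from the first to the last active frame, so short pauses inside the keyword stay part of the extracted portion.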
  • the feature amount calculation unit 201 is logically constructed in accordance with a program in a computer having, for example, a processor, a memory, and the like, and converts an input utterance voice portion into a feature amount.
  • a feature amount is an arithmetic device that is converted by an MFCC (Mel Frequency Cepstrum Coefficient: MFCC), an LPC (Linear Predictive Coding: LPC) cepstrum, or the like.
  • MFCC (Mel Frequency Cepstrum Coefficient)
  • LPC (Linear Predictive Coding)
  • the similarity calculation unit 15 is logically constructed according to a program in a computer having, for example, a processor, a memory, and the like.
  • it calculates the similarity between the input feature amount and the voice feature corresponding to the password registered in advance in the speaker model database 45.
  • the collation unit 30 is logically constructed according to a program in a computer including, for example, a processor, a memory, and the like, and compares the calculated similarity with a predetermined standard indicating the similarity corresponding to the person.
  • it thereby verifies whether or not the speaker A121 or the spoofer 122 is the registered speaker A121 himself/herself, and outputs the verification result (for example, whether the speaker recognition succeeded or failed).
  • the predetermined standard indicating the degree of similarity corresponding to the person may be a value that can be changed as appropriate. Specifically, if the detection unit 60 detects repeated failures by the spoofer 122 and the predetermined standard is changed so that recognition fails more easily, spoofing by the spoofer 122 becomes more difficult.
  • the display screen 52 is a display device, for example, a liquid crystal display, that displays the recognition result. If the speaker is not recognized as the person, a recognition failure message is displayed.
  • the detection unit 60 detects whether the recognition unit 14 has failed in speaker recognition consecutively a predetermined number of times. This is because, if speaker recognition fails consecutively, whether on the same occasion or across different occasions, the possibility that the speaker is an impersonator rather than the person himself/herself is relatively high. Failure information indicating that the speaker recognition has failed is then sent to the reporting unit 70.
  • the reporting unit 70 reports failure information indicating that the speaker recognition has failed to the one user himself/herself (in this case, the speaker A121) via, for example, a display. The failure information may be notified to the speaker A121 without delay through a preset communication means such as e-mail or telephone. Alternatively, if the failure information is notified when the speaker A121 next performs speaker recognition after the failure, the case where the speaker A121 has no such communication means can also be handled. If a recommendation to change the password is reported together with the failure information, a leak of the password can also be dealt with.
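The interplay of the detection unit 60 and the reporting unit 70 can be illustrated with a small counter. This is a minimal sketch under stated assumptions: the class name, the `record` method, and the choice to signal a report as soon as the count reaches the limit are hypothetical and are not taken from the patent.

```python
class ConsecutiveFailureDetector:
    """Counts failures in a row and signals when a report is due."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # the "predetermined number of times"
        self.count = 0      # current run of consecutive failures

    def record(self, success: bool) -> bool:
        """Record one recognition result.

        Returns True when failure information should be passed on
        to the reporting side (the count has reached the limit).
        """
        if success:
            # A success interrupts the run of failures.
            self.count = 0
            return False
        self.count += 1
        return self.count >= self.limit
```

With a limit of 3, two failures followed by a success do not trigger a report, while three failures in a row do.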
  • FIG. 3 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the second example.
  • in FIG. 3, the same components as those in the above drawings are given the same reference numerals, and the description thereof will be omitted as appropriate.
  • compared with the speaker recognition system 1 according to FIG. 1, the speaker recognition system 1 according to FIG. 3 further includes the change processing unit 65 as an example of the “change processing means” according to the present invention and the invalidation processing unit 66 as an example of the “invalid processing means”.
  • the change processing unit 65 changes the password used for speaker recognition.
  • the invalidation processing unit 66 invalidates the password for a predetermined period.
  • FIG. 4 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the third example.
  • in FIG. 4, the same components as those in the above drawings are given the same reference numerals, and the description thereof will be omitted as appropriate.
  • the speaker recognition system 1 according to FIG. 4 further includes a history storage unit 80 and a history database 85 as examples of the “history storage means” according to the present invention.
  • the history storage unit 80 stores history information including failure information indicating that speaker recognition has failed in the history database 85.
  • the table structure stored in the history database 85 is as shown in Table 86.
  • the table 86 stores the date and time when speaker recognition was performed, the name of the terminal used, the number of consecutive failures, and the recognition result. Based on this history information, if the recognition unit 14 recognizes the one user himself/herself while the number of consecutive failures is less than a predetermined number (for example, 5 times), the speaker recognition is considered successful.
  • on the other hand, once the number of consecutive failures reaches the predetermined number, the speaker recognition is treated as a failure, is regarded as having been attempted by the impersonator 122, and becomes the subject of notification.
  • the counter for counting the number of consecutive failures may be reset to its initial value of 0 after a predetermined time or period has elapsed, or after a successful recognition, so that failures by the one user himself/herself are not inadvertently accumulated.
  • alternatively, failures may be made the subject of notification only when they occur consecutively within a certain time or time period.
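The history table and the counter reset described above might be modeled as follows. This is a sketch, not the patent's data layout: the record fields mirror the columns listed for table 86, while the class names, the `reset_after` parameter, and the rule that a gap longer than `reset_after` restarts the count are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class HistoryRecord:
    timestamp: datetime          # date and time of the recognition attempt
    terminal: str                # name of the terminal used
    consecutive_failures: int    # run length after this attempt
    success: bool                # recognition result


class FailureHistory:
    def __init__(self, reset_after: timedelta) -> None:
        self.reset_after = reset_after
        self.records: List[HistoryRecord] = []

    def add(self, timestamp: datetime, terminal: str, success: bool) -> HistoryRecord:
        last = self.records[-1] if self.records else None
        expired = last is not None and timestamp - last.timestamp > self.reset_after
        # The counter restarts after a success or after the reset period.
        if success or last is None or last.success or expired:
            count = 0
        else:
            count = last.consecutive_failures
        if not success:
            count += 1
        record = HistoryRecord(timestamp, terminal, count, success)
        self.records.append(record)
        return record
```

Two failures ten minutes apart yield a count of 2, whereas a further failure several hours later starts again at 1 when `reset_after` is one hour.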
  • FIG. 5 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the fourth example. Note that the same reference numerals are given to the same components as those in the above drawings, and the description thereof will be omitted as appropriate.
  • the speaker recognition system 1 according to FIG. 5 further includes a plurality of terminals (for example, a terminal A91 and a terminal B92) and a voice recording unit 145.
  • Each of the plurality of terminals includes a microphone 132 and a display screen 52.
  • terminal A91 is installed in a branch in Hokkaido
  • terminal B92 is installed in a branch in Fukuoka.
  • the network configuration may be a so-called client-server type in which, for example, each terminal is placed on the client side and the other components are placed on the server side.
  • the network configuration is not limited to this, and for example, only the history database 85 may be arranged in the server.
  • the voice recording unit 145 records the voice input to the microphone 132 when the speaker recognition is performed, in the history database 85 along with the history information, for example. Then, for example, when an unauthorized speaker recognition is attempted by the spoofer 122, the recorded voice is reproduced in response to the report by the report unit 70. Based on the reproduced voice, one user can confirm the unauthorized use surely and promptly, so that it is possible to quickly perform a process such as password change.
  • the table structure stored in the history database 85 is, for example, a table 87.
  • Table 87 stores, for example, the date and time when speaker recognition was performed, the name of the terminal used, location information indicating where the terminal is geographically located, voice data, and recognition results.
  • the number of consecutive failures may also be stored. Based on this history information, the impersonator 122 may be identified, or defensive measures may be taken to prevent fraud.
  • FIG. 6 is a flowchart showing the operation process of the speaker recognition system according to the fifth embodiment. Note that the configuration in the present embodiment is the same as that in the fourth embodiment, and the same components are denoted by the same reference numerals, and the description thereof will be omitted as appropriate.
  • the recognition unit 14 calculates the similarity between the input speech and the speech registered in advance (step S1). Whether or not the speaker is the one user himself/herself is determined based on whether or not the similarity exceeds a predetermined threshold (step S2). For example, the predetermined threshold is set to 0.8, and whether or not the speaker is the user himself/herself is determined based on whether or not the similarity, obtained within the range of 0 to 1, is 0.8 or more.
  • when the similarity does not exceed the predetermined threshold (step S2: No), the detection unit 60 then determines whether or not the number of consecutive recognition failures exceeds a predetermined value, and the recognition failure flag is set (step S32). For example, the predetermined value is 5 times, and it is determined whether the number of consecutive recognition failures exceeds 5.
  • when the number of consecutive recognition failures exceeds the predetermined value of 5 (step S32: Yes), the failures can no longer be excused as accidental, and there is a relatively high possibility of a spoofer.
  • the display screen 52 displays that the speaker recognition process as a whole has failed.
  • the invalidation processing unit 66 temporarily invalidates the password.
  • the history storage unit 80 stores the history information including failure information related to the speaker recognition.
  • the voice recording unit 145 stores the voice information input during the speaker recognition.
  • each item of data is stored in the history database 85 (step S42).
  • on the other hand, when the degree of similarity exceeds the predetermined threshold (step S2: Yes), that is, when the speaker is recognized as the one user himself/herself, the recognition is basically successful at this point. However, in order to prompt countermeasures in case an impersonator has appeared, whether or not the previous speaker recognition failed is confirmed based on whether or not the recognition failure flag is set (step S31).
  • when the previous speaker recognition has failed (step S31: Yes), the reporting unit 70 reports the recognition failure history (that is, the fact that recognition by an impersonator may have been attempted) at this time (step S41).
  • if the user who has received the report has no recollection of the failures, the security of the system can be ensured by taking measures such as changing the password. On the other hand, if the user does recall them, no change is needed.
  • if the previous speaker recognition has not failed (step S31: No), there is no evidence that recognition by an impersonator has been attempted. Use is therefore permitted, and a message to that effect is displayed on the display screen 52 (step S43).
  • as described above, speaker recognition is suitably performed.
  • in addition, measures are taken when recognition fails consecutively, so impersonation or misrepresentation in speaker recognition can be suitably avoided or prevented.
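The flow of FIG. 6 can be summarized in a short sketch. It follows the step numbers quoted in the text, but several details are assumptions made for illustration: the names `SpeakerState` and `process_attempt`, the returned strings, setting the failure flag on every failed attempt, resetting the counter on success, and allowing a retry while the limit has not been exceeded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeakerState:
    consecutive_failures: int = 0
    failure_flag: bool = False
    password_valid: bool = True
    history: List[str] = field(default_factory=list)


def process_attempt(
    state: SpeakerState,
    similarity: float,
    threshold: float = 0.8,
    max_failures: int = 5,
) -> str:
    # Step S2: the text's example accepts a similarity of 0.8 or more.
    if similarity >= threshold:
        state.consecutive_failures = 0
        # Step S31: did an earlier attempt fail?
        if state.failure_flag:
            state.failure_flag = False
            state.history.append("earlier failures reported")
            return "success, earlier failures reported"  # step S41
        return "success"  # step S43
    # Recognition failed: remember it for the next successful attempt.
    state.consecutive_failures += 1
    state.failure_flag = True
    # Step S32: has the run of failures exceeded the predetermined value?
    if state.consecutive_failures > max_failures:
        state.password_valid = False  # temporary invalidation
        state.history.append("password invalidated")
        return "failed, password invalidated"  # step S42
    return "failed, retry allowed"
```

Six low-similarity attempts in a row end with the password being invalidated; the next successful attempt then carries the report of the earlier failures, and the one after that is a plain success.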
  • FIG. 7 is a flowchart showing the operation process of the speaker recognition system according to the sixth embodiment.
  • the configuration in the present embodiment is the same as that in the fourth embodiment, and the same reference numerals are given to the same configurations, and the description thereof is omitted as appropriate.
  • the same steps as those in the fifth embodiment are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
  • compared to FIG. 6, FIG. 7 adds a process that raises the threshold for similarity determination by the recognition unit 14 each time recognition fails, so that anyone other than the one user becomes progressively harder to recognize (step S52).
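One way to picture the threshold raise of step S52 is a linear increase with an upper bound. The patent text does not state how much the threshold is raised, so the step size of 0.02, the ceiling of 0.95, and both function names are assumptions chosen only for this sketch.

```python
def raised_threshold(
    base: float, failures: int, step: float = 0.02, ceiling: float = 0.95
) -> float:
    """Similarity threshold after `failures` consecutive failed attempts."""
    # The ceiling keeps the genuine user from being locked out entirely.
    return min(ceiling, base + step * failures)


def is_accepted(similarity: float, failures: int, base: float = 0.8) -> bool:
    """Accept the speaker only if the similarity meets the raised threshold."""
    return similarity >= raised_threshold(base, failures)
```

Under these assumed values, a similarity of 0.85 is accepted on a first attempt but rejected after three consecutive failures, because the threshold has by then risen above it.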
  • FIG. 8 is a flowchart showing the operation process of the speaker recognition system according to the seventh embodiment.
  • the configuration in the present embodiment is the same as that in the fourth embodiment, and the same reference numerals are given to the same configurations, and the description thereof is omitted as appropriate.
  • the same steps as those in the fifth embodiment are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
  • in step S220, it is determined whether or not the terminal currently used is the terminal normally used.
  • the terminal normally used may be set in advance by the one user himself/herself, for example. If it is determined that the terminal is the one normally used (step S220: Yes), α is substituted for the predetermined value of the number of consecutive failures (step S221). On the other hand, if it is determined that the terminal is not the one normally used (step S220: No), β is substituted for the predetermined value of the number of consecutive failures (step S222).
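Steps S220 to S222 amount to choosing between two limits depending on the terminal. In the sketch below the limits are called `alpha` and `beta`; the concrete values 5 and 2, the function name, and the use of a set of usual terminal names are assumptions. The only property taken from the document is that an unfamiliar terminal gets the smaller limit.

```python
from typing import Set


def allowed_failures(
    terminal: str, usual_terminals: Set[str], alpha: int = 5, beta: int = 2
) -> int:
    """Permitted number of consecutive failures for this terminal.

    alpha applies to a terminal the user normally uses (step S221),
    beta to any other terminal (step S222); beta is assumed smaller.
    """
    return alpha if terminal in usual_terminals else beta
```

A user who registered only "terminal A" would thus be allowed five consecutive failures there but only two on "terminal B".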
  • FIG. 9 is a flowchart showing the operation processing of the speaker recognition system according to the eighth embodiment.
  • the configuration in the present embodiment is the same as that in the fourth embodiment, and the same reference numerals are given to the same configurations, and the description thereof is omitted as appropriate.
  • the same steps as those in the fifth embodiment are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
  • Fig. 9 adds processing to assist speaker recognition based on the possibility of movement between the previous and current terminals.
  • the distance D between used terminals, which is the distance between the terminals used in the previous recognition and the current recognition
  • the use time difference T, which is the time difference between the previous recognition and the current recognition
  • the moving speed V between the terminal used in the previous recognition and the terminal used in the current recognition
  • the “predetermined speed threshold” is a value set in advance as a speed at which movement is difficult or impossible, for example, 1000 km/h.
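The movement check can be sketched with the quantities D, T and V named above. The relation V = D / T is an assumption (the excerpt does not spell out the formula), as are the function name, the handling of a zero time difference, and the distance of roughly 1400 km used in the usage note.

```python
def is_travel_plausible(
    distance_km: float, elapsed_hours: float, speed_threshold_kmh: float = 1000.0
) -> bool:
    """Check whether one person could have moved between the two terminals.

    distance_km is D, elapsed_hours is T, and the implied speed is V = D / T.
    """
    if elapsed_hours <= 0:
        # No time has passed: only plausible if the terminal is the same place.
        return distance_km == 0
    speed_kmh = distance_km / elapsed_hours
    return speed_kmh <= speed_threshold_kmh
```

For example, if the Hokkaido and Fukuoka terminals are assumed to be about 1400 km apart, attempts 30 minutes apart imply a speed of 2800 km/h and fail the check, while attempts 3 hours apart pass it.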
  • FIG. 10 is a flowchart showing the operation process of the speaker recognition system according to the ninth embodiment.
  • the configuration in the present embodiment is the same as that in the fourth embodiment, and the same components are denoted by the same reference numerals, and the description thereof will be omitted as appropriate.
  • the same steps as those in the fifth embodiment are denoted by the same reference numerals, and the description thereof is omitted as appropriate.
  • the timing for reporting failure information differs. Specifically, in FIG. 6, failure information is reported when a later recognition succeeds after the failed recognitions (step S41), so a relatively long time lag occurs between the failure and the report.
  • in contrast, in FIG. 10, failure information is reported at the point when the failure is detected (step S422), so the time lag between the failure and the report can be relatively short. Therefore, the one user or the administrator of the speaker recognition system can take quick measures. For example, when the password has been temporarily invalidated, this avoids the situation in which the one user himself/herself fails recognition without knowing of the invalidation.
  • as described above, speaker recognition is suitably performed.
  • in addition, when recognition fails consecutively, the one user himself/herself is notified at an appropriate timing, so the possibility that spoofing recurs in speaker recognition, or its success rate if it does recur, can be quickly suppressed.
  • the speaker recognition system and the computer program according to the present invention are provided in various computer devices such as a car navigation device, a net banking device, an auto-lock device, and a computer recognition device, and various electronic electric devices. It can be used in a speaker recognition system that performs speaker recognition based on the utterance of the speaker.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Alarm Systems (AREA)

Abstract

A speaking person recognition system (1) comprises recognizing means (132, 14) for recognizing a speaking person and detecting means (60) for detecting whether or not speaking person recognition of a user by the recognizing means has failed consecutively a predetermined number of times. The system (1) further comprises reporting means (70) for reporting failure information representing the failed speaking person recognition to the user when the detecting means detects that the speaking person recognition has failed consecutively the predetermined number of times.

Description

Specification
Speaker recognition system and computer program
Technical field
[0001] The present invention relates to the technical field of a speaker recognition system that is provided in various computer devices and electronic and electrical devices, such as a car navigation device, a net banking device, an auto-lock device, and a computer recognition device, and that performs speaker recognition based on the utterance of a speaker who is its user, and of a computer program that causes a computer to function as such a speaker recognition system.
Background art
[0002] Speaker recognition systems of this kind fall into three types: a text-fixed or text-dependent type, in which the uttered text used for recognition is registered in advance; a text-independent type, which requires no such registration and performs recognition on arbitrary text; and a text-designated type, in which the text is designated at the time of each recognition (see Patent Document 1). As a typical speaker recognition configuration, a technique consisting of a speaker registration operation and a speaker recognition operation using, for example, an HMM (hidden Markov model) has been shown (see Patent Document 1). If recognition fails during such a recognition operation, the speaker is, for example, rejected as another person (see Patent Document 2).
[0003] Patent Document 1: IEICE Technical Report, SP95-111 (1996-01), pp. 17-24
Patent Document 2: Japanese Patent Laid-Open No. 2002-236666
Disclosure of the invention
Problems to be solved by the invention
[0004] However, the techniques disclosed in, for example, Patent Document 1 and Patent Document 2 focus on recognition itself, and their countermeasures for the case where recognition fails can hardly be called sufficient. For example, Patent Document 1 does not particularly address countermeasures for failed recognition, and in Patent Document 2 the impersonator is merely rejected as another person and the lock is not released. With such measures alone, if recognition by impersonation is attempted, the user has no way of knowing that fact, so there is a technical problem that the impersonator may be left free to attempt impersonation again without any countermeasure being taken.
[0005] The present invention has been made in view of, for example, the problems described above, and its object is to provide a speaker recognition system capable of efficiently preventing impersonation or spoofing in speaker recognition, and a computer program that causes a computer to function as such a speaker recognition system.
Means for solving the problem
[0006] (Speaker recognition system)
In order to solve the above problems, a first speaker recognition system of the present invention comprises: recognition means for performing speaker recognition; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed consecutively a predetermined number of times; and reporting means for reporting failure information, which indicates that the speaker recognition has failed, to the one user when the detection means detects that the speaker recognition has failed consecutively the predetermined number of times.
[0007] According to the first speaker recognition system, recognition is performed as follows at the speaker recognition stage.
[0008] That is, in operation, speaker recognition is first performed by the recognition means, which comprises, for example, a microphone, a camera, a processor, a memory, and the like. Here, “speaker recognition” means recognizing whether the speaker seeking recognition is the registered speaker (hereinafter also referred to as the “one user”) himself/herself or an impersonator, that is, determining whether speaker recognition succeeds or fails. Such speaker recognition is typically performed based on the speaker's utterance, but in addition to or instead of the utterance it may be performed based on, for example, a fingerprint, an iris, or a face.
[0009] Then, the detection means, which comprises, for example, a processor, a memory, and the like, detects whether or not speaker recognition by the recognition means has failed consecutively a predetermined number of times. Here, the “predetermined number of times” is the number of times from which the speaker can be presumed to be an impersonator. This number is typically determined by experiment or simulation, comprehensively considering the identity recognition probability of the speaker recognition system and the like, as a number of consecutive failures that would be all but impossible when the one user himself/herself is operating the system. “Failing consecutively” means failing multiple times with no success in between; some time may elapse between two consecutive failures, and the two failures need not occur at the same place. Of course, the simple case in which failures occur consecutively at the same place, on the same occasion, or during a single series of operations is also included.
[0010] As a result, when the detection means detects that speaker recognition has failed consecutively the predetermined number of times, failure information indicating that the speaker recognition has failed is reported to the one user by the reporting means, which comprises, for example, a display. The “report” may be displayed on the display of the terminal on which the speaker recognition system is installed, or may take any of various forms, such as a preset e-mail or telephone call, as long as the one user can recognize that the failure occurred. Reporting to the administrator of the speaker recognition system in addition to the one user allows an even more accurate and prompt response.
[0011] As described above, the first speaker recognition system makes it possible to suitably avoid or prevent impersonation or spoofing in speaker recognition. Even if an impersonator attempts unauthorized recognition, the failure information indicating the consecutive failures is not discarded but is reported to the one user and the administrator, so the one user can take measures such as changing the password, and the administrator can prompt all users, including the one user, to take measures. Because countermeasures against the impersonator are thus taken quickly and accurately, the probability that recognition succeeds when the impersonator attempts impersonation again drops further, which is very advantageous in practice.
[0012] In order to solve the above problems, a second speaker recognition system of the present invention comprises: recognition means for performing speaker recognition; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed; history storage means for storing history information including failure information, which indicates that the speaker recognition has failed, when the detection means detects that the speaker recognition has failed; and reporting means for reporting the history information to the one user.
[0013] According to the second speaker recognition system, recognition is performed as follows at the speaker recognition stage.
[0014] That is, in operation, speaker recognition is first performed by the recognition means, which comprises, for example, a microphone, a camera, a processor, a memory, and the like.
[0015] Then, the detection means, which comprises, for example, a processor, a memory, and the like, detects whether or not speaker recognition for the one user by the recognition means has failed. The number of consecutive failures does not matter for this detection; that is, even a single failure may be detected. Such a failure may in fact result from an operation by the one user himself/herself, but this design favors the practical benefit of detecting even a cautious impersonator without omission.
[0016] When the detection means detects that speaker recognition has failed, the history storage means, which comprises, for example, a processor, a memory, a database, and the like, stores history information including failure information indicating that the speaker recognition has failed. Here, “history information” is information recording the operation history of the one user, including the failure information, and is typically accumulated in time series.
[0017] As a result, the reporting means reports the history information to the one user immediately or without delay, after the fact, or regularly or irregularly.
[0018] As described above, the second speaker recognition system makes it possible to suitably avoid or prevent impersonation or spoofing in speaker recognition. Even if a cautious impersonator attempts unauthorized recognition, the failure information indicating even a single failure is not discarded but is reported to the one user and the administrator, so the one user can take various measures to prevent recurrence, and the administrator can prompt all users, including the one user, to take measures. Because countermeasures against the impersonator are thus taken quickly and accurately, the probability that recognition succeeds when the impersonator attempts impersonation again drops further, which is very advantageous in practice.
[0019] 本発明に係る、第 1又は第 2の話者認識システムの一態様では、前記認識手段は、 音声入力手段を介して前記話者認識を行 ヽ、前記話者認識が行われる際に前記音 声入力手段に入力された音声を記録する音声記録手段を更に備え、前記履歴格納 手段は、前記履歴情報として、前記記録された音声を更に格納する。  In one aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition via a voice input means, and the speaker recognition is performed. And a voice recording means for recording the voice input to the voice input means, and the history storage means further stores the recorded voice as the history information.
[0020] この態様によると、先ず認識手段において、音声入力手段を介した話者認識が行 われる。この際、入力された音声が、例えばプロセッサ、メモリ、データベース等を有 してなる音声記録手段によって記録され、履歴格納手段によって履歴情報として格 納される。従って、このように格納された音声を、詐称者を特定するための有力な情 報として活用することも可能となり、詐称者の音声を学習して当該話者認識システム の認識性能も向上可能となる。 [0020] According to this aspect, first, the recognition unit performs speaker recognition via the voice input unit. At this time, the input voice is recorded by voice recording means having, for example, a processor, a memory, a database, etc., and stored as history information by the history storage means. Paid. Therefore, it is possible to use the voice stored in this way as influential information for identifying the impersonator, and it is possible to improve the recognition performance of the speaker recognition system by learning the impersonator's voice. Become.
[0021] この、記録された音声を更に格納する態様では、前記通報手段は、前記履歴情報 を通報すると共に、前記一のユーザに対して前記記録された音声を再生してもよ 、。  [0021] In the aspect of further storing the recorded voice, the reporting means may report the history information and reproduce the recorded voice for the one user.
[0022] この態様〖こよると、例えば詐称者による不正な話者認識が試みられた際には、通報 手段によって、一のユーザに対して、履歴情報が通報されるのにカ卩えて記録された 音声が再生される。それ故に、この再生される音声に基いて、一のユーザは、確実か つ迅速に不正使用を確認できる。その結果、パスワードの変更等の処理を迅速に行 うことち可會となる。  [0022] According to this aspect, for example, when an unauthorized speaker recognition is attempted by an impersonator, the history information is reported to one user by the reporting means. Played sound is played. Therefore, one user can confirm the unauthorized use reliably and quickly based on the reproduced voice. As a result, it is possible to speed up the process of changing the password.
[0023] In the aspect including the history storage means described above, the reporting means may report the history information to the one user when the number of consecutive failures detected by the detection means exceeds a predetermined number.
[0024] According to this aspect, when the number of consecutive failures detected by the detection means exceeds the predetermined number, the reporting means reports the history information to the one user. The predetermined number can be changed, for example, according to the history information, which allows the speaker recognition system to be controlled more flexibly.
[0025] In this aspect in which it is determined whether the number of consecutive failures exceeds the predetermined number, the recognition means may perform the speaker recognition via access from a terminal connected to the recognition means via communication means, the history storage means may further store, as part of the history information, the terminal name of the terminal on which the speaker recognition is performed, and when the stored terminal name differs from the name of the terminal normally used by the one user, the predetermined number may be made smaller than when the names are the same.
[0026] According to this aspect, when the terminal name differs from that of the terminal normally used, unauthorized use by an impostor is more likely, so the permitted number of consecutive failures is reduced and stricter speaker recognition is performed. Conversely, when the terminal name is the same as that of the terminal normally used, the speaker is more likely to be the one user, so relatively lenient speaker recognition is performed. Suitably changing the predetermined number based on information other than voice in this way complements the performance of the speaker recognition system and is very convenient in practice.
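The terminal-dependent failure limit described above can be sketched as follows. This is a minimal illustration only: the class name `FailureCounter`, the field names, and the limits of 3 and 1 are assumptions chosen for the example and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class FailureCounter:
    """Counts consecutive failures and applies a terminal-dependent limit."""

    usual_terminal: str          # terminal name normally used by the one user
    limit_usual: int = 3         # assumed limit for the usual terminal
    limit_unusual: int = 1       # assumed, stricter limit for any other terminal
    consecutive_failures: int = 0

    def allowed_failures(self, terminal_name: str) -> int:
        # A smaller predetermined number applies when the terminal is unfamiliar.
        if terminal_name == self.usual_terminal:
            return self.limit_usual
        return self.limit_unusual

    def record_attempt(self, terminal_name: str, success: bool) -> bool:
        """Return True when the history information should be reported."""
        if success:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Report once the consecutive failures exceed the predetermined number.
        return self.consecutive_failures > self.allowed_failures(terminal_name)
```

With these assumed limits, a report is triggered on the fourth consecutive failure at the usual terminal but already on the second at an unfamiliar one.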
[0027] Alternatively, in the aspect including the history storage means described above, the recognition means may perform the speaker recognition via access from a terminal connected to the recognition means via communication means, the history storage means may further store, as part of the history information, the date and time and the terminal position of the most recent successful speaker recognition preceding the current speaker recognition, and the detection means may detect that the speaker recognition has failed when the ratio of the distance between the terminal position in the current speaker recognition and the stored terminal position to the time difference between the date and time of the current speaker recognition and the stored date and time exceeds a predetermined speed threshold.
[0028] According to this aspect, the following determination is made based on the history information stored by the history storage means. When the distance between the terminal position in the current speaker recognition and the stored terminal position, divided by the time difference between the date and time of the current speaker recognition and the stored date and time (that is, the travel speed), exceeds the predetermined speed threshold, the detection means detects that the speaker recognition has failed. Here, the "predetermined speed threshold" is a speed at which travel is realistically or physically difficult or impossible; it may be set in advance, for example, as a speed calculated on that basis with a shortest-route search algorithm or the like. Alternatively, it may be a value set by the one user based on his or her own experience. Because such a threshold varies as transportation technology develops, it may be updated as appropriate. In this aspect, unauthorized use is thus suspected on the basis of the travel speed between terminals. In other words, speaker recognition can be complemented from the standpoint of whether the travel was possible.
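The travel-speed check above can be sketched as follows. This is an illustration under stated assumptions: the specification mentions a shortest-route search, whereas this sketch substitutes the great-circle (haversine) distance between two latitude/longitude positions, and the function names and the 900 km/h default threshold are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_implausible_travel(prev_time: datetime, prev_pos: tuple,
                          now_time: datetime, now_pos: tuple,
                          speed_threshold_kmh: float = 900.0) -> bool:
    """True when the implied speed between two recognitions exceeds the threshold.

    prev_* describe the most recent successful recognition in the history;
    now_* describe the current attempt. Positions are (latitude, longitude).
    """
    hours = (now_time - prev_time).total_seconds() / 3600.0
    distance = haversine_km(prev_pos[0], prev_pos[1], now_pos[0], now_pos[1])
    if hours <= 0:
        # No elapsed time: any change of position is treated as implausible.
        return distance > 0
    return distance / hours > speed_threshold_kmh
```

An attempt in Osaka ten minutes after a successful recognition in Tokyo would be treated as a failure under this threshold, while the same attempt five hours later would not.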
[0029] In another aspect of the first or second speaker recognition system according to the present invention, the reporting means reports to the one user without delay via communication means.
[0030] According to this aspect, when the detection means detects, as described above, that speaker recognition has failed consecutively a predetermined number of times, or that speaker recognition has failed, the reporting means reports the failure information or the history information to the one user without delay via the communication means. The "communication means" here includes means capable of reaching the one user relatively quickly, such as e-mail, a landline telephone, or a mobile telephone. The possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can therefore be reduced promptly.
[0031] In another aspect of the first or second speaker recognition system according to the present invention, the reporting means reports to the one user when the one user performs the speaker recognition at the recognition means on the next occasion following the failed speaker recognition.
[0032] According to this aspect, when the detection means detects, as described above, that speaker recognition has failed consecutively a predetermined number of times, or that speaker recognition has failed, the reporting means reports the failure information or the history information to the one user when the one user performs speaker recognition on the next occasion after the failure. Even without any special communication means, the possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can therefore be reduced. In addition, because the report is made to the one user who is actually using the speaker recognition system at that moment, other countermeasures can be prompted together with the report, and the one user can take them on the spot.
[0033] In another aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition based on a voice corresponding to a pre-registered password, and the reporting means, when reporting, also reports that the password should be changed.
[0034] According to this aspect, speaker recognition is performed by the recognition means, which comprises, for example, a microphone, a camera, a processor, and a memory, based on the voice corresponding to the pre-registered password. When the failure information or the history information of the speaker recognition is reported to the one user by the reporting means, a notice that the password should be changed is also given. This is because, when an impostor persistently attempts impersonation, the password has most likely been leaked. The possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can therefore be reduced appropriately.
[0035] In another aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition based on a voice corresponding to a pre-registered password, and the system further comprises change processing means for changing the password when the detection means detects a failure.
[0036] According to this aspect, speaker recognition is performed by the recognition means based on the voice corresponding to the pre-registered password. When the detection means detects a failure in this speaker recognition, the change processing means, which comprises, for example, a processor and a memory, changes the password. For example, when the detection means detects that speaker recognition has failed consecutively a predetermined number of times, fraudulent speaker recognition by impersonation is assumed to be in progress, and the password is automatically changed to a temporary password. As a result, further impersonation attempts using the same password become difficult. Provided that the changed password is communicated to the one user by notification means with due regard for security, the one user will have no trouble the next time he or she uses the speaker recognition system. The possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can therefore be reduced very quickly.
[0037] In another aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition based on a voice corresponding to a pre-registered password, and the system further comprises invalidation processing means for invalidating the password for a predetermined period when the detection means detects a failure.
[0038] According to this aspect, speaker recognition is performed by the recognition means based on the voice corresponding to the pre-registered password. When the detection means detects a failure in this speaker recognition, the invalidation processing means, which comprises, for example, a processor and a memory, invalidates the password for a predetermined period. For example, when recognition fails not just once but twice in succession, the password is invalidated for one hour. The predetermined period is preferably set in advance either as a period long enough for an impostor to abandon repeated attempts or as a period sufficient for the legitimate user to take countermeasures, and it may be made changeable by the one user. The possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can therefore be reduced promptly.
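The temporary invalidation described above can be sketched as follows, using the example values given in the text (two consecutive failures, a one-hour period). The class name `PasswordLockout` and its method names are hypothetical, and the decision to reject every attempt during the invalidation period, including correct ones, is an assumption about how "invalidated" is applied.

```python
from datetime import datetime, timedelta


class PasswordLockout:
    """Invalidates a password for a fixed period after consecutive failures."""

    def __init__(self, max_consecutive_failures: int = 2,
                 lock_period: timedelta = timedelta(hours=1)):
        self.max_consecutive_failures = max_consecutive_failures
        self.lock_period = lock_period      # may be changed by the legitimate user
        self.failures = 0
        self.locked_until = None

    def is_locked(self, now: datetime) -> bool:
        return self.locked_until is not None and now < self.locked_until

    def record_attempt(self, success: bool, now: datetime) -> bool:
        """Return True only when the attempt is accepted."""
        if self.is_locked(now):
            # While the password is invalid, even a matching voice is rejected.
            return False
        if success:
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= self.max_consecutive_failures:
            self.locked_until = now + self.lock_period
            self.failures = 0
        return False
```

Passing the current time explicitly keeps the sketch deterministic; a deployed system would more likely read a trusted clock.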
[0039] To solve the above problem, a third speaker recognition system of the present invention comprises: recognition means for performing speaker recognition via voice input means; voice recording means for recording the voice input to the voice input means when the speaker recognition is performed; detection means for detecting whether the speaker recognition for one user has failed in the recognition means; and history storage means for storing, when the detection means detects that the speaker recognition has failed, history information including the voice recorded in correspondence with the failed speaker recognition.
[0040] According to the third speaker recognition system, the recognition means performs speaker recognition via the voice input means. The voice input to the voice input means when the speaker recognition is performed is recorded by the voice recording means, which comprises, for example, a processor, a memory, and a database. At the same time, or shortly before or after, the detection means detects whether the speaker recognition for the one user has failed in the recognition means. When the detection means detects that the speaker recognition has failed, the history storage means stores history information including the voice recorded in correspondence with the failed speaker recognition.
[0041] As described above, according to the third speaker recognition system, the history information also includes voice. By analyzing this voice and comparing it with, for example, the voices of other impostors who attempt to impersonate other users, or by having a legitimate user or the like check whether the voice sounds familiar, a profile of the impostor can be built and used to prevent impersonation. In this way, the possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can be reduced still more efficiently.
[0042] In another aspect of the second or third speaker recognition system, the history storage means further stores, as part of the history information, at least one of the date and time at which the speaker recognition was performed, position information, and a terminal name.
[0043] According to this aspect, in addition to the failure information and the voice data, at least one of the date and time at which the speaker recognition was performed, the position information, and the terminal name is stored by the history storage means as part of the history information. Because the date and time, position information, and the like are recorded in addition to the failure information and voice data, the impostor can be identified faster and more accurately and the impostor's behavior pattern can be grasped, so that recurrence of impersonation or spoofing in speaker recognition can be suppressed still more appropriately. If, taking the date and time, position information, and the like into account, an attempt is presumed to be an obvious impersonation, the speaker recognition may be interrupted and that fact reported to the one user even if the number of consecutive failures has not reached the predetermined number. As a result, impersonation and spoofing in speaker recognition can be prevented.
[0044] In another aspect of the first, second, or third speaker recognition system according to the present invention, when the detection means detects a failure, a parameter is changed in the recognition means so that the speaker recognition becomes more likely to fail.
[0045] According to this aspect, when the detection means detects a failure, a parameter is changed in the recognition means so that the speaker recognition becomes more likely to fail. Because the impostor becomes progressively harder to accept as failures accumulate, the possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can be reduced still more appropriately.
[0046] In this aspect in which the parameter is changed, the parameter that is changed may be a threshold on the similarity between the pre-registered voice of the one user and the voice input at the time of the speaker recognition, the threshold serving as the criterion for determining whether the speaker recognition has failed.
[0047] According to this aspect, each time a failure in speaker recognition is detected, the similarity threshold is raised in the recognition means. Speaker recognition therefore becomes more likely to fail when the impostor makes the next attempt. The amount by which the threshold is raised is preferably given a lower limit in view of, for example, the impostor's ability to learn, and an upper limit in view of variation in the one user's own voice due to physical condition and the like.
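The bounded threshold increase described above can be sketched as follows. This is one possible reading, not the specification's method: the increment applied after each detected failure is clamped between a lower and an upper bound, and the resulting threshold is additionally capped by a ceiling. The function names, the 0-to-1 similarity scale, and all numeric values are assumptions.

```python
def next_threshold(current: float, requested_step: float,
                   min_step: float = 0.01, max_step: float = 0.05,
                   ceiling: float = 0.95) -> float:
    """Return the similarity threshold to use after one detected failure.

    min_step: assumed lower bound, so an impostor who learns from each
              attempt still faces a noticeably stricter check.
    max_step: assumed upper bound, so ordinary variation in the legitimate
              user's voice does not lock the user out.
    ceiling:  assumed cap that keeps the threshold attainable.
    """
    step = min(max(requested_step, min_step), max_step)
    return min(current + step, ceiling)


def is_accepted(similarity: float, threshold: float) -> bool:
    """Recognition succeeds when the similarity reaches the threshold."""
    return similarity >= threshold
```

Under these assumed values, a voice scoring 0.72 against a threshold of 0.70 would be accepted on the first attempt but rejected after a single failure has raised the threshold by the maximum step to 0.75.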
[0048] In another aspect of the first speaker recognition system according to the present invention, the recognition means performs the speaker recognition via access from a terminal connected to the recognition means via communication means, and the reporting means reports to the one user, in addition to or instead of the case where the detection means detects that the speaker recognition has failed consecutively the predetermined number of times, when a predetermined condition on at least one of the temporal position and the spatial position of the terminal involved in the failed speaker recognition is not satisfied.
[0049] According to this aspect, in operation, the recognition means performs speaker recognition via access from a terminal connected to the recognition means via the communication means. Examples of such a terminal include an automated teller machine (ATM) installed at a bank branch or a convenience store and connected to a dedicated line, and a mobile telephone equipped with a Global Positioning System (GPS) function and capable of mobile banking. During speaker recognition by the recognition means, the reporting means reports to the one user in addition to or instead of the case where the detection means detects that speaker recognition has failed consecutively the predetermined number of times; that is, even when only a single failure has been detected, the reporting means reports if the predetermined condition on at least one of the temporal position and the spatial position of the terminal involved in the failed speaker recognition is not satisfied. For example, when, considering the time difference between the previous use of a terminal and the current use and the distance between the two terminals, it is judged physically impossible to travel that distance within that time difference, a report is made on the ground that the speaker is relatively likely to be an impostor. By accurately capturing, in addition to or instead of the utterance, situations that could not ordinarily arise when the one user is the person using the system, the possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can be reduced still more appropriately.
[0050] (Computer program)
To solve the above problem, a computer program of the present invention causes a computer provided in a speaker recognition system to function as the first, second, or third speaker recognition system according to the present invention described above (including its various aspects).
[0051] According to the computer program of the present invention, the speaker recognition system of the present invention described above can be constructed relatively easily by reading the computer program from a recording medium that stores it, such as a CD-ROM or DVD-ROM, into a computer provided in the speaker recognition system and executing it, or by downloading the computer program via communication means and then executing it. As with the speaker recognition system of the present invention described above, the possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can thereby be reduced.
[0052] To solve the above problem, a computer program product in a computer-readable medium tangibly embodies program instructions executable by a computer and causes the computer to function as the first, second, or third speaker recognition system of the present invention.
[0053] According to the computer program product of the present invention, the first, second, or third speaker recognition system of the present invention described above can be implemented relatively easily by reading the computer program product into a computer from a recording medium that stores it, such as a ROM, CD-ROM, DVD-ROM, or hard disk, or by downloading the computer program product, which is, for example, a transmission wave, to a computer via communication means. More specifically, the computer program product may consist of computer-readable code (or computer-readable instructions) that causes a computer to function as the first, second, or third speaker recognition system of the present invention described above.
As described in detail above, the speaker recognition system of the present invention comprises the recognition means, the detection means, and the reporting means, so the possibility that impersonation or spoofing in speaker recognition recurs, or the success rate when it does recur, can be reduced. Furthermore, the computer program of the present invention causes a computer to function as the recognition means, the detection means, and the reporting means, so the speaker recognition system of the present invention described above can be constructed relatively easily.
[0054] The operation and other advantages of the present invention will become apparent from the embodiments described below.
Brief Description of Drawings
[0055] FIG. 1 is a block diagram conceptually showing the basic configuration and basic operation of a speaker recognition system according to a first embodiment of the present invention.
FIG. 2 is a block diagram conceptually showing the basic configuration and basic operation of a recognition unit provided in the speaker recognition system according to the first embodiment.
FIG. 3 is a block diagram conceptually showing the basic configuration and basic operation of a speaker recognition system according to a second embodiment.
FIG. 4 is a block diagram conceptually showing the basic configuration and basic operation of a speaker recognition system according to a third embodiment.
FIG. 5 is a block diagram conceptually showing the basic configuration and basic operation of a speaker recognition system according to a fourth embodiment.
FIG. 6 is a flowchart showing the operation processing of a speaker recognition system according to a fifth embodiment.
FIG. 7 is a flowchart showing the operation processing of a speaker recognition system according to a sixth embodiment.
FIG. 8 is a flowchart showing the operation processing of a speaker recognition system according to a seventh embodiment.
FIG. 9 is a flowchart showing the operation processing of a speaker recognition system according to an eighth embodiment.
FIG. 10 is a flowchart showing the operation processing of a speaker recognition system according to a ninth embodiment.
Explanation of Reference Numerals
[0056] 1 Speaker recognition system
132 Microphone
14 Recognition unit
52 Display screen
60 Detection unit
70 Reporting unit
65 Change processing unit
66 Invalidation processing unit
80 History storage unit
85 History database
145 Voice recording unit
発明を実施するための最良の形態  BEST MODE FOR CARRYING OUT THE INVENTION
[0057] 以下、本発明を実施するための最良の形態について実施例毎に順に図面に基づ いて説明する。  Hereinafter, the best mode for carrying out the present invention will be described in each embodiment in order with reference to the drawings.
[0058] (1)第 1実施例  [0058] (1) First Example
第 1実施例に係る話者認識システムの構成及び動作処理を、図 1及び図 2を参照し て説明する。ここに、図 1は、本発明の第 1実施例に係る、話者認識システムの基本 構成及び基本動作を概念的に示すブロック図であり、図 2は、第 1実施例に係る、話 者認識システムに備わる認識部の基本構成及び基本動作を概念的に示すブロック 図である。  The configuration and operation process of the speaker recognition system according to the first embodiment will be described with reference to FIG. 1 and FIG. FIG. 1 is a block diagram conceptually showing the basic configuration and basic operation of the speaker recognition system according to the first embodiment of the present invention, and FIG. 2 is a speaker according to the first embodiment. It is a block diagram which shows notionally the basic composition and basic operation | movement of the recognition part with which a recognition system is equipped.
[0059] 図 1にお ヽて、本実施例に係る話者認識システム 1は、本発明に係る「認識手段」の 一例としてのマイクロホン 132及び認識部 14と、表示画面 52と、本発明に係る「検知 手段」の一例としての検知部 60と、本発明に係る「通報手段」の一例としての通報部 70とを備え、以下の構成下で話者 A121或いは詐称者 122の話者認識を行う。  Referring to FIG. 1, a speaker recognition system 1 according to the present embodiment includes a microphone 132 and a recognition unit 14 as examples of “recognition means” according to the present invention, a display screen 52, and the present invention. The detecting unit 60 as an example of the “detecting means” and the reporting unit 70 as an example of the “notifying means” according to the present invention are provided, and speaker recognition of the speaker A121 or the spoofer 122 is performed under the following configuration. Do.
[0060] マイクロホン 132は、話者 A121或いは詐称者 122がキーワードの発話を行う際、 該発話を電気信号に変換して話者認識システム 1に入力する機器である。  The microphone 132 is a device that converts the utterance into an electrical signal and inputs it to the speaker recognition system 1 when the speaker A 121 or the spoofer 122 utters a keyword.
[0061] 認識部 14は、例えばプロセッサ、メモリ等を備えたコンピュータ内にプログラムに従 つて論理的に構築されるものであり、話者認識時には、認識を求める任意の話者 (話 者 A121或いは詐称者 122)の発話と、登録された話者モデルとを照合することで、 力かる話者が、登録された話者モデルの話者 A121本人であるか否かを認識する。  [0061] The recognition unit 14 is logically constructed according to a program in a computer including a processor, a memory, and the like, for example, and at the time of speaker recognition, any speaker (speaker A121 or By comparing the utterances of the spoofer 122) and the registered speaker model, it is recognized whether or not the speaker who works is the speaker A121 of the registered speaker model.
[0062] The recognition unit 14 is now described in more detail with reference to FIG. 2.
[0063] In FIG. 2, the recognition unit 14 according to this embodiment includes a speech portion extraction unit 142, a feature calculation unit 201, a similarity calculation unit 15, a speaker model database 45, and a matching unit 30.
[0064] The speech portion extraction unit 142 is logically constructed according to a program in a computer equipped with, for example, a processor and memory. It is an arithmetic unit that cuts out the speech portion in which the keyword is uttered from the electrical signal of the input utterance, that is, from the utterance data, using a general voice activity detection method such as one that exploits the power difference between background noise and speech segments.
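The power-difference idea behind the speech portion extraction unit 142 can be illustrated with a minimal sketch. The function name `extract_speech`, the frame length, and the power ratio are hypothetical choices made for this illustration; the specification only says that a general method based on the power difference between background noise and speech is used.

```python
def extract_speech(samples, frame_len=160, ratio=4.0):
    """Return the span of samples whose frame power stands out from the background.

    Assumption: the quietest frame in the input represents background noise.
    """
    # Split the signal into non-overlapping frames of frame_len samples.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return []
    # Mean power of each frame.
    power = [sum(x * x for x in f) / frame_len for f in frames]
    noise = max(min(power), 1e-12)
    # A frame counts as speech when its power exceeds the noise floor by `ratio`.
    active = [i for i, p in enumerate(power) if p > ratio * noise]
    if not active:
        return []
    # Cut from the first active frame to the last active frame.
    return samples[active[0] * frame_len:(active[-1] + 1) * frame_len]
```

A production system would typically add smoothing and hangover frames; this sketch only shows the basic cut-out step.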
[0065] The feature calculation unit 201 is logically constructed according to a program in a computer equipped with, for example, a processor and memory. It is an arithmetic unit that converts the input speech portion into feature values, for example MFCCs (Mel Frequency Cepstrum Coefficients) or LPC (Linear Predictive Coding) cepstrum coefficients.
[0066] The similarity calculation unit 15 is logically constructed according to a program in a computer equipped with, for example, a processor and memory. It calculates the similarity between the feature values of the speech portion in which the keyword is uttered and the feature values of the speech corresponding to the password registered in advance in the speaker model database 45.
[0067] The matching unit 30 is logically constructed according to a program in a computer equipped with, for example, a processor and memory. It checks whether the calculated similarity reaches a predetermined criterion indicating the similarity expected of the genuine user, thereby verifying whether speaker A 121 or the impostor 122 is the registered speaker A 121, and outputs the verification result (for example, whether speaker recognition succeeded or failed). The predetermined criterion may be a value that is changed as appropriate. Specifically, if the detection unit 60 detects the failures as the impostor 122 fails repeatedly, and the predetermined criterion is changed so that recognition fails more readily, impersonation becomes even more difficult.
[0068] Returning to FIG. 1, the display screen 52 is a display device, for example a liquid crystal display, that shows the recognition result. As a result of recognition by the recognition unit 14, it displays a recognition-success message if the speaker is recognized as the genuine user and a recognition-failure message otherwise.
[0069] The detection unit 60 detects whether speaker recognition in the recognition unit 14 has failed consecutively a predetermined number of times. This is because if speaker recognition fails, for example, five times in a row, whether on a single occasion or across different occasions, it is relatively likely that the speaker is not the genuine user but an impostor. Failure information indicating that speaker recognition has failed is then sent to the notification unit 70.
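The consecutive-failure detection described above can be sketched as a small counter. The class name `ConsecutiveFailureDetector` and the choice to reset the count on a successful recognition are assumptions made for this illustration; the limit of five follows the example in the text.

```python
class ConsecutiveFailureDetector:
    """Counts failures in a row, in the manner of the detection unit 60 (hypothetical sketch)."""

    def __init__(self, limit=5):
        self.limit = limit      # predetermined number of consecutive failures
        self.failures = 0       # current run of consecutive failures

    def record(self, success):
        """Record one attempt; return True once `limit` consecutive failures are reached."""
        if success:
            # Assumption: a successful recognition ends the run of failures.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.limit
```

When `record` returns True, the caller would pass the failure information on to whatever plays the role of the notification unit 70.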
[0070] The notification unit 70 reports the failure information indicating that speaker recognition has failed to the one user himself or herself (here, speaker A 121), for example via a display. If a preset communication means such as e-mail or telephone is used, the failure information is reported to speaker A 121 without delay. Alternatively, if the failure information is reported when speaker A 121 next performs speaker recognition after the failure, the system can also handle the case where speaker A 121 has no communication means at all. In addition, if a notice that the password should be changed accompanies the failure information, the system can also deal with leakage of the password.
[0071] As described above, according to FIGS. 1 and 2, speaker recognition succeeds when, for example, speaker A 121 seeks recognition, whereas when the impostor 122 seeks recognition, the failure information is suitably reported to speaker A 121. Impersonation or spoofing in speaker recognition can therefore be suitably avoided or prevented.
[0072] (2) Second embodiment
Next, the configuration and basic operation of the speaker recognition system according to the second embodiment will be described with reference to FIG. 3 in addition to FIG. 1. FIG. 3 is a block diagram conceptually showing the basic configuration and basic operation of the speaker recognition system according to the second embodiment. In FIG. 3, components identical to those in the preceding drawings are given the same reference numerals, and their description is omitted as appropriate.
[0073] The speaker recognition system 1 of FIG. 3 further includes, in addition to the components of the speaker recognition system 1 of FIG. 1, a change processing unit 65 as an example of the "change processing means" according to the present invention and an invalidation processing unit 66 as an example of the "invalidation processing means" according to the present invention.
[0074] For example, when the detection unit 60 detects that speaker recognition has failed consecutively a predetermined number of times, the change processing unit 65 changes the password used for speaker recognition. Alternatively, the invalidation processing unit 66 invalidates the password for a predetermined period.
[0075] As described above, according to FIG. 3, the password is changed or invalidated before the one user is notified and takes countermeasures. In this embodiment too, the possibility that impersonation or spoofing in speaker recognition recurs, or its success rate if it does recur, can therefore be quickly reduced.
[0076] (3) Third embodiment
Next, the configuration and basic operation of the speaker recognition system according to the third embodiment will be described with reference to FIG. 4 in addition to FIG. 3. FIG. 4 is a block diagram conceptually showing the basic configuration and basic operation of the speaker recognition system according to the third embodiment. In FIG. 4, components identical to those in the preceding drawings are given the same reference numerals, and their description is omitted as appropriate.
[0077] The speaker recognition system 1 of FIG. 4 further includes, in addition to the components of the speaker recognition system 1 of FIG. 3, a history storage unit 80 and a history database 85 as an example of the "history storage means" according to the present invention.
[0078] The history storage unit 80 stores history information, including failure information indicating that speaker recognition has failed, in the history database 85. The table stored in the history database 85 has a structure like that of table 86. Table 86 stores, for example, the date and time at which speaker recognition was performed, the name of the terminal used, the number of consecutive failures at that time, and the recognition result. Based on this history information, if the recognition unit 14 recognizes the speaker as the one user while the number of consecutive failures is still below a predetermined number (for example, five), the speaker recognition is deemed successful. If, on the other hand, the number of consecutive failures reaches the predetermined number without the speaker ever being recognized as the one user, the speaker recognition is deemed not only a failure but also an attempt by the impostor 122, and becomes subject to notification. To keep failures by the genuine user from accumulating inadvertently, the counter for consecutive failures is preferably reset to its initial value of 0 after a predetermined time or period has elapsed or after a successful recognition. Alternatively, only failures that occur consecutively within a predetermined time or period may be made subject to notification.
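A record shaped like table 86, together with the counter reset described above, might be sketched as follows. The names `HistoryRecord` and `HistoryStore`, the 24-hour window, and the in-memory list standing in for the history database 85 are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class HistoryRecord:
    timestamp: datetime          # date and time of the recognition attempt
    terminal: str                # name of the terminal used
    consecutive_failures: int    # failures in a row at this point
    success: bool                # recognition result


class HistoryStore:
    def __init__(self, limit=5, window=timedelta(hours=24)):
        self.limit = limit       # predetermined number of failures
        self.window = window     # assumed expiry period for the counter
        self.records = []        # stand-in for the history database 85
        self._count = 0
        self._last_failure = None

    def add(self, timestamp, terminal, success):
        """Store one attempt; return True if it should be reported."""
        # Reset the counter once the predetermined period has elapsed.
        expired = (self._last_failure is not None
                   and timestamp - self._last_failure > self.window)
        if expired or success:
            self._count = 0
        if not success:
            self._count += 1
            self._last_failure = timestamp
        self.records.append(
            HistoryRecord(timestamp, terminal, self._count, success))
        return self._count >= self.limit
```

Keeping the running count inside each record mirrors the "number of consecutive failures" column that the text lists for table 86.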
[0079] As described above, according to FIG. 4, based on the history information, the one user can take various measures to prevent recurrence, and an administrator can urge all users, including the one user, to take measures. The possibility that impersonation or spoofing in speaker recognition recurs, or its success rate upon recurrence, can therefore be reduced even more efficiently.
[0080] (4) Fourth embodiment
Next, the configuration and basic operation of the speaker recognition system according to the fourth embodiment will be described with reference to FIG. 5 in addition to FIGS. 1 and 4. FIG. 5 is a block diagram conceptually showing the basic configuration and basic operation of the speaker recognition system according to the fourth embodiment. In FIG. 5, components identical to those in the preceding drawings are given the same reference numerals, and their description is omitted as appropriate.
[0081] The speaker recognition system 1 of FIG. 5 further includes, in addition to the components of the speaker recognition system 1 of FIG. 4, a plurality of terminals (for example, terminal A 91 and terminal B 92) and a voice recording unit 145 as an example of the "voice recording means" according to the present invention.
[0082] Each of the terminals includes a microphone 132 and a display screen 52. For example, terminal A 91 is installed at a branch in Hokkaido and terminal B 92 at a branch in Fukuoka Prefecture. The network may be of the so-called client-server type, in which, for example, each terminal is placed on the client side and the remaining components on the server side. The network configuration is not limited to this, however; for example, only the history database 85 may be placed on the server.
[0083] The voice recording unit 145 records the voice input to the microphone 132 during speaker recognition in the history database 85, for example attached to the history information. Then, for example, when the impostor 122 attempts unauthorized speaker recognition, the recorded voice is played back in addition to the notification by the notification unit 70. Based on the played-back voice, the one user can confirm the unauthorized use reliably and quickly, and can therefore promptly take action such as changing the password.
[0084] The table stored in the history database 85 has a structure like that of table 87, for example. Table 87 stores, for example, the date and time at which speaker recognition was performed, the name of the terminal used, position information indicating where the terminal is geographically installed, voice data, and the recognition result. The number of consecutive failures at that time may of course be stored as well. Based on this history information, the impostor 122 may be identified, or defensive measures may be taken to prevent spoofing before it happens. For example, suppose that speaker recognition succeeded at terminal A in Hokkaido on February 1, 2006, and that speaker recognition is attempted at terminal B in Fukuoka Prefecture one minute later. Since travelling from Hokkaido to Fukuoka Prefecture in one minute is physically or technically impossible, the speaker recognition at terminal B is interrupted, even if the number of consecutive failures has not reached the predetermined number, and the one user is notified of this fact. Because the voice data is also recorded, the chance of identifying the impostor increases dramatically.
[0085] As described above, according to FIG. 5, voice data, date and time, position information, and the like are recorded in addition to the failure information. This increases the speed and accuracy with which an impostor is identified and reveals the impostor's behavior patterns, so that recurrence of impersonation or spoofing in speaker recognition can be suppressed more precisely. Analyzing this information from viewpoints such as physical feasibility can also help prevent impersonation and spoofing in speaker recognition.
[0086] (5) Fifth embodiment
Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the fifth embodiment will be described with reference to FIG. 6 in addition to FIG. 5. FIG. 6 is a flowchart showing the operation of the speaker recognition system according to the fifth embodiment. The configuration of this embodiment may be the same as that of the fourth embodiment; identical components are given the same reference numerals, and their description is omitted as appropriate.
[0087] In FIG. 6, when the user inputs speech for speaker recognition, the recognition unit 14 calculates the similarity between the input speech and the speech registered in advance (step S1), and whether the user is the one user is determined by whether this similarity exceeds a predetermined threshold (step S2). For example, with the predetermined threshold set to 0.8, the user is judged to be the one user or not according to whether the similarity, computed in the range 0 to 1, is 0.8 or more.
[0088] If the similarity does not exceed the predetermined threshold (step S2: No), the detection unit 60 then determines whether the number of consecutive recognition failures exceeds a preset value, and a recognition failure flag is set (step S32). For example, with the preset value set to five, it is determined whether the number of consecutive recognition failures exceeds five.
[0089] If the number of consecutive recognition failures does not yet exceed the preset value of five (step S32: No), the user inputs speech again and retries speaker recognition. This is because even the genuine user may fail several times owing to noise or physical condition.
[0090] If, on the other hand, the number of consecutive recognition failures exceeds the preset value of five (step S32: Yes), the number of failures can no longer be excused by noise or the like, and the speaker is relatively likely to be an impostor, so the display screen 52 shows that the speaker recognition process as a whole has failed. In addition, the invalidation processing unit 66 temporarily invalidates the password, the history storage unit 80 stores history information including the failure information for this speaker recognition in the history database 85, and the voice recording unit 145 stores the voice data input during this speaker recognition in the history database 85 (step S42).
[0091] If, on the other hand, the similarity exceeds the predetermined threshold (step S2: Yes), that is, if the speaker is recognized as the one user, recognition has basically succeeded at this point. Then, in order to prompt countermeasures in case an impostor has appeared, whether the previous speaker recognition failed is checked based on whether the recognition failure flag is set (step S31).
[0092] If the previous speaker recognition failed (step S31: Yes), the notification unit 70 reports the recognition failure history (that is, the fact that recognition was attempted by an impostor) to the user who has just been recognized. If the user has no recollection of those failures, he or she can secure the system by taking measures such as changing the password (step S41). Conversely, if the user does remember them, no change is necessary.
[0093] If, on the other hand, the previous speaker recognition did not fail (step S31: No), there is no sign that an impostor attempted recognition, so recognition-success processing is performed directly: the user is authorized, and the display screen 52 shows a message to that effect (step S43).
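The branches of FIG. 6 described in paragraphs [0087] to [0093] can be summarized in a short sketch. The names `SessionState` and `handle_attempt` and the returned strings are hypothetical; clearing the failure flag and the counter after a successful recognition is an assumption, since the text does not say when they are reset in this flow. The sketch follows the "0.8 or more" example for step S2 and the "exceeds five" wording for step S32.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    consecutive_failures: int = 0
    failure_flag: bool = False      # recognition failure flag set in step S32
    password_valid: bool = True


def handle_attempt(similarity, state, threshold=0.8, max_failures=5):
    """Process one recognition attempt given its similarity (step S1 result)."""
    if similarity >= threshold:                  # step S2: Yes
        had_failure = state.failure_flag         # step S31
        # Assumption: success clears the flag and the counter.
        state.failure_flag = False
        state.consecutive_failures = 0
        # S41: report earlier failures to the user; S43: plain success.
        return "success_with_notice" if had_failure else "success"
    state.consecutive_failures += 1              # step S2: No
    state.failure_flag = True                    # step S32
    if state.consecutive_failures > max_failures:
        state.password_valid = False             # step S42: invalidate password
        return "locked"
    return "retry"                               # user speaks again
```

With the default limit, five failures in a row each return "retry" and the sixth returns "locked", which matches the "exceeds five" reading of step S32.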
[0094] As described above, in this embodiment the processing shown in FIG. 6 is performed, so speaker recognition is carried out suitably. In particular, countermeasures are taken when recognition fails consecutively, so impersonation or spoofing in speaker recognition can be suitably avoided or prevented.
[0095] (6) Sixth embodiment
Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the sixth embodiment will be described with reference to FIG. 7 in addition to FIGS. 5 and 6. FIG. 7 is a flowchart showing the operation of the speaker recognition system according to the sixth embodiment. The configuration of this embodiment may be the same as that of the fourth embodiment; identical components are given the same reference numerals, and their description is omitted as appropriate. Steps identical to those of the fifth embodiment are likewise given the same reference numerals, and their description is omitted as appropriate.
[0096] Compared with FIG. 6, FIG. 7 adds processing that raises the similarity threshold used by the recognition unit 14 each time recognition fails, so that recognition of anyone other than the one user becomes still more difficult (step S52).
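One way to picture step S52 is a threshold that grows with the number of failures. The specification does not state by how much the threshold is raised, so the function name `raised_threshold`, the linear step of 0.02, and the cap of 0.95 are purely illustrative assumptions.

```python
def raised_threshold(base, failures, step=0.02, cap=0.95):
    """Similarity threshold after `failures` consecutive failed attempts.

    Assumptions: the threshold rises linearly by `step` per failure and
    never exceeds `cap`, so the genuine user is not locked out entirely.
    """
    return min(cap, base + step * failures)
```

With a base of 0.8, three failures would raise the threshold to about 0.86 under these assumed values.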
[0097] This embodiment can therefore avoid the situation in which an impostor, with each consecutive failure, learns to sound more like the one user and eventually succeeds in being recognized, which is highly advantageous in practice.
[0098] (7) Seventh embodiment
Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the seventh embodiment will be described with reference to FIG. 8 in addition to FIGS. 5 and 6. FIG. 8 is a flowchart showing the operation of the speaker recognition system according to the seventh embodiment. The configuration of this embodiment may be the same as that of the fourth embodiment; identical components are given the same reference numerals, and their description is omitted as appropriate. Steps identical to those of the fifth embodiment are likewise given the same reference numerals, and their description is omitted as appropriate.
[0099] Compared with FIG. 6, FIG. 8 adds processing that checks whether the terminal in use this time differs from the terminal the one user normally uses. Specifically, it is determined whether the terminal in use this time is the normally used terminal (step S220). The normally used terminal may, for example, be set in advance by the one user. If the terminal is determined to be the normally used terminal (step S220: Yes), α is assigned to the preset value for the number of consecutive failures (step S221). If, on the other hand, it is determined not to be the normally used terminal (step S220: No), β is assigned to the preset value for the number of consecutive failures (step S222). Here, α > β. This is because, when the terminal in use is not the normally used one, it is presumed likely that someone other than the one user is attempting recognition, so a smaller preset value for the number of consecutive failures is preferable.
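Steps S220 to S222 amount to choosing the allowed number of consecutive failures from the terminal in use. In the sketch below, the function name `failure_limit` and the concrete values α = 5 and β = 2 are assumptions for illustration; the text requires only that α > β.

```python
def failure_limit(terminal, usual_terminals, alpha=5, beta=2):
    """Return the allowed number of consecutive failures for this terminal."""
    if alpha <= beta:
        raise ValueError("alpha must be greater than beta")
    # Step S220: is this a terminal the user normally uses?
    if terminal in usual_terminals:
        return alpha    # step S221: more lenient limit
    return beta         # step S222: stricter limit on an unfamiliar terminal
```

The set `usual_terminals` stands for the terminals registered in advance by the one user.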
[0100] As described above, in this embodiment the terminal in use narrows down the likelihood that the speaker is the one user, so speaker recognition is performed more appropriately.
[0101] (8) Eighth embodiment
Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the eighth embodiment will be described with reference to FIG. 9 in addition to FIGS. 5 and 6. FIG. 9 is a flowchart showing the operation of the speaker recognition system according to the eighth embodiment. The configuration of this embodiment may be the same as that of the fourth embodiment; identical components are given the same reference numerals, and their description is omitted as appropriate. Steps identical to those of the fifth embodiment are likewise given the same reference numerals, and their description is omitted as appropriate.
[0102] Compared with FIG. 6, FIG. 9 adds processing that assists speaker recognition based on the feasibility of travelling between the terminal used previously and the terminal in use this time. Specifically, the inter-terminal distance D, which is the distance between the terminals used in the previous recognition and in the current recognition, is first obtained from the preset geographical information of each terminal (step S225). In addition, the usage time difference T, which is the time difference between the previous recognition and the current recognition, is obtained (step S226). From the inter-terminal distance D and the usage time difference T, the travel speed V between the terminal used in the previous recognition and the terminal used in the current recognition is then calculated as V = D/T (step S227). It is then determined whether the travel speed V exceeds a predetermined speed threshold (step S321). The "predetermined speed threshold" is a value set in advance as a speed at which travel is difficult or impossible, for example 1000 km/h. If the travel speed V exceeds the predetermined speed threshold (step S321: Yes), the one user cannot plausibly have travelled at such a speed; that is, unauthorized use by an impostor is strongly suspected, so recognition-failure processing and the like are performed (step S42). If, on the other hand, the travel speed V does not exceed the predetermined speed threshold (step S321: No), no suspicion of unauthorized use can be inferred from the travel speed V, so speaker recognition proceeds (step S1).
[0103] As described above, in this embodiment suspected unauthorized use is inferred from the feasibility of travelling between the terminal used previously and the terminal in use this time, so speaker recognition is performed more appropriately.
[0104] (9) Ninth embodiment
Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the ninth embodiment will be described with reference to FIG. 10 in addition to FIGS. 5 and 6. FIG. 10 is a flowchart showing the operation of the speaker recognition system according to the ninth embodiment. The configuration of this embodiment may be the same as that of the fourth embodiment; identical components are given the same reference numerals, and their description is omitted as appropriate. Steps identical to those of the fifth embodiment are likewise given the same reference numerals, and their description is omitted as appropriate.
[0105] Compared with FIG. 6, FIG. 10 differs in the timing at which failure information is reported when the number of consecutive recognition failures exceeds the preset value (step S32: Yes). Specifically, in FIG. 6 the failure information is reported when a later recognition succeeds (step S41), so a relatively long time lag arises after the failure. In FIG. 10, by contrast, the failure information is reported at the time of the recognition to which it relates (step S422), so the time lag after the failure is relatively short. The one user or the administrator of the speaker recognition system can therefore take countermeasures quickly. In addition, this avoids the situation in which, for example, the password has been temporarily invalidated and the one user, unaware of the invalidation, fails recognition.
[0106] As described above, in this embodiment the processing shown in FIG. 10 is performed, so speaker recognition is carried out suitably. In particular, because the one user himself or herself is notified at an appropriate time when recognition fails repeatedly in succession, the likelihood that impersonation or spoofing in speaker recognition will recur, or the success rate of such a recurrence, can be reduced quickly.
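The difference in reporting timing described in paragraphs [0105] and [0106] can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the specification: the class name `FailureNotifier`, the fields `max_failures` and `immediate`, and the in-memory `sent` list that stands in for a real notification channel are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureNotifier:
    """Tracks consecutive recognition failures for one user (illustrative only)."""
    max_failures: int   # hypothetical "predetermined value" compared in step S32
    immediate: bool     # True: report at the failing attempt; False: defer to the next success
    consecutive_failures: int = 0
    pending_report: bool = False
    sent: List[str] = field(default_factory=list)  # stand-in for a real notification channel

    def record_attempt(self, success: bool) -> None:
        if success:
            # Deferred policy: a report held back since the failures is delivered now.
            if self.pending_report:
                self.sent.append("deferred failure report")
                self.pending_report = False
            self.consecutive_failures = 0
            return

        self.consecutive_failures += 1
        # Act once per failure streak, at the first attempt that exceeds the limit.
        if self.consecutive_failures == self.max_failures + 1:
            if self.immediate:
                self.sent.append("immediate failure report")
            else:
                self.pending_report = True
```

With `immediate=True` the report is produced by the failing attempt itself, so the lag after the failure is short. With `immediate=False` nothing is sent until a later attempt succeeds, which is the longer lag that the text attributes to FIG. 6.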
[0107] The present invention is not limited to the embodiments described above and may be modified as appropriate within a range that does not depart from the gist or spirit of the invention as understood from the claims and the specification as a whole. A speaker recognition system and a computer program involving such modifications are also included in the technical scope of the present invention.
Industrial Applicability
[0108] The speaker recognition system and the computer program according to the present invention can be used in a speaker recognition system that is provided in various computer devices and electronic or electrical devices, such as a car navigation device, an internet banking device, an auto-lock device, or a computer recognition device, and that performs speaker recognition based on the utterances of the speaker who is its user.

Claims

[1] A speaker recognition system comprising:
recognition means for performing speaker recognition;
detection means for detecting whether or not the speaker recognition for one user has failed a predetermined number of times in succession in the recognition means; and
reporting means for reporting, to the one user, failure information indicating that the speaker recognition has failed, when the detection means detects that the speaker recognition has failed the predetermined number of times in succession.
[2] A speaker recognition system comprising:
recognition means for performing speaker recognition;
detection means for detecting whether or not the speaker recognition for one user has failed in the recognition means;
history storage means for storing history information including failure information indicating that the speaker recognition has failed, when the detection means detects that the speaker recognition has failed; and
reporting means for reporting the history information to the one user.
[3] The speaker recognition system according to claim 2, wherein
the recognition means performs the speaker recognition via voice input means,
the system further comprises voice recording means for recording the voice input to the voice input means when the speaker recognition is performed, and
the history storage means further stores the recorded voice as the history information.
[4] The speaker recognition system according to claim 3, wherein
the reporting means reports the history information and also plays back the recorded voice to the one user.
[5] The speaker recognition system according to claim 2, wherein
the reporting means reports the history information to the one user when the number of consecutive failures detected by the detection means exceeds a predetermined number.
[6] The speaker recognition system according to claim 5, wherein
the recognition means performs the speaker recognition via access from a terminal connected to the recognition means via communication means,
the history storage means further stores, in the history information, the terminal name of the terminal on which the speaker recognition is performed, and
when the stored terminal name differs from the terminal name normally used by the one user, the predetermined number is smaller than when the two are the same.
[7] The speaker recognition system according to claim 2, wherein
the recognition means performs the speaker recognition via access from a terminal connected to the recognition means via communication means,
the history storage means further stores, in the history information, the date and time at which the most recent successful speaker recognition before the current speaker recognition was performed and the position of the terminal at that time, and
the detection means detects that the speaker recognition has failed when the ratio of the distance between the position of the terminal in the current speaker recognition and the stored position of the terminal to the time difference between the date and time of the current speaker recognition and the stored date and time exceeds a predetermined speed threshold.
[8] The speaker recognition system according to claim 1, wherein the reporting means reports to the one user without delay via communication means.
[9] The speaker recognition system according to claim 2, wherein the reporting means reports to the one user without delay via communication means.
[10] The speaker recognition system according to claim 1, wherein the reporting means reports to the one user when the one user performs the speaker recognition in the speaker recognition means on the next occasion after the failed speaker recognition.
[11] The speaker recognition system according to claim 2, wherein the reporting means reports to the one user when the one user performs the speaker recognition in the speaker recognition means on the next occasion after the failed speaker recognition.
[12] The speaker recognition system according to claim 1, wherein
the recognition means performs the speaker recognition based on voice corresponding to a password registered in advance, and
the reporting means, when reporting, also reports that the password should be changed.
[13] The speaker recognition system according to claim 2, wherein
the recognition means performs the speaker recognition based on voice corresponding to a password registered in advance, and
the reporting means, when reporting, also reports that the password should be changed.
[14] The speaker recognition system according to claim 1, wherein
the recognition means performs the speaker recognition based on voice corresponding to a password registered in advance, and
the system further comprises change processing means for performing processing to change the password when detection is made by the detection means.
[15] The speaker recognition system according to claim 2, wherein
the recognition means performs the speaker recognition based on voice corresponding to a password registered in advance, and
the system further comprises change processing means for performing processing to change the password when detection is made by the detection means.
[16] The speaker recognition system according to claim 1, wherein
the recognition means performs the speaker recognition based on voice corresponding to a password registered in advance, and
the system further comprises invalidation processing means for invalidating the password for a predetermined period when detection is made by the detection means.
[17] The speaker recognition system according to claim 2, wherein
the recognition means performs the speaker recognition based on voice corresponding to a password registered in advance, and
the system further comprises invalidation processing means for invalidating the password for a predetermined period when detection is made by the detection means.
[18] A speaker recognition system comprising:
recognition means for performing speaker recognition via voice input means;
voice recording means for recording the voice input to the voice input means when the speaker recognition is performed;
detection means for detecting whether or not the speaker recognition for one user has failed in the recognition means; and
history storage means for storing, when the detection means detects that the speaker recognition has failed, history information including the voice recorded in correspondence with the failed speaker recognition.
[19] The speaker recognition system according to claim 2, wherein
the history storage means further stores, in the history information, at least one of the date and time at which the speaker recognition was performed, position information, and a terminal name.
[20] The speaker recognition system according to claim 18, wherein
the history storage means further stores, in the history information, at least one of the date and time at which the speaker recognition was performed, position information, and a terminal name.
[21] The speaker recognition system according to claim 1, wherein
when detection is made by the detection means, a parameter is changed in the recognition means so that the speaker recognition becomes more likely to fail.
[22] The speaker recognition system according to claim 2, wherein
when detection is made by the detection means, a parameter is changed in the recognition means so that the speaker recognition becomes more likely to fail.
[23] The speaker recognition system according to claim 18, wherein
when detection is made by the detection means, a parameter is changed in the recognition means so that the speaker recognition becomes more likely to fail.
[24] The speaker recognition system according to claim 21, wherein
the parameter to be changed is a threshold for the degree of similarity between the pre-registered voice of the one user and the voice input at the time of the speaker recognition, the threshold serving as the criterion for determining whether or not the speaker recognition has failed.
[25] The speaker recognition system according to claim 22, wherein
the parameter to be changed is a threshold for the degree of similarity between the pre-registered voice of the one user and the voice input at the time of the speaker recognition, the threshold serving as the criterion for determining whether or not the speaker recognition has failed.
[26] The speaker recognition system according to claim 23, wherein
the parameter to be changed is a threshold for the degree of similarity between the pre-registered voice of the one user and the voice input at the time of the speaker recognition, the threshold serving as the criterion for determining whether or not the speaker recognition has failed.
[27] The speaker recognition system according to claim 1, wherein
the recognition means performs the speaker recognition via access from a terminal connected to the recognition means via communication means, and
the reporting means reports to the one user, in addition to or instead of the case where the detection means detects that the speaker recognition has failed the predetermined number of times in succession, when a predetermined condition concerning at least one of the temporal position and the spatial position of the terminal involved in the failed speaker recognition is not satisfied.
[28] A computer program that causes a computer to function as the speaker recognition system according to claim 1.
[29] A computer program that causes a computer to function as the speaker recognition system according to claim 2.
[30] A computer program that causes a computer to function as the speaker recognition system according to claim 18.
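Two of the checks recited in the claims lend themselves to a short illustration: the speed-threshold test of claim 7 and the similarity-threshold change of claims 21 and 24. The Python sketch below is only one possible reading. The claims do not say how distance is measured, so the great-circle (haversine) distance, the latitude/longitude position format, the km/h unit, and the function names `haversine_km`, `implausible_travel`, and `tightened_threshold` with their `step` and `ceiling` parameters are all assumptions made here.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (an assumed distance measure)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def implausible_travel(last_time, last_pos, now_time, now_pos, max_speed_kmh):
    """True if the speed implied between the last success and this attempt exceeds the threshold."""
    hours = (now_time - last_time).total_seconds() / 3600.0
    distance = haversine_km(*last_pos, *now_pos)
    if hours <= 0:
        return distance > 0  # no elapsed time: any movement is treated as implausible
    return distance / hours > max_speed_kmh


def tightened_threshold(threshold, step=0.05, ceiling=0.99):
    """Raise the similarity threshold so that recognition becomes harder to pass."""
    return min(threshold + step, ceiling)
```

Under these assumptions, an attempt for which `implausible_travel` returns `True` would be treated as a failed recognition, and `tightened_threshold` would be applied after a detected failure so that later attempts need a closer match to the registered voice.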
PCT/JP2007/055434 2006-03-27 2007-03-16 Speaker recognition system and computer program WO2007111170A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006086165A JP2009145924A (en) 2006-03-27 2006-03-27 Speaker recognition system and computer program
JP2006-086165 2006-03-27

Publications (1)

Publication Number Publication Date
WO2007111170A1 true WO2007111170A1 (en) 2007-10-04

Family

ID=38541090

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/055434 WO2007111170A1 (en) 2006-03-27 2007-03-16 Speaker recognition system and computer program

Country Status (2)

Country Link
JP (1) JP2009145924A (en)
WO (1) WO2007111170A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008050765A1 (en) * 2006-10-24 2008-05-02 Ihc Corp. Individual authentication system

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5370335B2 (en) * 2010-10-26 2013-12-18 日本電気株式会社 Speech recognition support system, speech recognition support device, user terminal, method and program
JP6385651B2 (en) * 2013-07-03 2018-09-05 三菱重工機械システム株式会社 On-vehicle device and spoofing detection method
JP7119558B2 (en) * 2018-05-16 2022-08-17 コニカミノルタ株式会社 Image processing device, image forming device, confidential information management method and program
JP7376985B2 (en) * 2018-10-24 2023-11-09 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Information processing method, information processing device, and program

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09127975A (en) * 1995-10-30 1997-05-16 Ricoh Co Ltd Speaker recognition system and information control method
JP2002064861A (en) * 2000-08-14 2002-02-28 Pioneer Electronic Corp User authentication system
JP2002278938A (en) * 2001-03-21 2002-09-27 Fuji Xerox Co Ltd Method and device for identifying individual, individual identification program and individual authenticating system
JP2004164130A (en) * 2002-11-11 2004-06-10 Ricoh Co Ltd Document management system using biological information, document management method using biological information, and program for running this method on computer
JP2005018186A (en) * 2003-06-24 2005-01-20 Hitachi Ltd Access control method, device, and its processing program
JP2005122251A (en) * 2003-10-14 2005-05-12 Nec Fielding Ltd Security system, server computer, security method, and program for security
JP2005165704A (en) * 2003-12-03 2005-06-23 Yokogawa Electric Corp Security check system
JP2005293490A (en) * 2004-04-05 2005-10-20 Hitachi Ltd Biometrics system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIRATA T. ET AL.: "Bunsan Kankyo ni Okeru Security Shoseki Kaiseki Hoho no Kento", INFORMATION PROCESSING SOCIETY OF JAPAN KENKYU HOKOKU, vol. 97, no. 97, 17 October 1997 (1997-10-17), pages 43 - 48, XP003018315 *

Also Published As

Publication number Publication date
JP2009145924A (en) 2009-07-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07738879

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 07738879

Country of ref document: EP

Kind code of ref document: A1