WO2007111170A1

WO2007111170A1 - Speaking person recognition system and computer program

Info

Publication number: WO2007111170A1
Application number: PCT/JP2007/055434
Authority: WO
Inventors: Yoshihiro Kawazoe; Soichi Toyama; Mitsuya Komamura
Original assignee: Pioneer Corporation; Techexperts Incorporation
Priority date: 2006-03-27
Filing date: 2007-03-16
Publication date: 2007-10-04
Also published as: JP2009145924A

Abstract

A speaking person recognition system (1) comprises recognizing means (132, 14) for recognizing a speaking person and detecting means (60) for detecting whether or not speaking person recognition of a user by the recognizing means has failed a predetermined number of times in succession. The system (1) further comprises reporting means (70) for reporting, to the user, failure information representing the failed speaking person recognition when the detecting means detects that the speaking person recognition has failed the predetermined number of times in succession.

Description

Specification

Speaker recognition system and computer program

Technical field

[0001] The present invention relates to the technical field of speaker recognition systems that are provided in various computer devices, such as car navigation devices, net banking devices, auto-lock devices, and computer recognition devices, and in various electronic and electrical devices, and that perform speaker recognition, and of computer programs that cause a computer to function as such a speaker recognition system.

Background art

[0002] Speaker recognition systems of this kind fall into three types: a text-fixed or text-dependent type, in which the uttered text used for recognition is registered in advance; a text-independent type, in which no such registration is necessary and any text can be recognized; and a text-designation type, in which the text is designated each time recognition is performed (see Patent Document 1). As a typical speaker recognition configuration, a technique comprising a speaker registration operation and a speaker recognition operation using an HMM (hidden Markov model) has been shown (see Patent Document 1). If recognition fails during the recognition operation, the speaker is, for example, rejected as another person (see Patent Document 2).

[0003] Patent Document 1: TECHNICAL REPORT OF IEICE, SP95-111 (1996-01), pp. 17-24

Patent Document 2: Japanese Patent Laid-Open No. 2002-236666

Disclosure of the invention

Problems to be solved by the invention

[0004] However, the techniques disclosed in Patent Document 1 and Patent Document 2 described above, for example, focus on the recognition itself, and it is hard to say that they provide sufficient countermeasures for the case where recognition fails. For example, Patent Document 1 makes no particular mention of measures to be taken when recognition fails, and Patent Document 2 merely rejects the impersonator as another person and does not release the lock. With such measures alone, even when recognition by impersonation has been attempted, no measure is taken to make the user himself or herself aware of that fact, and there is a technical problem that the situation in which the impersonator may attempt impersonation again is left unaddressed.

[0005] The present invention has been made in view of, for example, the above-mentioned problems, and an object thereof is to provide a speaker recognition system that can effectively prevent impersonation or spoofing in speaker recognition, and a computer program that causes a computer to function as such a speaker recognition system.

Means for solving the problem

[0006] (Speaker recognition system)

In order to solve the above problems, a first speaker recognition system according to the present invention comprises: recognition means for performing speaker recognition; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed a predetermined number of times in succession; and reporting means for reporting, to the one user, failure information indicating that the speaker recognition has failed when the detection means detects that the speaker recognition has failed the predetermined number of times in succession.

[0007] According to the first speaker recognition system, speaker recognition proceeds as follows during operation.

[0008] That is, during operation, speaker recognition is first performed by recognition means having, for example, a microphone, a camera, a processor, a memory, and the like. Here, “speaker recognition” means determining whether the speaker seeking recognition is the registered speaker (hereinafter also referred to as “one user”) or an impersonator, that is, whether speaker recognition succeeds or fails. Such speaker recognition is typically based on the utterance of the speaker, but may be performed on the basis of fingerprints, irises, faces, or the like in addition to or instead of the utterance.

[0009] Then, detection means having, for example, a processor, a memory, and the like detects whether or not the speaker recognition by the recognition means has failed a predetermined number of times in succession. Here, the “predetermined number of times” is a number of times from which the speaker can be presumed to be an impersonator. Typically, it is determined comprehensively, by experiment or simulation, as a number of consecutive failures that is extremely unlikely to occur when the one user himself or herself is operating the system. In addition, “continuous failure” means failing multiple times without an intervening success, and it does not matter whether or not a certain amount of time elapses between two consecutive failures. Of course, it also includes the simple case where failures occur in succession at the same place, on the same occasion, or during a single series of operations.

[0010] As a result, when the detection means detects that speaker recognition has failed the predetermined number of times in succession, failure information indicating that speaker recognition has failed is reported to the one user by the reporting means, which comprises, for example, a display. Here, the mode of “reporting” is not limited to display on the display of the terminal on which the speaker recognition system is installed; any mode, such as a preset e-mail or telephone call, may be used as long as the one user can become aware of the failure. If the administrator of the speaker recognition system is notified in addition to the one user, a more accurate and rapid response becomes possible.

[0011] As described above, according to the first speaker recognition system, impersonation or misrepresentation in speaker recognition can be suitably avoided or prevented. Even if an impersonator attempts fraudulent recognition by impersonation, the failure information indicating that failures have occurred in succession is not discarded but is reported to the one user or the administrator. The one user can then take measures such as changing the password, and the administrator can prompt all users, including the one user, to take measures. In this way, countermeasures against the impersonator can be taken quickly and accurately, so that if the impersonator tries to impersonate again, the probability of successful recognition is further reduced, which is very advantageous in practice.
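By way of a non-limiting illustration, the detection of consecutive failures and the resulting report described above may be sketched as follows. The class name `ConsecutiveFailureDetector`, the `report` callback standing in for the reporting means, and the threshold of three are assumptions introduced for this sketch and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ConsecutiveFailureDetector:
    """Hypothetical detection means: counts failures not interrupted by a success."""
    threshold: int                        # the "predetermined number of times"
    report: Callable[[str, int], None]    # stands in for the reporting means
    streaks: Dict[str, int] = field(default_factory=dict)

    def record(self, user: str, succeeded: bool) -> bool:
        """Record one recognition result; return True if a report was issued."""
        if succeeded:
            self.streaks[user] = 0        # a success breaks the run of failures
            return False
        # Elapsed time between failures is deliberately ignored.
        self.streaks[user] = self.streaks.get(user, 0) + 1
        if self.streaks[user] >= self.threshold:
            self.report(user, self.streaks[user])
            return True
        return False


reports = []
detector = ConsecutiveFailureDetector(
    threshold=3, report=lambda user, n: reports.append((user, n))
)
outcomes = (False, False, True, False, False, False)
results = [detector.record("user_a", ok) for ok in outcomes]
```

In this sketch the intervening success resets the count, so only the final run of three failures triggers a report.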

[0012] In order to solve the above problems, a second speaker recognition system according to the present invention comprises: recognition means for performing speaker recognition; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed; history storage means for storing history information including failure information indicating that the speaker recognition has failed when the detection means detects that the speaker recognition has failed; and reporting means for reporting the history information to the one user.

[0013] According to the second speaker recognition system, speaker recognition proceeds as follows during operation.

[0014] That is, during operation, speaker recognition is first performed by recognition means having, for example, a microphone, a camera, a processor, a memory, and the like.

[0015] Then, detection means having, for example, a processor, a memory, and the like detects whether or not the recognition means has failed in speaker recognition for the one user. Regarding the “whether or not it has failed” detected here, the number of consecutive failures does not matter. In other words, a failure may be detected even if it occurs only once. In practice, such a failure may result from an operation by the one user himself or herself, but greater weight is given to the practical benefit of detecting even a cautious impersonator without omission.

[0016] When the detection means detects that the speaker recognition has failed, history storage means having, for example, a processor, a memory, a database, and the like stores history information including failure information indicating that the speaker recognition has failed. Here, the “history information” is information in which an operation history including failure information for the one user is recorded, and it is typically accumulated in time series.

[0017] As a result, the history information is reported to the one user by the reporting means, either immediately without delay or afterward, and either regularly or irregularly.

[0018] As described above, according to the second speaker recognition system, impersonation or misrepresentation in speaker recognition can be suitably avoided or prevented. Even if a cautious impersonator attempts fraudulent recognition by impersonation, the failure information indicating that a failure has occurred even once is not discarded but is reported to the one user or the administrator. As a result, the administrator can prompt all users, including the one user, to take countermeasures. In this way, countermeasures against the impersonator can be taken quickly and accurately, so that if the impersonator tries to impersonate again, the probability of successful recognition is further reduced, which is very advantageous in practice.
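As a rough sketch only, history information accumulated in time series and reported to the user, as described for the second speaker recognition system, might be modeled as below. The names `HistoryEntry`, `HistoryStore`, `has_failure`, and `report_lines`, as well as the text format of the report, are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class HistoryEntry:
    when: datetime        # date and time of the recognition attempt
    succeeded: bool       # False corresponds to "failure information"


@dataclass
class HistoryStore:
    """Hypothetical history storage means: per-user entries kept in time series."""
    entries: Dict[str, List[HistoryEntry]] = field(default_factory=dict)

    def record(self, user: str, when: datetime, succeeded: bool) -> None:
        self.entries.setdefault(user, []).append(HistoryEntry(when, succeeded))

    def has_failure(self, user: str) -> bool:
        # A single failure is enough; no consecutive count is required.
        return any(not e.succeeded for e in self.entries.get(user, []))

    def report_lines(self, user: str) -> List[str]:
        # Text a reporting means might present to the user, oldest entry first.
        ordered = sorted(self.entries.get(user, []), key=lambda e: e.when)
        return [
            f"{e.when.isoformat()} {'success' if e.succeeded else 'FAILURE'}"
            for e in ordered
        ]
```

Whether such a report is pushed immediately or presented later is left open here, as it is in the text.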

[0019] In one aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition via voice input means, the system further comprises voice recording means for recording the voice input to the voice input means when the speaker recognition is performed, and the history storage means further stores the recorded voice as the history information.

[0020] According to this aspect, the recognition means first performs speaker recognition via the voice input means. At this time, the input voice is recorded by voice recording means having, for example, a processor, a memory, a database, and the like, and is stored as history information by the history storage means. The voice stored in this way can therefore be used as strong evidence for identifying the impersonator, and the recognition performance of the speaker recognition system can also be improved by learning the impersonator's voice.

[0021] In the aspect of further storing the recorded voice, the reporting means may report the history information and reproduce the recorded voice for the one user.

[0022] According to this aspect, for example, when unauthorized speaker recognition is attempted by an impersonator, the reporting means reports the history information to the one user and plays back the recorded voice. The one user can therefore confirm the unauthorized use reliably and quickly on the basis of the reproduced voice. As a result, measures such as changing the password can be taken more quickly.

[0023] In the aspect including the history storage means described above, the reporting means may report the history information to the one user when the number of consecutive failures detected by the detection means exceeds a predetermined number of times.

[0024] According to this aspect, when the number of consecutive failures detected by the detection means exceeds the predetermined number of times, the reporting means reports the history information to the one user. At this time, the speaker recognition system can be controlled more flexibly by changing the predetermined number of times on the basis of, for example, the history information.

[0025] In the aspect in which it is determined whether or not the number of consecutive failures exceeds the predetermined number of times, the recognition means may perform the speaker recognition via access from a terminal connected to the recognition means via communication means, the history storage means may further store, as part of the history information, the terminal name of the terminal on which the speaker recognition is performed, and, if the stored terminal name differs from the terminal name normally used by the one user, the predetermined number of times may be reduced compared with the case where it is the same.

[0026] According to this aspect, if the terminal name differs from the terminal name that is normally used, there is a high possibility of fraudulent use by an impersonator, so relatively strict speaker recognition is performed. On the other hand, if it is the same as the terminal name that is normally used, there is a high possibility that the speaker is the one user himself or herself, so relatively permissive speaker recognition is performed. If the predetermined number of times is suitably changed in this way on the basis of information other than speech, the performance of the speaker recognition system is complemented, which is very useful in practice.
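A minimal sketch of this adjustment, under stated assumptions, is given below. The function name `allowed_consecutive_failures` and the concrete values (three failures for a usual terminal, one for an unfamiliar terminal) are illustrative assumptions; the specification does not fix these numbers.

```python
from typing import Set


def allowed_consecutive_failures(
    terminal_name: str,
    usual_terminals: Set[str],
    base: int = 3,       # assumed count for a terminal the user normally uses
    reduced: int = 1,    # assumed stricter count for an unfamiliar terminal
) -> int:
    """Return the predetermined number of times for the given terminal."""
    if terminal_name in usual_terminals:
        return base
    # Never allow the "reduced" setting to exceed the base setting.
    return min(base, reduced)


usual = {"home-pc", "car-navi"}
```

Here `allowed_consecutive_failures("home-pc", usual)` yields the permissive count, while an unknown terminal name yields the reduced one.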

[0027] Alternatively, in the aspect including the history storage means described above, the recognition means may perform the speaker recognition via access from a terminal connected to the recognition means via communication means; the history storage means may further store, as part of the history information, the date and time at which the speaker recognition most recently succeeded and the position of the terminal used at that time; and the detection means may detect that the speaker recognition has failed when the ratio of the distance between the position of the terminal in the current speaker recognition and the stored terminal position to the time difference between the date and time of the current speaker recognition and the stored date and time exceeds a predetermined speed threshold.

[0028] According to this aspect, the following determination is made on the basis of the history information stored by the history storage means. That is, when the ratio of the distance between the position of the terminal in the current speaker recognition and the stored terminal position to the time difference between the date and time of the current speaker recognition and the stored date and time, in other words the moving speed, exceeds a predetermined speed threshold, the detection means detects that speaker recognition has failed. Here, the “predetermined speed threshold” is a speed at which movement is realistically or physically difficult or impossible. For example, a speed calculated on the basis of a shortest-route search algorithm may be set in advance for this purpose. Alternatively, it may be a value set by the one user on the basis of his or her own experience. Note that the appropriate threshold varies with developments in transportation technology and may be updated as appropriate. In this manner, in this aspect, suspected unauthorized use is estimated on the basis of the moving speed between terminals. In other words, speaker recognition can be complemented from the viewpoint of mobility.
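The moving-speed determination may be illustrated, purely as a sketch, as follows. Instead of the shortest-route search mentioned above, this example substitutes the great-circle (haversine) distance between two latitude/longitude pairs, which is a simplification; the function names and the 900 km/h threshold are assumptions made for illustration.

```python
import math
from datetime import datetime
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in kilometres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implies_impossible_travel(
    prev_pos: Tuple[float, float],
    prev_time: datetime,
    cur_pos: Tuple[float, float],
    cur_time: datetime,
    max_speed_kmh: float = 900.0,   # assumed "predetermined speed threshold"
) -> bool:
    """True if the implied moving speed between the two uses exceeds the threshold."""
    hours = (cur_time - prev_time).total_seconds() / 3600.0
    distance = great_circle_km(*prev_pos, *cur_pos)
    if hours <= 0:
        # No elapsed time: any change of position is treated as impossible.
        return distance > 0
    return distance / hours > max_speed_kmh
```

A detection means built on this check would treat a `True` result as a recognition failure regardless of the voice match.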

[0029] In another aspect of the first or second speaker recognition system according to the present invention, the reporting means reports to the one user without delay via communication means.

[0030] According to this aspect, when the detection means detects that speaker recognition has failed the predetermined number of times in succession as described above, or detects that speaker recognition has failed, the reporting means reports the failure information or the history information to the one user without delay via the communication means. The “communication means” here specifically includes means capable of contacting the one user relatively quickly, such as e-mail, a landline telephone, or a mobile phone. Therefore, the possibility of recurrence of impersonation or spoofing in speaker recognition, or the success rate at the time of recurrence, can be suppressed quickly.

[0031] In another aspect of the first or second speaker recognition system according to the present invention, the reporting means reports to the one user when the one user performs speaker recognition with the recognition means on the next occasion following the failed speaker recognition.

[0032] According to this aspect, when the detection means detects that speaker recognition has failed the predetermined number of times in succession as described above, or detects that speaker recognition has failed, the reporting means reports the failure information or the history information to the one user when the one user next performs speaker recognition after the failure. Therefore, even without any special communication means, the possibility of recurrence of impersonation or spoofing in speaker recognition, or the success rate at the time of recurrence, can be suppressed. Moreover, because the report is addressed to the one user who is currently using the speaker recognition system, other measures can be prompted at the same time as the report, and the one user can take them on the spot.
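One way to picture reporting at the next opportunity is a per-user queue of pending notices that is delivered when that user is next recognized successfully. The sketch below assumes exactly that; the class `DeferredReporter` and its method names are hypothetical, and treating a successful recognition as the trigger is an interpretation of the text rather than something it states explicitly.

```python
from collections import defaultdict
from typing import DefaultDict, List


class DeferredReporter:
    """Hypothetical reporting means that holds notices until the user returns."""

    def __init__(self) -> None:
        self._pending: DefaultDict[str, List[str]] = defaultdict(list)

    def note_failure(self, user: str, detail: str) -> None:
        # Keep the failure information instead of discarding it.
        self._pending[user].append(detail)

    def on_success(self, user: str) -> List[str]:
        # Deliver and clear everything accumulated since the last report.
        return self._pending.pop(user, [])
```

Because no outbound channel is needed, this variant works even on a terminal without e-mail or telephone connectivity.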

[0033] In another aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance, and the reporting means, when reporting, also reports that the password should be changed.

[0034] According to this aspect, speaker recognition is performed by recognition means including, for example, a microphone, a camera, a processor, a memory, and the like, on the basis of the voice corresponding to the password registered in advance. When the speaker recognition failure information or history information is reported to the one user by the reporting means, a notification that the password should be changed is also made. This is because, when an impersonator attempts impersonation, there is a high possibility that the password has been leaked. Therefore, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be accurately suppressed.

[0035] In another aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance, and the system further comprises change processing means for performing a process of changing the password when a failure is detected by the detection means.

[0036] According to this aspect, speaker recognition by the recognition means is performed on the basis of the voice corresponding to the password registered in advance. When the detection means detects a failure in speaker recognition, change processing means having, for example, a processor, a memory, and the like performs a process of changing the password. For example, when the detection means detects that speaker recognition has failed the predetermined number of times in succession, it is assumed that fraudulent speaker recognition by impersonation is being attempted, and the password is automatically changed to a temporary password. As a result, further attempts at impersonation with the same password become more difficult. If the changed password is notified to the one user by the reporting means with due consideration for security, no problem arises when the one user next uses the speaker recognition system. Therefore, the possibility of recurrence of impersonation or fraud in speaker recognition, or the success rate at the time of recurrence, can be suppressed very quickly.

[0037] In another aspect of the first or second speaker recognition system according to the present invention, the recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance, and the system further comprises invalidation processing means for invalidating the password for a predetermined period when a failure is detected by the detection means.

[0038] According to this aspect, speaker recognition by the recognition means is performed on the basis of the voice corresponding to the password registered in advance. When the detection means detects a failure in speaker recognition, the password is invalidated for a predetermined period by invalidation processing means having, for example, a processor, a memory, and the like. For example, if recognition fails twice in succession, the password is invalidated for one hour. Here, the predetermined period may be set in advance as a period long enough for an impersonator to abandon consecutive attempts, or as a period sufficient for the legitimate user to take countermeasures, and may be made changeable by the user. Therefore, the possibility of recurrence of impersonation or spoofing in speaker recognition, or the success rate at the time of recurrence, can be suppressed quickly.
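The example of two consecutive failures leading to a one-hour invalidation may be sketched as follows. The class name `PasswordLockout` and its interface are assumptions introduced here; the defaults simply mirror the numbers given in the example above.

```python
from datetime import datetime, timedelta
from typing import Optional


class PasswordLockout:
    """Hypothetical invalidation processing: lock the password for a fixed period."""

    def __init__(self, max_failures: int = 2,
                 period: timedelta = timedelta(hours=1)) -> None:
        self.max_failures = max_failures   # consecutive failures that trigger a lock
        self.period = period               # the "predetermined period"
        self._streak = 0
        self._locked_until: Optional[datetime] = None

    def is_invalidated(self, now: datetime) -> bool:
        return self._locked_until is not None and now < self._locked_until

    def record(self, succeeded: bool, now: datetime) -> None:
        if succeeded:
            self._streak = 0
            return
        self._streak += 1
        if self._streak >= self.max_failures:
            # Invalidate from the moment of the triggering failure.
            self._locked_until = now + self.period
            self._streak = 0
```

A recognition means using this sketch would consult `is_invalidated` before comparing any voice input against the registered password.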

[0039] In order to solve the above problems, a third speaker recognition system according to the present invention comprises: recognition means for performing speaker recognition via voice input means; voice recording means for recording the voice input to the voice input means when the speaker recognition is performed; detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed; and history storage means for storing, when the detection means detects that the speaker recognition has failed, history information including the voice recorded in correspondence with the failure of the speaker recognition.

[0040] According to the third speaker recognition system, speaker recognition via the voice input means is performed by the recognition means. The voice input to the voice input means when the speaker recognition is performed is recorded by voice recording means having, for example, a processor, a memory, a database, and the like. Simultaneously or in succession, the detection means detects whether or not the recognition means has failed in speaker recognition for the one user. Here, when the detection means detects that the speaker recognition has failed, the history storage means stores the history information including the voice recorded in correspondence with the failure of the speaker recognition.

[0041] As described above, according to the third speaker recognition system, since the history information also includes voice, it is possible, for example, to analyze the voice and compare it with the voices of spoofers who have tried to impersonate other users, or to have an authorized user or the like confirm whether the voice is familiar, and thereby to build a profile of the impersonator and use it to prevent impersonation. In this way, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed more efficiently.

[0042] In another aspect of the second or third speaker recognition system, the history storage means further includes, in the history information to be stored, at least one of the date and time, the position information, and the terminal name of the speaker recognition.

[0043] According to this aspect, in addition to the failure information and voice data, at least one of the date and time of the speaker recognition, the position information, and the terminal name is further stored in the history information by the history storage means. Because the date and time, position information, and the like are recorded in addition to the failure information and voice data, the impersonator can be identified more quickly and accurately and the impersonator's behavior pattern can be grasped, so that the recurrence of impersonation or spoofing in speaker recognition can be suppressed more accurately. At that time, if consideration of the date and time, position information, and the like leads to the conclusion that impersonation is clearly taking place, the speaker recognition may be interrupted and that fact may be reported to the user himself or herself even if the number of consecutive failures has not reached the predetermined number of times. As a result, spoofing in the speaker recognition can be prevented.

[0044] In another aspect of the first, second, or third speaker recognition system according to the present invention, when a failure is detected by the detection means, the recognition means changes a parameter so that the speaker recognition becomes more likely to fail.

[0045] According to this aspect, when a failure is detected by the detection means, the recognition means changes a parameter so that the speaker recognition becomes more likely to fail. Since it therefore becomes harder for the spoofer to be recognized as the failures continue, the possibility of recurrence of impersonation or spoofing in speaker recognition, or the success rate at the time of recurrence, can be suppressed more accurately.

[0046] In the aspect in which the parameter is changed, the parameter to be changed may be a threshold value of the similarity between the voice of the one user registered in advance and the voice input at the time of speaker recognition, the threshold value serving as the criterion for determining whether or not the speaker recognition has failed.

[0047] According to this aspect, each time it is detected that the speaker recognition has failed, the recognition means raises the similarity threshold. Therefore, when a spoofer next attempts speaker recognition, the speaker recognition is more likely to fail. Here, the amount by which the similarity threshold is raised may be set, for example, in consideration of both the impersonator's learning ability and the fluctuation of the one user's voice due to physical condition and the like.
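A minimal sketch of a similarity threshold that rises with each failure is shown below. The similarity scale of 0 to 1, the base value, the step size, the ceiling, and the reset to the base value after a success are all assumptions made for this illustration; the specification only states that the threshold is raised on each detected failure.

```python
class AdaptiveThreshold:
    """Hypothetical acceptance rule whose similarity threshold rises on failure."""

    def __init__(self, base: float = 0.80, step: float = 0.03,
                 ceiling: float = 0.95) -> None:
        self.base = base          # threshold when no failure has occurred
        self.step = step          # amount added after each failure
        self.ceiling = ceiling    # upper bound so the real user is not locked out
        self.current = base

    def accept(self, similarity: float) -> bool:
        """Compare a similarity score with the current threshold and update it."""
        ok = similarity >= self.current
        if ok:
            self.current = self.base   # assumption: success restores the base
        else:
            self.current = min(self.ceiling, self.current + self.step)
        return ok
```

The ceiling reflects the trade-off noted above: raising the threshold too far would also reject the legitimate user on a day when his or her voice differs slightly.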

[0048] In another aspect of the first speaker recognition system according to the present invention, the recognition means performs the speaker recognition via access from a terminal connected to the recognition means via communication means, and the reporting means reports to the one user, in addition to or instead of the case where the detection means detects that the speaker recognition has failed the predetermined number of times in succession, when a predetermined condition on at least one of the temporal position and the spatial position of the terminal involved in the failure of the speaker recognition is not satisfied.

[0049] According to this aspect, during operation, speaker recognition is performed by the recognition means via access from a terminal connected to the recognition means via the communication means. The terminals here are, for example, ATMs (automated teller machines) installed at bank branches or convenience stores and connected by a dedicated line, and mobile phones that have a GPS (Global Positioning System) function and can be used for mobile banking. In speaker recognition by such recognition means, in addition to or instead of the case where the detection means detects that speaker recognition has failed the predetermined number of times in succession, that is, even if only a single failure is detected, the reporting means reports to the one user if a predetermined condition on at least one of the temporal position and the spatial position of the terminal involved in the speaker recognition failure is not satisfied. For example, if, taking into account the time difference between the previous and current times of use and the distance between the two terminals, it is judged physically impossible to move between them within that time difference, a report is made to the effect that the possibility of a spoofer is relatively high. Therefore, by accurately detecting, in addition to or instead of the utterance, situations that would not ordinarily arise in use by the one user, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed more accurately.

[0050] (Computer program)

In order to solve the above problems, a computer program according to the present invention causes a computer provided in a speaker recognition system to function as the first, second, or third speaker recognition system according to the present invention described above (including its various aspects).

[0051] According to the computer program of the present invention, if the computer program is read from a recording medium storing it, such as a CD-ROM or DVD-ROM, into a computer provided in the speaker recognition system and executed, or if the computer program is downloaded via communication means and then executed, the speaker recognition system of the present invention described above can be constructed relatively easily. As a result, as in the case of the speaker recognition system of the present invention described above, the possibility of recurrence of impersonation or spoofing in speaker recognition, or the success rate at the time of recurrence, can be suppressed.

[0052] In order to solve the above problems, a computer program product in a computer-readable medium tangibly embodies program instructions executable by a computer, and causes the computer to function as the first, second, or third speaker recognition system described above.

[0053] According to the computer program product of the present invention, if the computer program product is read into a computer from a recording medium storing it, such as a ROM, CD-ROM, DVD-ROM, or hard disk, or if the computer program product, which is for example a transmission wave, is downloaded to a computer via communication means, the first, second, or third speaker recognition system of the present invention described above can be implemented relatively easily. More specifically, the computer program product may be composed of computer-readable code (or computer-readable instructions) that causes a computer to function as the first, second, or third speaker recognition system of the present invention described above. As described above in detail, since the speaker recognition system of the present invention comprises the recognition means, the detection means, and the reporting means, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed. Furthermore, according to the computer program of the present invention, since it causes a computer to function as the recognition means, the detection means, and the reporting means, the above-described speaker recognition system of the present invention can be constructed relatively easily.

[0054] The operation and other advantages of the present invention will be clarified in the embodiment described below.

Brief Description of Drawings

FIG. 1 is a block diagram schematically showing the basic configuration and basic operation of a speaker recognition system according to a first example of the present invention.

FIG. 2 is a block diagram conceptually showing the basic structure and basic operation of a recognition unit provided in the speaker recognition system in the first example.

FIG. 3 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the second example.

FIG. 4 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the third example.

FIG. 5 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the fourth example.

FIG. 6 is a flowchart showing an operation process of the speaker recognition system according to the fifth embodiment.

FIG. 7 is a flowchart showing an operation process of the speaker recognition system according to the sixth embodiment.

FIG. 8 is a flowchart showing an operation process of the speaker recognition system according to the seventh embodiment.

FIG. 9 is a flowchart showing an operation process of the speaker recognition system according to the eighth embodiment.

FIG. 10 is a flowchart showing an operation process of the speaker recognition system according to the ninth embodiment.

Explanation of symbols

[0056] 1 Speaker recognition system

132 Microphone

14 Recognition unit

52 Display screen

60 Detection unit

70 Reporting unit

65 Change processing unit

66 Invalidation processing unit

80 History storage unit

85 History database

145 Audio recording unit

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, the best mode for carrying out the present invention will be described in each embodiment in order with reference to the drawings.

[0058] (1) First Example

The configuration and operation process of the speaker recognition system according to the first embodiment will be described with reference to FIG. 1 and FIG. 2. FIG. 1 is a block diagram conceptually showing the basic configuration and basic operation of the speaker recognition system according to the first embodiment of the present invention, and FIG. 2 is a block diagram conceptually showing the basic configuration and basic operation of the recognition unit provided in the speaker recognition system according to the first embodiment.

Referring to FIG. 1, the speaker recognition system 1 according to the present embodiment includes a microphone 132 and a recognition unit 14 as examples of the “recognition means” according to the present invention, a display screen 52, a detection unit 60 as an example of the “detection means” according to the present invention, and a reporting unit 70 as an example of the “reporting means” according to the present invention. With the following configuration, it performs speaker recognition on the speaker A121 or the spoofer 122.

The microphone 132 is a device that converts the utterance into an electrical signal and inputs it to the speaker recognition system 1 when the speaker A 121 or the spoofer 122 utters a keyword.

[0061] The recognition unit 14 is logically constructed according to a program in a computer including, for example, a processor and a memory. At the time of speaker recognition, it compares the utterance of an arbitrary speaker (the speaker A121 or the spoofer 122) with the registered speaker model, and thereby recognizes whether or not that speaker is the speaker A121 of the registered speaker model.

Here, the recognition unit 14 will be described in detail with reference to FIG. 2.

In FIG. 2, the recognition unit 14 according to the present embodiment includes a voice part extraction unit 142, a feature amount calculation unit 201, a similarity calculation unit 15, a speaker model database 45, and a collation unit 30. Here, the voice part extraction unit 142 is logically constructed in accordance with a program in a computer having, for example, a processor and a memory. It is an arithmetic unit that cuts out the uttered voice portion, produced when a keyword is uttered, from the electrical signal of the input utterance, that is, the utterance data, by a general voice segment detection method or the like that uses the power difference between background noise and the voice utterance section.

[0065] The feature amount calculation unit 201 is logically constructed in accordance with a program in a computer having, for example, a processor and a memory, and is an arithmetic unit that converts the input uttered voice portion into a feature amount. The feature amount is, for example, MFCC (Mel Frequency Cepstrum Coefficients) or an LPC (Linear Predictive Coding) cepstrum.

[0066] The similarity calculation unit 15 is logically constructed according to a program in a computer having, for example, a processor and a memory. It calculates the similarity between the feature amount of the input uttered voice and the voice feature amount corresponding to the password registered in advance in the speaker model database 45.
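The voice segment detection described above can be illustrated with a minimal Python sketch. It assumes fixed-length frames, estimates the background-noise power from the first few frames, and treats any frame whose power exceeds that estimate by a fixed ratio as speech. The function names and the parameters `frame_len`, `ratio`, and `noise_frames` are hypothetical and do not appear in the specification.

```python
from typing import List, Optional, Tuple


def frame_power(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def extract_speech(
    samples: List[float],
    frame_len: int = 4,
    ratio: float = 4.0,
    noise_frames: int = 2,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the uttered portion, or None.

    Assumption: the first `noise_frames` frames contain only background noise.
    """
    powers = frame_power(samples, frame_len)
    if len(powers) <= noise_frames:
        return None
    noise = sum(powers[:noise_frames]) / noise_frames
    limit = max(noise, 1e-12) * ratio  # power a frame must exceed to count as speech
    active = [i for i, p in enumerate(powers) if p > limit]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

A practical detector would typically add smoothing and hangover frames; this sketch only shows the power-difference idea.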

[0067] The collation unit 30 is logically constructed according to a program in a computer including, for example, a processor and a memory. It verifies whether or not the speaker A121 or the spoofer 122 is the registered speaker A121, based on whether the calculated similarity satisfies a predetermined standard indicating the similarity that corresponds to the person himself or herself, and outputs the verification result (for example, whether the speaker recognition succeeded or failed). The predetermined standard may be a value that can be changed as appropriate. Specifically, if the repeated failures of the spoofer 122 are detected by the detection unit 60 and the predetermined standard is changed so that recognition fails more readily, spoofing by the spoofer 122 becomes more difficult.

[0068] Returning to FIG. 1, the display screen 52 is a display device, for example a liquid crystal display, that displays the recognition result. If the speaker is not recognized as the registered person, a recognition failure message is displayed.

[0069] The detection unit 60 detects whether or not the recognition unit 14 has failed in speaker recognition continuously for a predetermined number of times, for example within the same occasion or across different occasions. This is because, if failures occur consecutively, the possibility that the speaker is a misrepresenter rather than the person himself or herself is relatively high. When such consecutive failures are detected, failure information indicating that the speaker recognition has failed is sent to the reporting unit 70.

[0070] The reporting unit 70 reports the failure information indicating that the speaker recognition has failed to the one user himself or herself (in this case, the speaker A121), for example via a display. The failure information may be reported to the speaker A121 without delay through a preset communication means such as e-mail or telephone. Alternatively, if the failure information is reported when the speaker A121 next performs speaker recognition after the failure, it is possible to cope with the case where the speaker A121 has no such communication means. If the report also states that the password should be changed, leakage of the password can be dealt with.

[0071] As described above, according to FIGS. 1 and 2, when the speaker A121 seeks recognition, for example, speaker recognition succeeds, whereas when the spoofer 122 seeks recognition, failure information is suitably reported to the speaker A121. Impersonation or misrepresentation in speaker recognition can therefore be suitably avoided or prevented.
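As an illustration of how a similarity value might be compared against the predetermined standard, the following Python sketch uses cosine similarity between two feature vectors. The specification does not name a similarity measure, so cosine similarity is an assumption made here, as are the function names `similarity` and `collate` and the default criterion of 0.8 (borrowed from the example threshold given for FIG. 6).

```python
import math
from typing import Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity clipped to the range 0..1 (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0
    return max(0.0, dot / norm)


def collate(
    input_features: Sequence[float],
    registered_features: Sequence[float],
    criterion: float = 0.8,
) -> bool:
    """Return True when the similarity meets the (adjustable) criterion."""
    return similarity(input_features, registered_features) >= criterion
```

Keeping `criterion` as a parameter reflects the statement that the predetermined standard may be changed as appropriate.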
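The consecutive-failure detection can be sketched in a few lines of Python. This is only an illustrative reading: the class name `FailureDetector`, the default limit of 5, and the choice to trigger when the count reaches (rather than exceeds) the limit are assumptions.

```python
class FailureDetector:
    """Counts consecutive recognition failures for one user."""

    def __init__(self, limit: int = 5) -> None:
        self.limit = limit          # predetermined number of times (assumed value)
        self.consecutive = 0

    def record(self, success: bool) -> bool:
        """Record one attempt.

        Returns True when failure information should be passed on
        to the reporting unit.
        """
        if success:
            self.consecutive = 0    # a success breaks the run of failures
            return False
        self.consecutive += 1
        return self.consecutive >= self.limit
```

For example, with `limit=3` the third failure in a row returns `True`, while a success in between resets the count.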

[0072] (2) Second embodiment

Next, the configuration and basic operation of the speaker recognition system according to the second embodiment will be described with reference to FIG. 3 in addition to the preceding figures. FIG. 3 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the second example. In FIG. 3, the same components as those in the above drawings are given the same reference numerals, and the description thereof will be omitted as appropriate.

[0073] Compared with the speaker recognition system 1 according to FIG. 1, the speaker recognition system 1 according to FIG. 3 further includes a change processing unit 65 as an example of the “change processing means” according to the present invention and an invalidation processing unit 66 as an example of the “invalidation processing means” according to the present invention.

[0074] For example, when the detection unit 60 detects that speaker recognition has failed continuously for a predetermined number of times, the change processing unit 65 performs a process of changing the password used for speaker recognition. Alternatively, the invalidation processing unit 66 invalidates the password for a predetermined period.

[0075] As described above, according to FIG. 3, the password is changed, or is invalidated until the one user who has been notified takes a countermeasure. In this embodiment as well, therefore, the possibility of recurrence of impersonation or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be quickly suppressed.

[0076] (3) Third Example
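Invalidating the password for a predetermined period can be modelled as storing an expiry time. The Python sketch below is a hypothetical illustration; the class `PasswordState`, its methods, and the one-hour period used in the usage note are not taken from the specification.

```python
from datetime import datetime, timedelta
from typing import Optional


class PasswordState:
    """Tracks whether a registered password is currently usable."""

    def __init__(self) -> None:
        self.invalid_until: Optional[datetime] = None

    def invalidate(self, now: datetime, period: timedelta) -> None:
        """Invalidate the password from `now` for the given period."""
        self.invalid_until = now + period

    def is_valid(self, now: datetime) -> bool:
        """The password is usable again once the period has elapsed."""
        return self.invalid_until is None or now >= self.invalid_until
```

With a one-hour period starting at 09:00, `is_valid` would return `False` at 09:30 and `True` from 10:00 onward.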

Next, the configuration and basic operation of the speaker recognition system according to the third embodiment will be described with reference to FIG. 4 in addition to the preceding figures. FIG. 4 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the third example. In FIG. 4, the same components as those in the above drawings are given the same reference numerals, and the description thereof will be omitted as appropriate.

In addition to the configuration of the speaker recognition system 1 according to FIG. 1, the speaker recognition system 1 according to FIG. 4 further includes a history storage unit 80 and a history database 85 as examples of the “history storage means” according to the present invention.

The history storage unit 80 stores history information, including failure information indicating that speaker recognition has failed, in the history database 85. The table stored in the history database 85 has, for example, the structure shown in Table 86, which stores the date and time when speaker recognition was performed, the name of the terminal used, the number of consecutive failures, and the recognition result. Based on this history information, if the recognition unit 14 recognizes the one user himself or herself even once while the number of consecutive failures is less than a predetermined number (for example, 5 times), the speaker recognition is regarded as successful. On the other hand, if the number of consecutive failures exceeds the predetermined number without the speaker being recognized as the one user, the speaker recognition is regarded as a failure attributable to the spoofer 122 and becomes the subject of a report. Note that the counter for the number of consecutive failures may be reset to its initial value of 0 after a predetermined time or period has elapsed, or after successful recognition, so that failures by the one user himself or herself are not inadvertently accumulated. Alternatively, only failures that occur consecutively within a certain time or within a certain time period may be made the subject of a report.

[0079] As described above, according to FIG. 4, based on the history information, the one user can take various measures to prevent recurrence, and the administrator can prompt all users, including the one user, to take measures. As a result, the possibility of recurrence of spoofing or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be suppressed more efficiently.
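The history table and the counter reset described above could look roughly like the following Python sketch. The field names mirror the columns listed for Table 86, but the class names, the in-memory list standing in for the history database 85, and the `reset_after` period are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class HistoryRecord:
    timestamp: datetime          # date and time of the recognition attempt
    terminal: str                # name of the terminal used
    consecutive_failures: int    # running count after this attempt
    success: bool                # recognition result


class HistoryStore:
    """In-memory stand-in for the history database (assumption)."""

    def __init__(self, reset_after: timedelta = timedelta(hours=24)) -> None:
        self.records: List[HistoryRecord] = []
        self.reset_after = reset_after

    def add(self, timestamp: datetime, terminal: str, success: bool) -> HistoryRecord:
        count = 0
        if self.records:
            last = self.records[-1]
            # Carry the count over only if the previous failure is recent enough.
            if not last.success and timestamp - last.timestamp <= self.reset_after:
                count = last.consecutive_failures
        count = 0 if success else count + 1
        record = HistoryRecord(timestamp, terminal, count, success)
        self.records.append(record)
        return record
```

Resetting on both elapsed time and success keeps occasional failures by the genuine user from accumulating toward the reporting limit.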

[0080] (4) Fourth embodiment

Next, the configuration and basic operation of the speaker recognition system according to the fourth embodiment will be described with reference to FIG. 5 in addition to the preceding figures. FIG. 5 is a block diagram conceptually showing the basic structure and basic operation of the speaker recognition system in the fourth example. In FIG. 5, the same components as those in the above drawings are given the same reference numerals, and the description thereof will be omitted as appropriate.

[0081] In addition to the configuration of the speaker recognition system 1 according to FIG. 4, the speaker recognition system 1 according to FIG. 5 includes a plurality of terminals (for example, a terminal A91 and a terminal B92) and a voice recording unit 145 as an example of the “voice recording means” according to the present invention.

[0082] Each of the plurality of terminals includes a microphone 132 and a display screen 52. For example, the terminal A91 is installed in a branch in Hokkaido, and the terminal B92 is installed in a branch in Fukuoka. The network configuration may be of the so-called client-server type, in which, for example, each terminal acts as a client and the other components are placed on a server. However, the network configuration is not limited to this; for example, only the history database 85 may be placed on the server.

The voice recording unit 145 records the voice input to the microphone 132 when speaker recognition is performed, for example in the history database 85 together with the history information. Then, for example, when unauthorized speaker recognition is attempted by the spoofer 122, the recorded voice is reproduced along with the report by the reporting unit 70. Based on the reproduced voice, the one user can confirm the unauthorized use reliably and promptly, and can therefore quickly take a measure such as changing the password.

The table stored in the history database 85 has, for example, the structure of Table 87. Table 87 stores, for example, the date and time when speaker recognition was performed, the name of the terminal used, location information indicating where the terminal is geographically located, voice data, and the recognition result. Of course, the number of consecutive failures may be stored in addition to these. Based on this history information, the spoofer 122 may be identified, or a defensive measure may be taken to prevent fraud. For example, if speaker recognition succeeded on the terminal A in Hokkaido on February 1, 2006, and speaker recognition was attempted on the terminal B in Fukuoka Prefecture one minute later, it is physically or technically impossible to have moved from Hokkaido to Fukuoka Prefecture in that time. The speaker recognition at the terminal B is therefore interrupted even if the number of consecutive failures has not reached the predetermined number, and this fact is reported to the one user. Since voice data is also recorded at this time, the likelihood of identifying the impersonator increases dramatically.

As described above, according to FIG. 5, voice data, date and time, position information, and the like are recorded in addition to the failure information. It is therefore possible to identify a fraudster more quickly and accurately and to understand the fraudster's behavioral patterns, and thus to suppress spoofing or its recurrence more reliably. In addition, impersonation in speaker recognition can be prevented by analyzing the above information from the viewpoint of physical feasibility.

[0086] (5) Fifth embodiment

Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the fifth embodiment will be described with reference to FIG. 6 in addition to the preceding figures. FIG. 6 is a flowchart showing the operation process of the speaker recognition system according to the fifth embodiment. Note that the configuration in the present embodiment is the same as that in the fourth embodiment, and the same components are denoted by the same reference numerals, and the description thereof will be omitted as appropriate.

In FIG. 6, when a voice is input by the user for speaker recognition, the recognition unit 14 calculates the similarity between the input voice and the voice registered in advance (step S1). Whether or not the user is the one user himself or herself is then determined based on whether or not the similarity exceeds a predetermined threshold (step S2). For example, the predetermined threshold is set to 0.8, and whether or not the user is the one user is determined based on whether or not the similarity, obtained within the range of 0 to 1, is 0.8 or more.

[0088] Here, when the similarity does not exceed the predetermined threshold (step S2: No), the detection unit 60 then determines whether or not the number of consecutive recognition failures exceeds a default value, and the recognition failure flag is set (step S32). For example, the default value is 5 times, and it is determined whether or not the number of consecutive recognition failures exceeds 5.

[0089] If the number of consecutive recognition failures does not yet exceed the default value of 5 (step S32: No), the user inputs the voice again and attempts speaker recognition once more. This is because even the one user himself or herself may fail several times due to noise or physical condition.

[0090] On the other hand, if the number of consecutive recognition failures exceeds the default value of 5 (step S32: Yes), the failures can no longer be excused as accidental, and the possibility that the speaker is a spoofer is relatively high. The display screen 52 therefore displays that the speaker recognition process as a whole has failed. In addition, the invalidation processing unit 66 temporarily invalidates the password, the history storage unit 80 stores the history information including the failure information related to the speaker recognition in the history database 85, and the voice recording unit 145 stores the voice data input during the speaker recognition in the history database 85 (step S42).

On the other hand, when the similarity exceeds the predetermined threshold (step S2: Yes), that is, when the user is recognized as the one user, the recognition is basically successful at this point. However, so that countermeasures can be prompted if an impersonator has appeared, whether or not the previous speaker recognition failed is checked based on whether or not the recognition failure flag is set (step S31).

[0092] Here, when the previous speaker recognition has failed (step S31: Yes), the reporting unit 70 reports the recognition failure history (that is, the fact that an impersonator may have attempted recognition) at this time (step S41). If the user who receives the report has no recollection of those failures, the security of the system can be ensured by taking a measure such as changing the password. On the other hand, if the user does recall them, the password need not be changed.

[0093] On the other hand, if the previous speaker recognition has not failed (step S31: No), there is no evidence that an impersonator has attempted recognition. Use is therefore permitted, and a message to that effect is displayed on the display screen 52 (step S43).

As described above, in the present embodiment, since the processing shown in FIG. 6 is performed, speaker recognition is suitably performed. In particular, measures are taken when recognition fails continuously, so that impersonation or misrepresentation in speaker recognition can be suitably avoided or prevented.
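The branching of FIG. 6 can be summarised in a short Python sketch. It is one possible reading of the flowchart, not the applicants' implementation: the `Session` class and `process_attempt` function are hypothetical, the recognition failure flag is assumed to be set when the failure count exceeds the default value, and the flag is assumed to be cleared once the failure history has been reported.

```python
from dataclasses import dataclass


@dataclass
class Session:
    threshold: float = 0.8          # example threshold from the text
    failure_limit: int = 5          # example default value from the text
    consecutive_failures: int = 0
    failure_flag: bool = False      # recognition failure flag


def process_attempt(session: Session, similarity: float) -> str:
    """Handle one attempt whose similarity was computed in step S1.

    Returns the step reached: "retry", "S42", "S41", or "S43".
    """
    if similarity >= session.threshold:              # step S2: Yes
        session.consecutive_failures = 0
        if session.failure_flag:                     # step S31: Yes
            session.failure_flag = False             # assumption: clear after reporting
            return "S41"                             # report the failure history
        return "S43"                                 # permit use
    session.consecutive_failures += 1                # step S2: No
    if session.consecutive_failures > session.failure_limit:   # step S32: Yes
        session.failure_flag = True
        return "S42"                                 # fail, invalidate, store history
    return "retry"                                   # step S32: No
```

Under these assumptions, five failed attempts lead to retries, the sixth reaches step S42, and the next successful attempt by the genuine user reaches step S41.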

[0095] (6) Sixth embodiment

Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the sixth embodiment will be described with reference to FIG. 7 in addition to the preceding figures. FIG. 7 is a flowchart showing the operation process of the speaker recognition system according to the sixth embodiment. The configuration in the present embodiment is the same as that in the fourth embodiment, and the same reference numerals are given to the same configurations, and the description thereof is omitted as appropriate. The same steps as those in the fifth embodiment are denoted by the same reference numerals, and description thereof will be omitted as appropriate.

[0096] Compared to FIG. 6, FIG. 7 adds a process that raises the threshold for similarity determination by the recognition unit 14 each time recognition fails, so that anyone other than the one user becomes harder to recognize (step S52).

[0097] Therefore, this embodiment is very advantageous in that it can avoid a situation in which an impersonator, as recognition failures continue, learns to approximate the one user's voice and eventually succeeds in recognition.
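A minimal Python sketch of the threshold increase in step S52 follows. The specification does not state by how much the threshold rises or whether it is capped, so the linear `step` of 0.02 and the `ceiling` of 0.95 are purely illustrative assumptions, as is the function name.

```python
def threshold_after(
    failures: int,
    base: float = 0.8,
    step: float = 0.02,
    ceiling: float = 0.95,
) -> float:
    """Similarity threshold to apply after a given number of consecutive failures."""
    return min(ceiling, base + step * failures)


def is_accepted(similarity: float, failures: int) -> bool:
    """An attempt succeeds only if it meets the raised threshold."""
    return similarity >= threshold_after(failures)
```

With these values, a score of 0.85 would be accepted on a first attempt but rejected after three consecutive failures, which is the effect the embodiment aims for.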

[0098] (7) Seventh embodiment

Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the seventh embodiment will be described with reference to FIG. 8 in addition to FIG. 5 and the preceding flowcharts. FIG. 8 is a flowchart showing the operation process of the speaker recognition system according to the seventh embodiment. The configuration in the present embodiment is the same as that in the fourth embodiment, and the same reference numerals are given to the same configurations, and the description thereof is omitted as appropriate. The same steps as those in the fifth embodiment are denoted by the same reference numerals, and description thereof will be omitted as appropriate.

In FIG. 8, in particular, compared to FIG. 6, a process is added that checks whether the terminal currently used differs from the terminal that the one user normally uses. Specifically, it is determined whether or not the terminal currently used is the one normally used (step S220). The terminal normally used may be set in advance by the one user himself or herself, for example. If it is determined that the terminal is the one normally used (step S220: Yes), α is substituted for the default value of the number of consecutive failures (step S221). On the other hand, if it is determined that the terminal is not the one normally used (step S220: No), β is substituted for the default value of the number of consecutive failures (step S222). Here, α > β. This is because, if the terminal used this time is not the one normally used, it is estimated that someone other than the one user is likely to be attempting the recognition process, and it is therefore preferable to reduce the default value of the number of consecutive failures.

[0100] As described above, in this embodiment, the likelihood that the speaker is the one user himself or herself is narrowed down from the terminal used, and speaker recognition is performed more appropriately.
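Steps S220 to S222 amount to choosing one of two limits depending on the terminal. The Python sketch below illustrates this; the function name, the set of usual terminals, and the concrete values α = 5 and β = 2 are assumptions, with only the relation α > β taken from the text.

```python
from typing import Set


def failure_limit(
    terminal: str,
    usual_terminals: Set[str],
    alpha: int = 5,
    beta: int = 2,
) -> int:
    """Return the default value for the number of consecutive failures."""
    if alpha <= beta:
        raise ValueError("alpha must be greater than beta")
    if terminal in usual_terminals:   # step S220: Yes -> step S221
        return alpha
    return beta                       # step S220: No  -> step S222
```

A user who normally uses terminal A would thus be allowed more retries there than at an unfamiliar terminal B.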

[0101] (8) Eighth Example

Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the eighth embodiment will be described with reference to FIG. 9 in addition to the preceding figures. FIG. 9 is a flowchart showing the operation processing of the speaker recognition system according to the eighth embodiment. The configuration in the present embodiment is the same as that in the fourth embodiment, and the same reference numerals are given to the same configurations, and the description thereof is omitted as appropriate. The same steps as those in the fifth embodiment are denoted by the same reference numerals, and description thereof will be omitted as appropriate.

[0102] Compared to FIG. 6, FIG. 9 adds a process that assists speaker recognition based on the feasibility of movement between the terminal used previously and the terminal used this time. Specifically, first, the inter-terminal distance D, which is the distance between the terminals used in the previous recognition and the current recognition, is acquired from the preset geographical information of each terminal (step S225). Next, the use time difference T, which is the time difference between the previous recognition and the current recognition, is acquired (step S226). Then, based on the distance D and the time difference T, the moving speed V between the terminal used in the previous recognition and the terminal used in the current recognition is calculated as V = D/T (step S227). It is then determined whether or not the moving speed V exceeds a predetermined speed threshold (step S321). Here, the “predetermined speed threshold” is a value set in advance as a speed at which movement is difficult or impossible, for example, 1000 km/h. If the moving speed V exceeds the predetermined speed threshold (step S321: Yes), it is unlikely that the one user moved at such a speed, that is, there is a strong suspicion of unauthorized use by an impersonator, and recognition failure processing is performed (step S42). On the other hand, when the moving speed V does not exceed the predetermined speed threshold (step S321: No), unauthorized use cannot be inferred from the moving speed V, so speaker recognition is continued (step S1).
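The speed check V = D/T in steps S225 to S321 can be sketched as follows. The specification only says that D is obtained from preset geographical information; computing it as a great-circle (haversine) distance between latitude/longitude pairs is an assumption, as are the function names and the approximate coordinates used in the usage note.

```python
import math
from datetime import datetime
from typing import Tuple

Position = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Position, b: Position) -> float:
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    lat1, lat2 = math.radians(a[0]), math.radians(b[0])
    dlat = lat2 - lat1
    dlon = math.radians(b[1] - a[1])
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def is_implausible_move(
    prev_pos: Position,
    prev_time: datetime,
    cur_pos: Position,
    cur_time: datetime,
    speed_limit_kmh: float = 1000.0,
) -> bool:
    """True when the implied moving speed exceeds the speed threshold."""
    distance = haversine_km(prev_pos, cur_pos)                   # step S225
    hours = (cur_time - prev_time).total_seconds() / 3600.0      # step S226
    if hours <= 0:
        return distance > 0    # no elapsed time: any movement is implausible
    return distance / hours > speed_limit_kmh                    # steps S227, S321
```

Taking roughly (43.06, 141.35) for a terminal in Hokkaido and (33.59, 130.40) for one in Fukuoka, an attempt one minute after the previous one would be flagged, whereas an attempt a day later would not.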

[0103] As described above, in the present embodiment, suspected unauthorized use is estimated based on the feasibility of movement between the previous and current terminals, so speaker recognition is performed more appropriately.

(9) Ninth embodiment

Next, the basic operation of the speaker model registration apparatus in the speaker recognition system according to the ninth embodiment will be described with reference to FIG. 10 in addition to FIG. 5 and the preceding flowcharts. FIG. 10 is a flowchart showing the operation process of the speaker recognition system according to the ninth embodiment. Note that the configuration in the present embodiment is the same as that in the fourth embodiment, and the same components are denoted by the same reference numerals, and the description thereof will be omitted as appropriate. The same steps as those in the fifth embodiment are denoted by the same reference numerals, and the description thereof is omitted as appropriate.

[0105] In FIG. 10, in particular, the timing for reporting failure information when the number of consecutive recognition failures exceeds the default value (step S32: Yes) differs from that in FIG. 6. Specifically, in FIG. 6 the failure information is reported when a later recognition succeeds after the recognition to which the failure information relates (step S41), so a relatively long time lag arises between the failure and the report. In FIG. 10, on the other hand, the failure information is reported at the point when the failure is recognized (step S422), so the time lag between the failure and the report can be made relatively short. The one user or the administrator of the speaker recognition system can therefore take measures quickly. For example, in a situation where the password has been temporarily invalidated, it is possible to avoid a case in which the one user himself or herself fails in recognition without knowing of the invalidation.

As described above, in the present embodiment, since the process shown in FIG. 10 is performed, speaker recognition is suitably performed. In particular, when recognition fails continuously, the one user himself or herself is notified at an appropriate timing, so the possibility of recurrence of spoofing or misrepresentation in speaker recognition, or the success rate at the time of recurrence, can be quickly suppressed.

It should be noted that the present invention is not limited to the above-described embodiments, and can be modified as appropriate without departing from the gist or concept of the invention that can be read from the claims and the specification as a whole. A speaker recognition system and a computer program involving such modifications are also included in the technical scope of the present invention.

Industrial applicability

[0108] The speaker recognition system and the computer program according to the present invention can be used in a speaker recognition system that is provided in various computer devices such as a car navigation device, a net banking device, an auto-lock device, and a computer recognition device, or in various electronic and electric devices, and that performs speaker recognition based on the utterance of the speaker.

Claims


[1] recognition means for speaker recognition;

Detecting means for detecting whether or not the speaker recognition related to one user in the recognition means has failed continuously for a predetermined number of times;

Reporting means for reporting, to the one user, failure information indicating that the speaker recognition has failed when the detection means detects that the speaker recognition has failed continuously for the predetermined number of times;

A speaker recognition system comprising:

[2] recognition means for speaker recognition;

Detecting means for detecting whether or not the speaker recognition related to one user in the recognition means has failed;

A history storage means for storing history information including failure information indicating that the speaker recognition has failed when the detection means detects that the speaker recognition has failed; and

Reporting means for reporting the history information;

A speaker recognition system comprising:

[3] The speaker recognition system according to claim 2, wherein the recognition means performs the speaker recognition via voice input means, the system further comprises voice recording means for recording the voice input to the voice input means when the speaker recognition is performed, and the history storage means further stores the recorded voice as the history information.

[4] The speaker recognition system according to claim 3, wherein the reporting means reports the history information and reproduces the recorded voice for the one user.

[5] The speaker recognition system according to claim 2, wherein the reporting means reports the history information to the one user when the number of consecutive failures detected by the detection means exceeds a predetermined number of times.

[6] The speaker recognition system according to claim 5, wherein the recognition means performs the speaker recognition via an access from a terminal connected to the recognition means via communication means, the history storage means further stores the history information including the terminal name of the terminal on which the speaker recognition is performed, and when the stored terminal name differs from the terminal name normally used by the one user, the predetermined number of times is made smaller than when the terminal names are the same.

[7] The speaker recognition system according to claim 2, wherein the recognition means performs the speaker recognition via an access from a terminal connected to the recognition means via communication means, the history storage means further stores, in the history information, the date and time at which the speaker recognition most recently succeeded and the position of the terminal used in that speaker recognition, and the detection means detects that the speaker recognition has failed when the ratio of the distance between the position of the terminal used in the current speaker recognition and the stored position of the terminal to the time difference between the date and time of the current speaker recognition and the stored date and time exceeds a predetermined speed threshold.

[8] The speaker recognition system according to claim 1, wherein the reporting means reports to the one user without delay through communication means.

[9] The speaker recognition system according to claim 2, wherein the reporting means reports to the one user without delay through communication means.

[10] The speaker recognition system according to claim 1, wherein the reporting means reports to the one user when the one user next performs the speaker recognition in the recognition means after the failed speaker recognition.

[11] The speaker recognition system according to claim 2, wherein the reporting means reports to the one user when the one user next performs the speaker recognition in the recognition means after the failed speaker recognition.

[12] The recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance,

The speaker recognition system according to claim 1, wherein the reporting means reports that the password should be changed when reporting.

[13] The recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance,

The speaker recognition system according to claim 2, wherein the reporting means reports that the password should be changed when reporting.

[14] The recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance,

The system further comprises change processing means for performing a process of changing the password when detection is made by the detection means.

The speaker recognition system according to claim 1, characterized in that:

[15] The recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance.

The speaker recognition system according to claim 2, wherein:

[16] The recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance,

The system further comprises invalidation processing means for invalidating the password for a predetermined period when detection is made by the detection means.

The speaker recognition system according to claim 1, characterized in that:
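By way of illustration only, invalidating the password for a predetermined period might be sketched as follows. The class name `PasswordLock`, its methods, and the 24-hour period are assumptions made for this sketch and do not appear in the specification.

```python
from datetime import datetime, timedelta


class PasswordLock:
    """Tracks whether a registered voice password is currently usable (hypothetical helper)."""

    def __init__(self, lock_period=timedelta(hours=24)):
        self.lock_period = lock_period  # assumed predetermined period
        self.locked_until = None        # None means the password is not locked

    def invalidate(self, now):
        """Called when the detection means reports a detection."""
        self.locked_until = now + self.lock_period

    def is_valid(self, now):
        """The password is usable again once the lock period has elapsed."""
        return self.locked_until is None or now >= self.locked_until
```

Passing the current time in explicitly, rather than reading the clock inside the class, keeps the sketch deterministic and easy to check.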

[17] The recognition means performs the speaker recognition based on a voice corresponding to a password registered in advance,

The speaker recognition system according to claim 2, wherein:

[18] recognition means for performing speaker recognition via voice input means;

Detection means for detecting whether or not the speaker recognition for one user by the recognition means has failed;

A history storage means for storing history information including the voice recorded corresponding to the failure of the speaker recognition when the detection means detects that the speaker recognition has failed;

A speaker recognition system comprising:

[19] The history storage means further stores, in the history information, at least one of the date and time, the location information, and the terminal name relating to the speaker recognition.

The speaker recognition system according to claim 2, wherein:

[20] The history storage means further stores, in the history information, at least one of the date and time, the location information, and the terminal name relating to the speaker recognition.

The speaker recognition system according to claim 18, wherein:
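By way of illustration only, the history information stored on a failed recognition might be sketched as follows. The names `FailureRecord` and `HistoryStore`, the field names, and the representation of the recorded voice as raw bytes are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class FailureRecord:
    """One entry of history information for a failed recognition (hypothetical layout)."""

    voice: bytes                           # voice recorded for the failed attempt
    timestamp: Optional[datetime] = None   # date and time of the attempt, if stored
    location: Optional[str] = None         # location information, if stored
    terminal_name: Optional[str] = None    # terminal name, if stored


@dataclass
class HistoryStore:
    """Keeps failure records in memory; a real system would persist them."""

    records: List[FailureRecord] = field(default_factory=list)

    def on_recognition_result(self, succeeded, voice, **meta):
        """Store the recorded voice and any optional metadata only on failure."""
        if not succeeded:
            self.records.append(FailureRecord(voice=voice, **meta))
```

The optional fields reflect the "at least one of" wording: a record may carry any subset of the date and time, the location information, and the terminal name alongside the recorded voice.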

[21] When detected by the detection means, the recognition means changes the parameters so that the speaker recognition is more likely to fail.

The speaker recognition system according to claim 1, characterized in that:

[22] When detected by the detection means, the recognition means changes the parameters so that the speaker recognition is more likely to fail.

The speaker recognition system according to claim 2, wherein:

[23] When detected by the detection means, the recognition means changes the parameters so that the speaker recognition is more likely to fail.

The speaker recognition system according to claim 18, wherein:

[24] The parameter to be changed is a threshold of the similarity between the pre-registered voice of the one user and the voice input for the speaker recognition, the threshold serving as a reference for determining whether or not the speaker recognition has failed.

The speaker recognition system according to claim 21, characterized in that:

[25] The parameter to be changed is a threshold of the similarity between the pre-registered voice of the one user and the voice input for the speaker recognition, the threshold serving as a reference for determining whether or not the speaker recognition has failed.

The speaker recognition system according to claim 22, characterized in that:

[26] The parameter to be changed is a threshold of the similarity between the pre-registered voice of the one user and the voice input for the speaker recognition, the threshold serving as a reference for determining whether or not the speaker recognition has failed.

The speaker recognition system according to claim 23, characterized in that:
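By way of illustration only, changing the similarity threshold so that the speaker recognition is more likely to fail might be sketched as follows. The class name `Verifier`, the similarity scale from 0 to 1, and the values 0.70 and 0.85 are assumptions made for this sketch; the specification does not fix how similarity is scored.

```python
from dataclasses import dataclass


@dataclass
class Verifier:
    """Accepts a speaker when the similarity score reaches the threshold (hypothetical helper)."""

    threshold: float = 0.70            # assumed normal similarity threshold
    tightened_threshold: float = 0.85  # assumed stricter threshold after detection

    def accepts(self, similarity: float) -> bool:
        """similarity: score between the registered voice and the input voice."""
        return similarity >= self.threshold

    def on_failure_detected(self) -> None:
        """Raise the threshold so that recognition is more likely to fail."""
        self.threshold = max(self.threshold, self.tightened_threshold)
```

Under these assumed values, an input scoring 0.80 is accepted before detection and rejected afterwards, while an input scoring 0.90 is still accepted.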

[27] The recognizing unit performs the speaker recognition via an access from a terminal connected to the recognizing unit via a communication unit,

In addition to or instead of the case where the detection unit detects that the speaker recognition has failed continuously the predetermined number of times, the notification unit notifies the one user when at least one of the temporal position and the spatial position of the terminal related to the failed speaker recognition does not satisfy a predetermined condition.

The speaker recognition system according to claim 1, characterized in that:

[28] A computer program for causing a computer to function as the speaker recognition system according to claim 1.

[29] A computer program for causing a computer to function as the speaker recognition system according to claim 2.

[30] A computer program for causing a computer to function as the speaker recognition system according to claim 18.