WO2006027844A1 - Speaker Verification Device (話者照合装置) - Google Patents
- Publication number
- WO2006027844A1 (PCT/JP2004/013197)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- unit
- requester
- authentication
- standard pattern
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
Definitions
- the present invention relates to a speaker verification device that determines whether a user is a legitimate user by determining whether the user's voice is the voice of the person the user claims to be.
- a user utters and registers voices corresponding to a plurality of words in advance, and indexes the plurality of registered words.
- the user designates an index and a personal identifier, and utters the word corresponding to this index. Whether or not the user is a valid user is determined by collating the uttered voice with the corresponding voice registered in advance.
- Patent Document 1 JP 2000-181490 A
- Patent Document 2 JP 2002-269047
- Patent Document 3 JP 2000-99090
- Patent Document 4 JP 2000-338987 A
- Patent Document 5 Japanese Patent Laid-Open No. 11-82492
- Patent Document 6 Japanese Patent Laid-Open No. 10-214096
- Patent Document 7 JP 2001-331196 A
- Non-Patent Document 1 “Speech Information Processing”, Sadaoki Furui, Morikita Publishing Co., Ltd.
- Non-Patent Document 2 Satoshi Nakamura, Eri Yamamoto, Ryo Nagai, Kiyohiro Shikano, “Speech Recognition and Lip Image Generation by Integration of Speech and Lip Images Using HMM”, Spoken Language Information Processing, pp. 15-17
Disclosure of Invention
- the present invention has been made to solve the above-described problems, and aims to prevent impersonation even when another person has recorded the voice that a legitimate user previously uttered.
- the speaker verification device switches the previously presented speech unit to another speech unit when presenting to the authentication requester the speech unit to be uttered by the authentication requester.
- the speaker verification device compares the voice standard pattern, prepared in association with the presented voice unit and the personal identifier, with the voice uttered by the authentication requester, and determines the similarity of the voices.
- the speaker verification device is configured to switch the previously presented voice unit to another voice unit when presenting to the authentication requester the voice unit to be uttered by the authentication requester. Therefore, even if the content that the authentication requester uttered last time was recorded by another person, the recorded content cannot be used this time, so that impersonation by another person can be prevented.
- FIG. 1 is a configuration diagram of a speaker verification apparatus according to Embodiment 1 of the present invention.
- FIGS. 2, 5 and 6 are diagrams showing the contents presented by the voice unit presenting means according to Embodiment 1 of the present invention.
- FIG. 3 is a conceptual diagram showing how the voice standard pattern 118 is stored in the registration database according to the first embodiment of the present invention.
- FIG. 4 is a flowchart showing the processing contents of the speaker verification device according to Embodiment 1 of the present invention.
- reference numeral 100 denotes user registration means.
- at the time of registration, the registration requester 101 registers in advance the registration requester voice 102, the personal identifier 103, and the password character string 104, so that the voice standard pattern 118 corresponding to the personal identifier 103 (see FIG. 3) and the password character string 104 are stored in the registration database 106.
- Reference numeral 101 denotes a registration requester.
- the registration requester 101 is assumed to be, for example, a resident of this building or an applicant who is permitted to enter the building.
- Reference numeral 102 denotes the registration requester voice.
- the registration requester voice 102 generates a voice standard pattern 118 described later.
- the voice is obtained by having the registration requester 101 read out a text specified by the speaker verification device. Note that the specified text contains many syllable types, so that the voice standard pattern 118 can be generated with good quality.
- speech standard pattern 118 expresses the characteristics of registration requester speech 102 efficiently.
- the voice standard pattern is generated, for example, by A/D converting the speech waveform of the registration requester voice 102 into a digital signal and performing feature value analysis on this signal.
- in the following, the voice standard pattern is described as a subword HMM (Hidden Markov Model). Details of the HMM are described in Non-Patent Document 1.
- the voice feature amount is an efficient representation of the voice signal; for example, a cepstrum is used.
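As a rough sketch of one way such a cepstral feature could be computed (an illustration only, not the exact procedure fixed by this patent), the real cepstrum of a windowed frame is the inverse FFT of its log magnitude spectrum:

```python
import numpy as np

def cepstrum(frame, n_coeffs=13):
    """Real cepstrum of one speech frame:
    inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small offset avoids log(0)
    ceps = np.fft.irfft(log_mag)
    return ceps[:n_coeffs]  # low-order coefficients describe the spectral envelope

# 256-sample frame of a synthetic 100 Hz tone at 8 kHz sampling rate
frame = np.sin(2 * np.pi * 100 * np.arange(256) / 8000)
feats = cepstrum(frame)
print(feats.shape)  # (13,)
```

In practice the low-order coefficients (here 13) are kept, since they compactly express the vocal-tract envelope while discarding fine harmonic detail.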
- Reference numeral 103 denotes a personal identifier.
- the personal identifier 103 is a code assigned to each registration requester 101 in order to distinguish the plurality of registration requesters 101. For example, a combination of alphanumeric characters is used. If the personal identifier is composed of alphabetic characters, the first registered “Taro Suzuki” is assigned “AAA”, the second registered “Jiro Suzuki” is assigned “AAB”, and the third registered “Saburo Suzuki” is assigned “AAC”.
- the password character string 104 is a code that is kept secret in order to prove that the user is a legitimate user, and is registered in advance. For example, a combination of alphanumeric characters is used. If the password character string 104 is specified as a 4-digit number string, “9768”, “4361”, etc. can be set.
- Reference numeral 105 denotes registration means. By inputting the registration requester voice 102, the personal identifier 103, and the password character string 104 into the registration means 105, the voice standard pattern 118 and the password character string 104 corresponding to the personal identifier 103 are registered in the registration database 106 described later.
- the registration unit 105 includes a microphone, a keyboard, and the like. Each registration requester 101 registers a registration requester voice 102 using a microphone, and registers a password character string 104 using the keyboard.
- Reference numeral 106 denotes a registration database.
- the registration database 106 stores the voice standard pattern 118 generated by the registration means 105, the personal identifier 103, and the password character string 104.
- the voice standard pattern 118 is stored so as to correspond to the personal identifier 103 and the standard pattern voice unit.
- when the standard pattern voice unit is composed of syllables, the pattern is stored so as to correspond to the personal identifier 103 and each syllable 117.
- for example, the voice standard pattern 118 that is the syllable-unit voice standard pattern “a (/a/)” of the speaker registered with the personal identifier “AAB” is stored.
- by storing in this way, the voice standard pattern 118 corresponding to the authentication personal identifier 109 and the voice unit 130 (see FIG. 2) presented by the voice unit presenting means 111 can be selected.
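The storage scheme of FIG. 3 can be pictured as a table keyed first by personal identifier and then by syllable, so the patterns needed for any presented voice unit can be selected. A sketch with placeholder values (the strings stand in for subword HMMs; the entries are illustrative):

```python
# Registration database sketch: one voice standard pattern
# per (personal identifier, syllable) pair, as in FIG. 3.
registration_db = {
    "AAB": {
        "a":  "hmm_for_AAB_a",   # placeholder for a subword HMM
        "sa": "hmm_for_AAB_sa",
        "Q":  "hmm_for_AAB_Q",   # geminate /Q/ as in "saQporo"
        "po": "hmm_for_AAB_po",
        "ro": "hmm_for_AAB_ro",
    },
}

def select_patterns(identifier, syllables):
    """Pick the standard patterns for each syllable of the presented voice unit."""
    person = registration_db[identifier]
    return [person[s] for s in syllables]

print(select_patterns("AAB", ["sa", "Q", "po", "ro"]))
```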
- the user recognition means 150 determines whether or not the authentication requester 107 is a valid user at the time of authentication; it calculates the similarity between the authentication requester voice 108 and the selected voice standard pattern 118.
- Reference numeral 107 denotes an authentication requester.
- the authentication requester 107 corresponds to the person who intends to enter the building when the speaker verification device according to this embodiment is used for authentication when entering the building, for example, the registration requester 101 or a person who tries to enter the building by impersonating the registration requester.
- Reference numeral 108 denotes an authentication requester voice.
- the authentication requester voice 108 is a voice uttered by the authentication requester 107 in response to a voice unit 130 (see FIG. 2) presented by the voice unit presentation unit 111 described later.
- 109 is an authentication personal identifier.
- the authentication personal identifier 109 is a code for identifying the speaker declared by the authentication requester 107 at the time of authentication, and must match the registered personal identifier 103.
- Reference numeral 110 denotes a voice unit database that stores the voice units 130 to be presented to the authentication requester 107.
- the voice unit presenting means 111 presents a password character string constituent character group 121 composed of the password character string constituent characters 120, the voice units 130, and the correspondence between the two.
- “password character string constituent character 120” is a character constituting the password character string 104.
- in FIG. 2, the password character string constituent characters 120 are the digits “0”, “1”, “2”, and so on.
- the “voice unit 130” is a character string that the authentication requester 107 should utter.
- FIG. 2 shows a plurality of combinations in which the voice units 130 composed of words are associated with the password character string constituent character groups 121. “0”, “2”, and “6” correspond to the voice unit 130 “Hachinohe (/hatinohe/)”, and “1” and “9” correspond to “Kesennuma (/keseNnuma/)”.
- the authentication requester 107 utters the voice units 130 corresponding to the number string of the password character string 104. If the password character string 104 of the authentication requester 107 is “5218”, the voice unit 130 “Sapporo (/saQporo/)” corresponding to the first number 5 is uttered, and then “Hachinohe (/hatinohe/)” corresponding to the second number 2 is uttered. Furthermore, “Kesennuma (/keseNnuma/)” corresponding to the third number 1 and “Sapporo (/saQporo/)” corresponding to the fourth number 8 are uttered in succession.
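A sketch of this lookup, assuming FIG. 2-style digit-to-word assignments (the exact table, and the word “yufuiN” covering the remaining digits 3, 4 and 7, are assumptions here):

```python
# Assumed presentation table; "yufuiN" and the groupings are illustrative.
table = {
    "saQporo":   ["5", "8"],
    "hatinohe":  ["0", "2", "6"],
    "keseNnuma": ["1", "9"],
    "yufuiN":    ["3", "4", "7"],  # hypothetical word for the remaining digits
}
# Invert to a digit -> word map for fast lookup.
word_for_digit = {d: w for w, ds in table.items() for d in ds}

def utterance_for_password(password):
    """Words the requester must say, one per password digit."""
    return [word_for_digit[d] for d in password]

print(utterance_for_password("5218"))
# ['saQporo', 'hatinohe', 'keseNnuma', 'saQporo']
```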
- the voice unit presenting means 111 switches the voice unit 130 presented last time to another voice unit 130 when presenting to the authentication requester 107. The voice units 130 may be switched every time they are used, every second use, every third use, at regular time intervals, or at random; the switching may also be performed each time the same authentication requester 107 uses the device.
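One way such switching could be realized (an illustrative sketch; the patent leaves the switching schedule open) is to deal the ten digits over the available voice units afresh at each presentation, so that a recording of the previous session no longer matches the current table:

```python
import random

# Illustrative pool of voice units; the actual words in the database may differ.
VOICE_UNITS = ["saQporo", "hatinohe", "keseNnuma", "yufuiN"]

def new_presentation_table(rng):
    """Draw a fresh digit-to-voice-unit assignment for one presentation."""
    digits = list("0123456789")
    rng.shuffle(digits)
    table = {w: [] for w in VOICE_UNITS}
    for i, d in enumerate(digits):            # deal digits round-robin
        table[VOICE_UNITS[i % len(VOICE_UNITS)]].append(d)
    return table

rng = random.Random(0)  # seeded only for reproducibility of the sketch
t1 = new_presentation_table(rng)
print(t1)
```

Every digit lands on exactly one voice unit, so any password remains speakable under every freshly drawn table.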
- Reference numeral 112 denotes a voice similarity calculation unit.
- the voice similarity calculation means 112 selects the voice standard pattern 118 from the registration database 106 based on the input of the authentication personal identifier 109 and the presentation of the voice unit presenting means 111. For example, suppose the authentication requester 107 inputs “AAB” as the authentication personal identifier 109 and, following the presentation of the voice unit presenting means 111, utters “Sapporo, Hachinohe, Kesennuma, Sapporo” in succession according to the password character string 104.
- the voice similarity calculation means 112 reads from the registration database 106 the voice standard patterns 118 of “sa (/sa/)”, “tsu (/Q/)”, “po (/po/)”, “ro (/ro/)”, and so on, corresponding to the authentication personal identifier 109 “AAB”, and selects them in sequence.
- the selected voice standard patterns 118 are compared with the respective syllables of the authentication requester voice 108 uttered by the authentication requester 107, the voice similarity is calculated, and the similarity is output.
- the voice similarity may be calculated by comparing the acoustic features of each syllable, or the judgment may be made on the whole sentence.
- the similarity is calculated, for example, by the method described in Chapter 5 of “Speech Information Processing”, Sadaoki Furui, Morikita Publishing Co., Ltd., June 1998 (hereinafter referred to as Reference 1).
- 113 is a threshold value.
- the threshold value 113 is a predetermined reference value that serves as the basis for deciding whether the authentication requester voice 108 is the voice of a legitimate user or should be rejected. If the similarity calculated by the voice similarity calculation means 112 is greater than the threshold 113, the authentication requester 107 is determined to be a valid user.
- Reference numeral 114 denotes determination means for determining whether or not the authentication requester 107 is a valid user. Based on the result of the voice similarity calculation means 112, the determination means 114 judges whether or not the authentication requester voice 108 is the voice of a legitimate user. If the similarity is greater than or equal to the threshold value 113, it is determined that the user is a legitimate user.
- the authentication result 115 is an authentication result.
- the authentication result 115 is the output of the determination means 114: “accept” if the authentication requester 107 is determined to be a legitimate user, and “reject” if determined to be a user attempting impersonation. For example, when the speaker verification device according to the present embodiment is used for authentication when entering a building, the door is unlocked when the result is “accept” and remains locked when it is “reject”.
- FIG. 4 is a flowchart showing the processing contents of the speaker verification apparatus according to Embodiment 1 of the present invention. The operation is described below with reference to Fig. 4.
- Step 11 of FIG. 4 is a step of registering information of the registration requester 101. That is, the registration requester 101 inputs his information, that is, the registration requester voice 102, the personal identifier 103, and the personal identification character string 104 to the registration means 105.
- the registration unit 105 generates a voice standard pattern 118 based on the registration requester voice 102 and stores the voice standard pattern and the password character string in the registration database 106.
- Step 12 is a step of presenting to the authentication requester 107 the correspondence between the voice units 130 to be uttered by the authentication requester 107 and the password character string constituent characters. That is, the authentication requester 107 inputs the authentication personal identifier 109 to the speaker verification device.
- the voice unit presenting means 111 presents the voice unit 130 to be uttered by the authentication requester 107.
- the voice unit presenting means 111 presents the correspondence between the password character string constituting character group 121 and the voice unit 130 as shown in FIG.
- the speech unit presenting means 111 switches the speech unit 130 presented last time to another speech unit 130 and presents it.
- the voice units 130 may be switched every time they are used, every second use, every third use, at regular time intervals, or at random, and the switching may be performed each time the same authentication requester 107 uses the device. When the administrator of the verification device wants to register a new voice unit 130, it is sufficient to update the voice unit database 110: by adding one voice unit 130 to the voice unit database 110, the new voice unit 130 can be presented to every registration requester 101. In addition, once the registration requester 101 has registered the registration requester voice 102, the voice standard pattern 118 is generated automatically; therefore, even if a new voice unit 130 is added, the registration requester voice 102 does not have to be recorded again.
- Step 13 is a step of comparing the authentication requester voice 108 uttered by the authentication requester 107 and the voice standard pattern 118 corresponding thereto at the time of authentication.
- in step 14, at the time of authentication, the determination means 114 compares the voice similarity output from the voice similarity calculation means 112 with the predetermined threshold 113. If the voice similarity is greater than or equal to the threshold 113, “accept” is output as the authentication result 115, treating the requester as a legitimate user; if the voice similarity is smaller than the threshold 113, “reject” is output, treating the requester as a user attempting impersonation.
- the registration means 105, the voice unit presenting means 111, the voice similarity calculation means 112, and the determination means 114 may be configured as one piece of software; alternatively, a speaker verification program describing the processing contents of each means may be created and executed by a computer.
- although the case of accepting or rejecting with one threshold 113 has been described, a plurality of threshold values may be set. For example, two thresholds A and B are prepared; if the voice similarity is larger than threshold A, the requester is determined to be a valid user; if the voice similarity is between A and B, no clear determination is made; and if the voice similarity is smaller than B, the requester is determined to be an impersonator. With this setting, for example, when the speaker verification device is used for authentication when entering a building, the door is unlocked if the requester is determined to be a valid user; if no determination can be made, the voice unit presenting means 111 presents voice units to the authentication requester 107 again; and if the requester is determined to be an impersonator, the door is not unlocked.
- as shown in FIG. 5, a plurality of combinations may be presented in which single-hiragana voice units 130 are associated one-to-one with the password character string constituent characters 120. For example, if the password character string 104 of the authentication requester 107 is “5218”, the authentication requester 107 utters “se”, “yu”, “ke”, “no”.
- as shown in FIG. 6, a plurality of combinations may be presented in which a single-hiragana voice unit 130 is associated with a password character string constituent character group 121.
- “0”, “2”, and “6” correspond to the voice unit 130 “ha”; “1” and “9” correspond to “ke”; “3”, “4”, and “7” correspond to “yu”; and “5” and “8” correspond to “sa”.
- the authentication requester 107 utters the voice unit 130 corresponding to the number string of the personal identification character string 104.
- the voice unit 130 “sa” corresponding to the first number 5 is uttered, and then “ha” corresponding to the second number 2 is uttered. Furthermore, “ke” corresponding to the third number 1 and “sa” corresponding to the fourth number 8 are uttered in succession. As described above, since a plurality of numbers are assigned to each voice unit 130, even if the content of the utterance becomes known to another person, the number string of the password character string 104 is not uniquely revealed.
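The ambiguity can be quantified: with the FIG. 6 assignment, an overheard utterance remains consistent with many candidate number strings. A sketch:

```python
from itertools import product

# FIG. 6 assignment: each voice unit covers several digits.
digits_for_unit = {"ha": ["0", "2", "6"], "ke": ["1", "9"],
                   "yu": ["3", "4", "7"], "sa": ["5", "8"]}

def password_candidates(utterance):
    """All digit strings consistent with an overheard utterance."""
    return ["".join(p) for p in product(*(digits_for_unit[u] for u in utterance))]

cands = password_candidates(["sa", "ha", "ke", "sa"])
print(len(cands))        # 2 * 3 * 2 * 2 = 24
print("5218" in cands)   # True: the true password hides among 24 candidates
```

So an eavesdropper who hears the full utterance still faces 24 equally plausible passwords for this table.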
- FIG. 7 is a configuration diagram of a speaker verification apparatus according to Embodiment 2 of the present invention.
- FIG. 8 is a flowchart showing the processing contents of the speaker verification device according to Embodiment 2 of the present invention.
- the configuration of the speaker verification device according to the present embodiment will be described with reference to FIG. Note that description of portions common to Embodiment 1 is omitted.
- reference numeral 201 denotes a registration requester attribute.
- the registration requester attribute 201 is information on the attributes of the registration requester 101, such as gender, age, Chinese zodiac sign, blood type, and birthplace. For example, it contains information such as “male, 22 years old, Year of the Rat, blood type A, from Tokyo”.
- Reference numeral 202 denotes registration means.
- the registration means 202 includes a microphone, a keyboard, and the like.
- each registration requester 101 registers the registration requester voice 102 using a microphone so as to correspond to his or her personal identifier 103, and registers the registration requester attribute 201 using the keyboard.
- Reference numeral 203 denotes a registration database.
- the registration database 203 stores the voice standard pattern 118 and the registration requester attribute 201 generated by the registration unit 202.
- the voice unit presenting means 204 presents a question regarding the attributes to the authentication requester 107, for example, “What is your Chinese zodiac sign?”, “Where are you from?”, or “What is your blood type?”. The question is presented by switching the voice unit 130 that the authentication requester 107 should answer from the previously presented voice unit 130 to another one. For example, if “What is your Chinese zodiac sign?” was presented to an authentication requester 107 last time, “Where are you from?” may be presented the next time the same requester 107 uses this speaker verification device.
- the response time calculation means 205 measures the time taken until the authentication requester 107 utters the authentication requester voice 108. For example, the time from when a question is presented by the voice unit presenting means 204 until the authentication requester 107 starts to utter a response is measured.
- the determination score calculation unit 206 is a score calculation means for determination.
- the determination score calculation unit 206 calculates a determination score S from the response time output from the response time calculation unit 205 and the speech similarity output from the speech similarity calculation unit 112.
- the determination score S is obtained, for example, by Equation 1, where the voice similarity is L (a larger value means higher similarity) and the response time is Tr.
- Equation 1: S = L - a × Tr, where a is a weighting factor. According to Equation 1, the longer the response time Tr, the lower the determination score.
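Assuming Equation 1 takes the simple form S = L - a × Tr (an assumption consistent with the description that a longer response time lowers the score; the patent's exact formula is not reproduced here), a sketch:

```python
def judgment_score(similarity, response_time, a=0.1):
    """Assumed Equation 1: S = L - a * Tr (a: weighting factor).
    A longer response time Tr lowers the score."""
    return similarity - a * response_time

quick = judgment_score(0.9, 1.0)  # legitimate user answers promptly
slow = judgment_score(0.9, 5.0)   # same voice similarity, but a slow answer
print(quick > slow)  # True: hesitation is penalized
```

The weight a trades off how strongly hesitation is penalized against raw voice similarity.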
- Step 21 is a step in which the registration requester 101 inputs his / her registration requester voice 102, personal identifier 103, and registration requester attribute 201 to the registration means 202.
- the registration requester voice 102 and the registration requester attribute 201 are input so as to correspond to the personal identifier 103.
- the registration means 202 generates a voice standard pattern 118 based on the registration requester voice 102 and stores it in the registration database 203.
- Step 22 is a step in which the speaker verification device asks the authentication requester 107 about an attribute, for example, “What is your Chinese zodiac sign?”. These questions are presented by the voice unit presenting means 204 while switching from the content presented previously. Therefore, even if another person records the voice uttered by the authentication requester 107 and tries to enter the building, the possibility that the same voice unit 130 is presented again is low, so the possibility of impersonation is reduced.
- Step 23 is a step of measuring the time for which the authentication requester 107 utters the authentication requester voice 108.
- the response time calculation means 205 measures the time until the authentication requester 107 utters the authentication requester voice 108. Since legitimate users know their own attributes, their response time is generally short; another person needs time to prepare the correct answer to the question, so the response time becomes longer.
- step 24 is a step in which the determination score calculation means 206 calculates, from the response time of the response time calculation means 205 and the voice similarity of the voice similarity calculation means 112, a score for determining whether or not the authentication requester 107 is a valid user.
- the determination means 114 compares the output of the determination score calculation means 206 with the predetermined threshold 113, and outputs “accept” as the authentication result 115 if the output is greater than or equal to the threshold 113; otherwise, the requester is regarded as another person and “reject” is output.
- since the voice unit 130 is switched, impersonation using recorded voice is less likely to succeed.
- the authentication requester 107 only has to perform the single act of answering the question, and whether or not the requester is a legitimate user is judged from the two viewpoints of voice similarity and response time. Therefore, highly accurate speaker verification can be performed without troublesome procedures for the authentication requester 107.
- FIG. 9 is a configuration diagram of a speaker verification apparatus according to Embodiment 3 of the present invention.
- FIG. 10 is a flowchart showing the processing contents of the speaker verification device according to Embodiment 3 of the present invention.
- the configuration of the speaker verification device according to the present embodiment will be described with reference to FIG. Note that description of parts common to Embodiment 1 or 2 is omitted.
- reference numeral 301 denotes a registration requester lip image.
- the registration requester lip image 301 is created in advance by having the registration requester 101 read an article in a newspaper or magazine and recording the shape and movement of the lips at this time.
- a lip image standard pattern to be described later is generated based on the registration requester lip image 301.
- the “lip image standard pattern” is a subword HMM (Hidden Markov Model) that efficiently represents the image features of the lip image (hereinafter, the lip image standard pattern will be described as being composed of syllable units).
- the lip image standard pattern is prepared by the method reported in Satoshi Nakamura, Eri Yamamoto, Ryo Nagai, Kiyohiro Shikano, “Speech Recognition and Lip Image Generation by Integration of Speech and Lip Images Using HMM”, Spoken Language Information Processing, pp. 15-17, February 1997 (hereinafter referred to as Reference 2).
- frequency analysis is performed by a 256-point FFT (Fast Fourier Transform); the power spectrum in the spatial frequency domain is calculated and smoothed on a logarithmic scale. Furthermore, dynamic features are obtained by taking the difference between frames.
- the lip image standard pattern is an HMM whose structure has 256 distributions for the power spectrum vector and 256 distributions for the difference, and is created from the power spectrum and difference obtained from the lip image as described above.
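A loose sketch of the feature extraction described above (2-D power spectrum on a logarithmic scale plus inter-frame differences; the frame size and the absence of smoothing are simplifications, not the patent's exact processing):

```python
import numpy as np

def lip_features(frames):
    """Per-frame log power spectrum (2-D FFT) plus inter-frame difference."""
    static = []
    for f in frames:
        spec = np.fft.fft2(f)                 # spatial frequency domain
        power = np.abs(spec) ** 2             # power spectrum
        static.append(np.log(power + 1e-10))  # logarithmic scale
    static = np.stack(static)
    delta = np.diff(static, axis=0)           # dynamic feature between frames
    return static, delta

frames = [np.random.rand(16, 16) for _ in range(4)]  # toy 16x16 lip images
static, delta = lip_features(frames)
print(static.shape, delta.shape)  # (4, 16, 16) (3, 16, 16)
```

The static and dynamic features would then be vectorized and modeled by the syllable-unit HMMs described above.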
- 302 is a registration means.
- the voice standard pattern 118 corresponding to the personal identifier 103, the lip image standard pattern, and the registration requester attribute 201 are stored in a registration database 303 described later.
- the registration means 302 includes a microphone and a camera. Each registration requester 101, in correspondence with his or her personal identifier 103, registers the registration requester voice 102 using the microphone and registers the registration requester lip image 301 using the camera.
- Reference numeral 303 denotes a registration database.
- the registration database 303 stores the voice standard pattern 118, the lip image standard pattern, and the registration requester attribute 201 generated by the registration unit 302.
- Reference numeral 304 denotes an authentication requester lip image.
- the authentication requester lip image 304 is obtained by recording the lip image while the authentication requester answers the question presented by the voice unit presenting means 204.
- Reference numeral 305 denotes a sound and lip image similarity calculation means.
- the voice and lip image similarity calculation means 305 selects the voice standard pattern 118 and the lip image standard pattern from the registration database 303 based on the input of the authentication personal identifier 109 and the voice unit 130 of the voice unit presenting means 204. For example, suppose the authentication requester 107 inputs “AAB” as the authentication personal identifier 109 and is asked “Where are you from?” by the voice unit presenting means 204, and the birthplace registered in the registration database 303 for this authentication personal identifier 109 is “Sapporo”.
- the voice and lip image similarity calculation means 305 reads the voice standard patterns 118 and lip image standard patterns of “sa (/sa/)”, “tsu (/Q/)”, “po (/po/)”, and “ro (/ro/)” corresponding to “AAB” from the registration database 303 and selects them in sequence. Then, the authentication requester voice 108 and the authentication requester lip image 304 are compared with the selected voice standard patterns 118 and lip image standard patterns.
- FIG. 10 is a flowchart showing the processing contents of the speaker verification device according to Embodiment 3 of the present invention. The operation is described below with reference to Fig. 10.
- step 31 is a step of registering information of the registration requester 101.
- the registration requester 101 inputs the registration requester voice 102, the personal identifier 103, the registration requester attribute 201, and the registration requester lip image 301 to the registration means 302.
- the registration unit 302 generates a voice standard pattern 118 based on the registration requester voice 102, generates a lip image standard pattern based on the registration requester lip image 301, and stores both in the registration database 303.
- the voice standard pattern 118 and the lip image standard pattern are stored so as to correspond to the personal identifier 103 and the syllable 117. By storing in this way, the voice standard pattern 118 and the lip image standard pattern corresponding to the authentication individual identifier 109 and the voice unit presentation means 204 can be selected.
- step 32 the authentication requester 107 inputs the authentication personal identifier 109 to the speaker verification device.
- the voice unit presenting means 204 makes a question regarding the attribute.
- the authentication requester 107 utters the authentication requester voice 108 corresponding to the voice unit 130 requested by the voice unit presentation means 204.
- Step 33 is a step of comparing the authentication requester voice 108 uttered by the authentication requester 107 with the corresponding voice standard pattern 118, and of comparing the authentication requester lip image 304 photographed while the authentication requester 107 speaks with the corresponding lip image standard pattern. For example, suppose the authentication requester 107 inputs the authentication personal identifier 109 “AAB” and the voice unit presenting means 204 asks “Where are you from?”. The similarity calculation means then searches the registration database 303 for the patterns of “sa (/sa/)”, “tsu (/Q/)”, and so on, corresponding to the authentication personal identifier 109 “AAB”.
- each selected lip image standard pattern is compared with the corresponding authentication requester lip image 304 to calculate the lip image similarity, and the similarity is output.
- the authentication requester lip image 304 is an image of the lips taken while the authentication requester 107 uttered the authentication requester voice 108; the power spectrum and its dynamic features are extracted as feature amounts.
- the similarity between the authentication requester lip image 304 and the lip image standard pattern selected from the registration database 303 is calculated by the method shown in Chapter 5 of Reference 1.
- The similarity used for the determination is obtained from the voice similarity L and the lip image similarity M.
- For example, the similarity for determination is the score shown in Formula 2, in which the two similarities are weighted and added.
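The weighted addition can be sketched as follows. The actual weights and decision threshold of Formula 2 are not reproduced in this text, so the values below are placeholders.

```python
def determination_similarity(voice_sim, lip_sim, w_voice=0.6, w_lip=0.4):
    """Weighted sum of the voice similarity L and the lip image similarity M.
    The weights are illustrative; Formula 2's actual weights are not given here."""
    return w_voice * voice_sim + w_lip * lip_sim

def is_legitimate_user(voice_sim, lip_sim, threshold=0.7):
    """Accept the authentication requester when the combined similarity
    clears a decision threshold (the threshold value is a placeholder)."""
    return determination_similarity(voice_sim, lip_sim) >= threshold
```

Combining the two viewpoints this way means a high voice score alone (e.g. a replayed recording with no matching lip movement) is not enough to clear the threshold.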
- Step 34 is a step in which the speaker verification device determines whether or not the authentication requester 107 is a valid user.
- Since the previously presented speech unit 130 is switched to another speech unit 130, and since the lip image is input together with the voice to calculate the similarity, it is difficult for an impostor to spoof the system with voice recorded on a tape recorder or the like.
- Moreover, whether or not the authentication requester 107 is a legitimate user can be determined from the two viewpoints of voice and lip image through the single act of answering the question.
- FIG. 11 is a configuration diagram of a speaker verification apparatus according to Embodiment 4 of the present invention.
- FIG. 12 is a flowchart showing the processing contents of the speaker verification apparatus according to Embodiment 4 of the present invention.
- The configuration of the speaker verification apparatus according to the present embodiment will be described with reference to FIG. 11. Description of the portions common to Embodiments 1 to 3 is omitted.
- Reference numeral 401 denotes registration means.
- The registration means 401 registers the registration requester attribute 201 corresponding to the personal identifier 103 in the registration database 402 described later.
- Reference numeral 402 denotes a registration database.
- the registration database 402 stores the registration requester attribute 201 in association with the personal identifier.
- Reference numeral 403 denotes a collation voice standard pattern group.
- The collation voice standard pattern group 403 is a group of voice standard patterns that depends on attributes. For example, when the attributes are age and gender, the patterns are divided into a group of voice standard patterns for males in their teens, a group for males in their twenties, and so on.
- The collation voice standard pattern group 403 is created using, for example, speech feature amounts obtained by feature-amount analysis of a digital signal produced by AD conversion of the speech waveforms of teenage males.
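The grouping step can be illustrated with a small sketch. The simple mean used here stands in for whatever training procedure the patent actually applies to the feature amounts; all names and data are hypothetical.

```python
from collections import defaultdict

def age_band(age):
    """Bucket an age into a decade band, e.g. 23 -> "20s"."""
    return f"{(age // 10) * 10}s"

def build_pattern_groups(samples):
    """Average the feature vectors of all speakers who share an attribute
    (age band, gender) into one collation standard pattern per group.
    `samples` is a list of (age, gender, feature_vector) tuples; averaging
    is illustrative, not the patent's actual training method."""
    groups = defaultdict(list)
    for age, gender, features in samples:
        groups[(age_band(age), gender)].append(features)
    return {
        key: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for key, vectors in groups.items()
    }

pattern_set = build_pattern_groups([
    (14, "male", [1.0, 2.0]),   # two teenage males averaged into one group
    (16, "male", [3.0, 4.0]),
    (25, "male", [5.0, 6.0]),   # one male in his twenties
])
```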
- Reference numeral 404 denotes a collation voice standard pattern group set.
- the collation voice standard pattern group set 404 is a set of the collation voice standard pattern group 403.
- Reference numeral 405 denotes collation voice standard pattern selection means.
- The collation voice standard pattern selection means 405 takes the data of the registration requester attribute 201 out of the registration database 402 based on the input authentication personal identifier 109 and, based on that data, selects a collation voice standard pattern group 403 from the collation voice standard pattern group set 404. For example, if the attribute declared with the personal identifier 103 indicates a male in his twenties, the collation standard pattern group for males in their twenties is selected.
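The selection step above amounts to two lookups: attribute by identifier, then pattern group by attribute. A minimal sketch, with all names and data purely illustrative:

```python
# Hypothetical sketch of collation voice standard pattern selection (405):
# the attribute declared at registration is fetched by identifier from the
# registration database (402), then used to pick a standard pattern group
# (403) out of the group set (404).
attribute_db = {"AAB": ("20s", "male")}
pattern_group_set = {
    ("10s", "male"): {"sa": [0.1], "po": [0.2]},
    ("20s", "male"): {"sa": [0.5], "po": [0.6]},
}

def select_pattern_group(personal_id):
    """Take out the registration requester attribute and select the
    corresponding collation voice standard pattern group."""
    attribute = attribute_db[personal_id]
    return pattern_group_set[attribute]

group = select_pattern_group("AAB")
```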
- FIG. 12 is a flowchart showing the processing contents of the speaker verification apparatus according to Embodiment 4 of the present invention. The operation is described below with reference to FIG. 12.
- Step 41 is a step of registering information of the registration requester 101.
- The registration requester 101 inputs his or her personal identifier 103 and registration requester attribute 201 to the registration means 401.
- The registration requester attribute 201 is registered in association with the personal identifier 103.
- Step 42 is a step in which the authentication requester 107 inputs the authentication personal identifier 109 to the speaker verification device.
- The voice unit presentation means 204 presents a question related to the attribute, for example, "Where are you from?". The question is presented while being switched from the previously presented content.
- Step 43 is a step of comparing the authentication requester voice 108 uttered by the authentication requester 107 and the corresponding verification voice standard pattern 118.
- For example, when the authentication requester 107 inputs the authentication personal identifier 109 "AAB", the collation voice standard pattern selection means 405 selects, from the collation voice standard pattern group set 404, the collation voice standard pattern group corresponding to the attribute of the authentication personal identifier 109 "AAB", and from that group selects the collation voice standard patterns 118 for "sa (/sa/)", the geminate "(/Q/)", "po (/po/)", and "ro (/ro/)".
- Each selected collation voice standard pattern 118 is compared with the corresponding syllable of the authentication requester voice 108, and the voice similarity is calculated and output.
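The per-syllable comparison can be sketched as follows. The patent does not fix a similarity measure here, so cosine similarity over feature vectors is used purely for illustration, and the feature data is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """One simple similarity between two feature vectors; chosen only for
    illustration, not specified by the patent."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def voice_similarity(requester_syllables, standard_patterns):
    """Compare each syllable of the authentication requester voice with the
    selected collation voice standard pattern and average the per-syllable
    scores into one voice similarity."""
    scores = [
        cosine_similarity(requester_syllables[syl], standard_patterns[syl])
        for syl in standard_patterns
    ]
    return sum(scores) / len(scores)

standard = {"sa": [1.0, 0.0], "po": [0.0, 1.0]}
score = voice_similarity(standard, standard)  # identical features -> 1.0
```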
- Step 44 is a step in which the speaker verification device determines whether the authentication requester 107 is a valid user.
- Since the previously presented voice unit 130 is switched to another voice unit 130, it is difficult for an impostor to spoof the system with voice recorded on a tape recorder or the like. Also, since speaker verification is performed using attributes, speaker verification can be realized even when the registration requester 101 cannot register voice in advance.
- FIG. 1 is a configuration diagram of a speaker verification apparatus showing Embodiment 1 of the present invention.
- FIG. 2 is an example showing the concept of voice unit presenting means in the first embodiment of the present invention.
- FIG. 3 is a conceptual diagram of a storage mode of a voice standard pattern in a registration database according to Embodiment 1 of the present invention.
- FIG. 4 is a flowchart showing the processing contents of the speaker verification device in the first embodiment of the present invention.
- FIG. 5 is an example showing the concept of voice unit presentation means in the first embodiment of the present invention.
- FIG. 6 is an example showing the concept of voice unit presentation means in the first embodiment of the present invention.
- FIG. 7 is a configuration diagram of a speaker verification apparatus showing Embodiment 2 of the present invention.
- FIG. 8 is a flowchart showing the processing contents of the speaker verification device in the second embodiment of the present invention.
- FIG. 9 is a configuration diagram of a speaker verification apparatus showing Embodiment 3 of the present invention.
- FIG. 10 is a flowchart showing the processing contents of the speaker verification device in the third embodiment of the present invention.
- FIG. 11 is a configuration diagram of a speaker verification apparatus showing Embodiment 4 of the present invention.
- FIG. 12 is a flowchart showing the processing contents of the speaker verification device in the fourth embodiment of the present invention.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2004/013197 WO2006027844A1 (ja) | 2004-09-10 | 2004-09-10 | Speaker verification device |
JP2006534954A JPWO2006027844A1 (ja) | 2004-09-10 | 2004-09-10 | Speaker verification device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2004/013197 WO2006027844A1 (ja) | 2004-09-10 | 2004-09-10 | Speaker verification device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006027844A1 true WO2006027844A1 (ja) | 2006-03-16 |
Family
ID=36036141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/013197 WO2006027844A1 (ja) | 2004-09-10 | 2004-09-10 | 話者照合装置 |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2006027844A1 (ja) |
WO (1) | WO2006027844A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016511475A (ja) * | 2013-03-05 | 2016-04-14 | Alibaba Group Holding Limited | Method and system for distinguishing humans from machines |
JP2015191391A (ja) * | 2014-03-28 | 2015-11-02 | Honda Motor Co., Ltd. | Alcohol interlock system |
US10789960B2 (en) | 2016-11-07 | 2020-09-29 | Pw Group | Method and system for user authentication by voice biometrics |
EP3319085B1 (fr) * | 2016-11-07 | 2022-04-13 | PW Group | Method and system for voice-biometric authentication of a user |
JP2019028464A (ja) * | 2017-07-26 | 2019-02-21 | Naver Corporation | Speaker authentication method and speech recognition system |
JP2019535025A (ja) * | 2017-09-11 | 2019-12-05 | Ping An Technology (Shenzhen) Co., Ltd. | Agent login method based on voiceprint recognition, electronic device, and storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7339116B2 (ja) * | 2019-10-11 | 2023-09-05 | Glory Ltd. | Voice authentication device, voice authentication system, and voice authentication method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04326409A (ja) * | 1991-04-26 | 1992-11-16 | Mitsubishi Electric Corp | User confirmation system |
JP2000338987A (ja) * | 1999-05-28 | 2000-12-08 | Mitsubishi Electric Corp | Utterance start monitoring device, speaker identification device, voice input system, speaker identification system, and communication system |
JP2002311992A (ja) * | 2001-04-13 | 2002-10-25 | Fujitsu Ltd | Speaker authentication method and apparatus |
2004
- 2004-09-10 JP JP2006534954A patent/JPWO2006027844A1/ja not_active Withdrawn
- 2004-09-10 WO PCT/JP2004/013197 patent/WO2006027844A1/ja active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JPWO2006027844A1 (ja) | 2008-05-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006534954 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |