JP2007017840A

JP2007017840A - Speech authentication device

Info

Publication number: JP2007017840A
Application number: JP2005201336A
Authority: JP
Inventors: Mitsunobu Kaminuma; 充伸神沼
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2005-07-11
Filing date: 2005-07-11
Publication date: 2007-01-25

Abstract

PROBLEM TO BE SOLVED: To provide a speech authentication device capable of improving the accuracy of speech authentication.
SOLUTION: Feature parameters are extracted in advance from a person's speech and stored. The speech is collected by a first microphone, which picks up its air-conducted sound, and a second microphone, which picks up its body-conducted sound. At authentication time, feature parameters are extracted from the person's speech collected by the first and second microphones and compared with the stored feature parameters. Whether the person is qualified as a user is authenticated based on the comparison results.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a voice authentication device that authenticates whether a speaker is a legitimate user by comparing the speech uttered by the user with speech stored in advance.

A voice authentication device is known that works as follows. The user speaks a predetermined word, such as the user's name, and the utterance is analyzed and stored as feature parameters. At authentication time, the user's utterance is analyzed and its feature parameters are extracted. These are then compared with the stored feature parameters to determine a match or a mismatch (see, for example, Patent Document 1).

Prior art documents related to the invention of this application include the following.
Patent Document 1: JP 2004-170522 A

However, the conventional voice authentication device described above authenticates using collected air-conducted sound. Environmental noise therefore mixes easily into the user's voice, which limits how far authentication accuracy can be improved.

Feature parameters are extracted in advance from a person's speech and stored. The speech is collected by a first microphone, which picks up its air-conducted sound, and a second microphone, which picks up its body-conducted sound. At authentication time, feature parameters are extracted from the person's speech collected by the first and second microphones. They are then compared, microphone by microphone, with the stored feature parameters. Whether the person is qualified as a user is authenticated based on the comparison results.

According to the present invention, the accuracy of voice authentication can be improved.

An embodiment in which the voice authentication device of the present invention is applied to a vehicle is described below. The voice authentication device of the present invention is not, however, limited to vehicle use.

FIG. 1 shows the configuration of the embodiment. A non-contact microphone (hereinafter simply "microphone") 1 is installed, for example, on the instrument panel near the driver's seat. It collects the air-conducted sound of the driver's speech. A condenser microphone, for example, can be used as the non-contact microphone 1. Because the non-contact microphone 1 collects air-conducted sound, environmental noise is mixed into the driver's speech.

The contact microphone 2 is generally called a bone-conduction or flesh-conduction microphone. It is placed in contact with part of the user's body, such as the head, and collects the body-conducted sound of the user's speech. The user either wears the contact microphone 2 using a fixture, or it is embedded in a part of the vehicle that touches the body, such as the headrest.

The frequency band that a microphone can pick up differs between the non-contact and contact types. Air-conducted sound tends to contain high-frequency information, so the non-contact microphone 1 is chosen to emphasize high-frequency information. Body-conducted sound tends to contain low-frequency information, so the contact microphone 2 is chosen to be sensitive to low-frequency information. In addition, for speech collected by a microphone with good low-frequency characteristics, removing the high-frequency information by filtering, rather than parameterizing it, may improve authentication accuracy.
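The patent does not say how the contact-microphone channel's high-frequency information would be filtered out. As one possibility, here is a minimal sketch using a first-order (one-pole) IIR low-pass filter. The filter type, the 1 kHz cutoff, the 16 kHz sampling rate, and the function names are all assumptions for illustration only.

```python
import math


def one_pole_lowpass(samples, sample_rate_hz, cutoff_hz):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    alpha = dt / (RC + dt), with RC = 1 / (2*pi*cutoff_hz) and dt = 1 / sample_rate_hz.
    Content well below cutoff_hz passes almost unchanged; content well above it is attenuated.
    """
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        filtered.append(y)
    return filtered


def rms(values):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

With a hypothetical 1 kHz cutoff at 16 kHz sampling, a 200 Hz tone keeps nearly all of its level, while a 6 kHz tone drops to roughly a fifth of it. A one-pole filter rolls off gently, so a steeper filter may be preferable in practice.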

The microphone amplifier 3 amplifies the signals collected by the non-contact microphone 1 and the contact microphone 2 separately. The AD converter 4 then separately converts the two amplified analog signals into digital audio.

The arithmetic unit 5 includes a microcomputer, memory, and other components. It analyzes the characteristics of the user's speech as parameters and compares them with the user's feature parameters stored in the storage device 6 to authenticate whether the speaker is a legitimate user. The parameter analysis uses one of the following kinds of information, or a combination of them: voice pitch, formant frequencies, spectrum, cepstrum, speaking rate, prosody, spectral regression, power, intonation, and utterance content.
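Of the cues listed above, pitch is the simplest to illustrate. The sketch below estimates the fundamental frequency of one voiced frame by picking the lag that maximizes the autocorrelation. Autocorrelation is a common textbook method chosen here for illustration; the patent does not specify a pitch extractor, and the 50 to 400 Hz search range is an assumption.

```python
def estimate_pitch_autocorr(frame, sample_rate_hz, f_min_hz=50.0, f_max_hz=400.0):
    """Estimate F0 (Hz) of a voiced frame from its autocorrelation peak.

    Only lags corresponding to f_min_hz..f_max_hz are searched, which keeps the
    trivial zero-lag peak out of the search.
    """
    min_lag = max(1, int(sample_rate_hz / f_max_hz))
    max_lag = min(int(sample_rate_hz / f_min_hz), len(frame) - 1)
    if max_lag < min_lag:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: the shrinking overlap slightly favors
        # shorter lags, which helps avoid picking a multiple of the true period.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag
```

For example, a 200 Hz sine sampled at 8 kHz has a 40-sample period, so the search settles on lag 40 and returns 200 Hz. Real speech frames would normally be windowed and checked for voicing first.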

To populate the storage device 6, each user is first asked to speak a predetermined word, such as a name. The speech is recorded and analyzed, and the extracted feature parameters are stored for each user. In voice authentication, the user's feature parameters stored in advance in the storage device 6 are compared with the feature parameters of the speech input at authentication time. Feature parameters are obtained by applying processing such as frequency analysis, cepstrum analysis, and pitch extraction to the speech signal. This processing extracts parameters that capture the speaker's individuality and arranges them as a vector. FIG. 2 shows an example in which a five-dimensional vector, that is, a feature parameter, is created from pitch information and from formant information obtained by frequency analysis.
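A minimal sketch of the FIG. 2 feature parameter and of per-user enrollment follows. It assumes the five dimensions are the pitch plus the first four formant frequencies, and that the template is the component-wise mean of several enrollment utterances. The patent states neither of these, and the class and function names are hypothetical. Templates are keyed by both user and microphone because, in the embodiment, each microphone's input is later compared with a template recorded through that same microphone.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Key: (user_id, mic_id); value: stored feature vector (template).
TemplateStore = Dict[Tuple[str, str], List[float]]


@dataclass(frozen=True)
class VoiceFeatures:
    """Per-utterance features; the pitch + F1..F4 layout is an assumption."""
    pitch_hz: float
    formants_hz: Tuple[float, float, float, float]

    def to_vector(self) -> List[float]:
        # Five-dimensional feature parameter: [F0, F1, F2, F3, F4].
        return [self.pitch_hz, *self.formants_hz]


def enroll(store: TemplateStore, user_id: str, mic_id: str,
           utterances: Sequence[VoiceFeatures]) -> List[float]:
    """Store the component-wise mean of the enrollment vectors as the template."""
    if not utterances:
        raise ValueError("at least one enrollment utterance is required")
    vectors = [u.to_vector() for u in utterances]
    template = [sum(column) / len(vectors) for column in zip(*vectors)]
    store[(user_id, mic_id)] = template
    return template
```

Enrollment would be run once per microphone (for example `mic_id="air"` and `mic_id="contact"`), each time using features extracted from that microphone's recording.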

FIG. 3 is a flowchart of the speaker authentication operation of the embodiment, and FIG. 4 illustrates the speaker authentication principle of the embodiment. The operation of the embodiment is described with reference to these figures. In step 1, initialization is performed: the feature parameters of the users' voices, collected in advance, are read from the storage device 6 and loaded into memory. In step 2, the device checks whether the user has input speech. When speech is input through the non-contact microphone 1 and the contact microphone 2, the process proceeds to step 3.
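Steps 1 and 2 can be sketched as follows. The patent only says that the stored feature parameters are loaded into memory and that the process waits for speech on both microphones. The JSON layout of the stored templates and the RMS-energy test for detecting speech are assumptions made for illustration, and the threshold value is hypothetical.

```python
import json
import math
from typing import Dict, List, Sequence, Tuple


def load_templates(json_text: str) -> Dict[Tuple[str, str], List[float]]:
    """Step 1 (sketch): expand stored templates into an in-memory lookup table.

    Assumed (hypothetical) layout: {"user": {"mic": [feature, ...], ...}, ...}
    """
    raw = json.loads(json_text)
    return {(user, mic): [float(v) for v in vector]
            for user, mics in raw.items()
            for mic, vector in mics.items()}


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_on_both_mics(air_frame: Sequence[float], contact_frame: Sequence[float],
                        threshold: float = 0.01) -> bool:
    """Step 2 (sketch): proceed only when both channels exceed an energy threshold."""
    return frame_rms(air_frame) > threshold and frame_rms(contact_frame) > threshold
```

A plain energy threshold is easily fooled by cabin noise on the air-conduction channel. Requiring the contact channel to be active as well is one simple way to use the quieter body-conducted signal as a voice-activity cue.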

In step 3, feature parameters are extracted from the speech input through the non-contact microphone 1 and the contact microphone 2, using the method described above. In step 4, the feature parameters extracted for each microphone are compared with the feature parameters stored in advance. The feature parameters from the non-contact microphone 1's input are compared with the template created in advance from speech recorded with the non-contact microphone 1. Likewise, the feature parameters from the contact microphone 2's input are compared with the template created in advance from speech recorded with the contact microphone 2.
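The per-microphone comparison of step 4 is sketched below. Its key point is that each channel is matched only against the template recorded through the same microphone. Euclidean distance is used because the following paragraph names it as one option. The function name and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def per_mic_distances(input_features: Dict[str, Sequence[float]],
                      templates: Dict[Tuple[str, str], List[float]],
                      user_id: str) -> Dict[str, float]:
    """Step 4 (sketch): distance between each mic's input and that mic's own template."""
    distances = {}
    for mic_id, vector in input_features.items():
        template = templates[(user_id, mic_id)]  # same user, same microphone
        # math.dist raises ValueError if the two vectors differ in length.
        distances[mic_id] = math.dist(vector, template)
    return distances
```

Keeping the templates separate matters because the same utterance produces systematically different spectra on the two channels. Comparing air-channel input against a contact-channel template would mostly measure that difference.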

One way to compare two feature parameters is to compute the Euclidean distance between them. FIG. 5 shows an example of speaker authentication for three speakers using two-dimensional vectors (feature parameters). If the input signal falls within any of the regions C1 to C3, speaker authentication succeeds, that is, the speaker is qualified as a user. The accuracy of speaker authentication can be controlled by changing the size of a region such as C1. To make authentication stricter, make the region C1 smaller.
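The FIG. 5 idea can be sketched as follows. Each enrolled speaker owns a circular acceptance region, defined by a center vector and a radius, and an input is accepted as the nearest speaker whose region contains it. Modeling C1 to C3 as circles is an assumption for illustration. As the text describes, shrinking a radius makes acceptance for that speaker stricter.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Region = Tuple[Sequence[float], float]  # (center feature vector, acceptance radius)


def identify_speaker(point: Sequence[float], regions: Dict[str, Region]) -> Optional[str]:
    """Return the label of the nearest region containing point, or None (reject)."""
    best_label, best_dist = None, math.inf
    for label, (center, radius) in regions.items():
        d = math.dist(point, center)  # Euclidean distance
        if d <= radius and d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

For instance, with hypothetical regions `{"C1": ((1.0, 1.0), 0.5), ...}`, the input `(1.2, 1.1)` lies about 0.22 from the C1 center and is accepted. If the C1 radius is reduced to 0.1, the same input is rejected.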

The authentication result is output for each microphone. A matching result means the speaker is qualified as a user (“1”); a mismatching result means the speaker is not qualified (“0”). In step 5, the logical AND of the per-microphone results is output as the final authentication result. The speaker is therefore judged qualified only when the result for the non-contact microphone 1 is “1” and the result for the contact microphone 2 is also “1”. If the logical OR of the per-microphone results is used as the final result instead, authentication accepts a wider range of speakers.
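Step 5 reduces to a Boolean combination of the per-microphone results. Below is a minimal sketch; the function name and the `mode` switch are hypothetical. AND corresponds to the embodiment, and OR to the more permissive alternative.

```python
from typing import Dict


def fuse_decisions(per_mic: Dict[str, int], mode: str = "and") -> int:
    """Combine per-mic results (1 = qualified, 0 = not qualified) into a final result."""
    if not per_mic:
        raise ValueError("no per-microphone results to combine")
    qualified = [result == 1 for result in per_mic.values()]
    if mode == "and":  # every microphone must match (embodiment, step 5)
        return int(all(qualified))
    if mode == "or":   # any single microphone may match (wider acceptance)
        return int(any(qualified))
    raise ValueError(f"unknown mode: {mode!r}")
```

AND lowers false acceptance at the cost of more false rejections, and OR does the reverse. Which one fits depends on whether an impostor being accepted or a genuine user being refused is the costlier error.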

Thus, according to the embodiment, feature parameters are extracted in advance from a person's speech and stored. The speech is collected by a non-contact microphone, which picks up its air-conducted sound, and a contact microphone, which picks up its body-conducted sound. At authentication time, feature parameters are extracted from the person's speech collected by the two microphones and compared, microphone by microphone, with the stored feature parameters. Whether the person is qualified as a user is authenticated based on the comparison results. This increases the amount of information used for authentication. It also improves voice authentication accuracy by using information from body-conducted sound, which contains little environmental noise.

Further, according to the embodiment, the speaker is authenticated as a qualified user when matching results are obtained for both the non-contact microphone and the contact microphone. This increases the amount of information used for authentication. It also improves voice authentication accuracy by using information from body-conducted sound, which contains little environmental noise.

Furthermore, according to the embodiment, the speaker may instead be authenticated as a qualified user when a matching result is obtained for either the non-contact microphone or the contact microphone. Voice authentication accuracy is then lower than when both microphones must match. It is still higher than with a conventional voice authentication device, however, because information from body-conducted sound, which contains little environmental noise, is used.

<< Modification of the Embodiment >>
The embodiment above compares the feature parameters of the speech uttered at authentication time with the stored feature parameters separately for each microphone. Alternatively, the feature parameters of the two microphones may be integrated and then compared.

FIG. 6 shows the speaker authentication method of the modification. The user is first asked to speak a predetermined word, such as the user's name, and the speech is recorded with the non-contact microphone 1 and the contact microphone 2. The speech recorded by each microphone is analyzed separately by the method described above, and feature parameters are extracted. Part of the non-contact microphone 1's feature parameters and part of the contact microphone 2's feature parameters are then combined into a single feature parameter. This is stored in the storage device 6 as the user's feature parameter.

At authentication time, the user is asked to speak the predetermined word, such as the user's name, and the speech is recorded with the non-contact microphone 1 and the contact microphone 2. The speech recorded by each microphone is analyzed separately by the method described above, and feature parameters are extracted. Part of the non-contact microphone 1's feature parameters and part of the contact microphone 2's feature parameters are then combined into a single feature parameter. This is compared with the feature parameter stored in the storage device 6. If the result is a match, the speaker is judged qualified as a user; if it is a mismatch, the speaker is judged not qualified.
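The modification can be sketched as building one fused vector from selected elements of each microphone's feature parameters, then comparing it with a single stored fused template. The text does not say which elements are selected. The index lists, the distance threshold, and the function names below are therefore hypothetical, and the same selection must be applied at enrollment and at authentication.

```python
import math
from typing import List, Sequence


def integrate_features(air_vec: Sequence[float], contact_vec: Sequence[float],
                       air_idx: Sequence[int], contact_idx: Sequence[int]) -> List[float]:
    """Concatenate the chosen elements of the air-mic and contact-mic feature vectors."""
    return [air_vec[i] for i in air_idx] + [contact_vec[i] for i in contact_idx]


def authenticate_integrated(air_vec: Sequence[float], contact_vec: Sequence[float],
                            stored_template: Sequence[float],
                            air_idx: Sequence[int], contact_idx: Sequence[int],
                            threshold: float) -> int:
    """Return 1 (qualified) if the fused vector lies within threshold of the stored one."""
    fused = integrate_features(air_vec, contact_vec, air_idx, contact_idx)
    return 1 if math.dist(fused, stored_template) <= threshold else 0
```

Unlike the per-microphone scheme, this produces a single decision, so no AND/OR rule is needed. If the selected elements use different units (for example Hz versus energy), scaling them before computing the distance would likely matter.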

Thus, according to the modification, feature parameters are extracted in advance from a person's speech collected by a non-contact microphone, which picks up its air-conducted sound, and a contact microphone, which picks up its body-conducted sound. Part of the non-contact microphone's feature parameters and part of the contact microphone's feature parameters are integrated into a single feature parameter, which is stored. At authentication time, feature parameters are extracted from the person's speech collected by the two microphones and integrated in the same way. The result is compared with the stored integrated feature parameter, and the speaker is authenticated as a qualified user when the two match. This increases the amount of information used for authentication. It also improves voice authentication accuracy by using information from body-conducted sound, which contains little environmental noise.

In addition, according to the modification, the high-frequency portion of the non-contact microphone's feature parameters is integrated with the low-frequency portion of the contact microphone's feature parameters. This makes effective use of the acoustic strengths of each microphone. The result is speech with a high S/N ratio, which further improves voice authentication accuracy.
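Suppose each microphone's feature parameters are band-wise values ordered from low to high frequency, such as log band energies. The text does not fix the representation, so this is an assumption. Under it, the integration described here amounts to splicing the two vectors at a split band. Below is a minimal sketch; the split index is a hypothetical tuning parameter.

```python
from typing import List, Sequence


def integrate_by_band(air_bands: Sequence[float], contact_bands: Sequence[float],
                      split_index: int) -> List[float]:
    """Low bands from the contact mic, high bands from the air (non-contact) mic.

    Both inputs are assumed to share the same band layout, ordered low to high;
    bands [0, split_index) come from the contact mic, the rest from the air mic.
    """
    if len(air_bands) != len(contact_bands):
        raise ValueError("both microphones must use the same band layout")
    if not 0 <= split_index <= len(air_bands):
        raise ValueError("split_index out of range")
    return list(contact_bands[:split_index]) + list(air_bands[split_index:])
```

The resulting vector can be enrolled and matched exactly like any single-microphone feature vector. Choosing `split_index` near the frequency where the contact microphone's response falls off seems a natural starting point, but that is a judgment call rather than something the text specifies.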

The components in the claims correspond to the components of the embodiment as follows. The non-contact microphone 1 constitutes the first microphone, and the contact microphone 2 constitutes the second microphone. The arithmetic unit 5 constitutes the feature extraction means, the comparison and collation means, and the integration means. The storage device 6 constitutes the feature storage means. This description is merely an example. In interpreting the invention, the correspondence between the items described in the embodiment and those in the claims imposes no limitation or restriction.

FIG. 1 shows the configuration of an embodiment.
FIG. 2 shows an example of the feature parameters of uttered speech.
FIG. 3 is a flowchart of the voice authentication operation of the embodiment.
FIG. 4 shows the voice authentication principle of the embodiment.
FIG. 5 illustrates the speaker authentication principle.
FIG. 6 illustrates a method of authentication that integrates the feature parameters of the non-contact microphone with those of the contact microphone.

Explanation of symbols

1 Non-contact microphone
2 Contact microphone
3 Microphone amplifier
4 AD converter
5 Arithmetic unit
6 Storage device

Claims

A first microphone that collects air conduction sound of human speech;
A second microphone that collects the body conduction sound of human speech;
Feature extraction means for extracting feature parameters from the speech of a person collected by the first microphone and the second microphone;
Feature storage means for storing feature parameters extracted in advance by the feature extraction means from human speech collected by the first microphone and the second microphone; and
Comparison and collation means for comparing, for each microphone, the feature parameters extracted by the feature extraction means from the speech of a person collected by the first microphone and the second microphone with the feature parameters stored in the feature storage means;
A voice authentication device that authenticates whether the person is qualified as a user based on the result of the comparison and collation by the comparison and collation means.

The voice authentication device according to claim 1,
A voice authentication device that authenticates the person as a qualified user when the comparison and collation means obtains matching results for both the first microphone and the second microphone.

The voice authentication device according to claim 1,
A voice authentication device that authenticates the person as a qualified user when the comparison and collation means obtains a matching result for either the first microphone or the second microphone.

A first microphone that collects air conduction sound of human speech;
A second microphone that collects the body conduction sound of human speech;
Feature extraction means for extracting feature parameters from the speech of a person collected by the first microphone and the second microphone;
Integration means for generating a feature parameter by integrating part of the feature parameters of the first microphone with part of the feature parameters of the second microphone;
Feature storage means for storing, in advance, an integrated feature parameter obtained by collecting human speech with the first microphone and the second microphone, extracting the feature parameters of both microphones with the feature extraction means, and integrating them with the integration means; and
Comparison and collation means for comparing an integrated feature parameter with the integrated feature parameter stored in the feature storage means, the former being obtained by extracting, with the feature extraction means, the feature parameters of both microphones from the speech of a person collected by the first microphone and the second microphone, and integrating them with the integration means;
A voice authentication device that authenticates the person as a qualified user when the comparison and collation means obtains a matching result.

The voice authentication device according to claim 4,
The voice authentication device, wherein the integration means integrates a high-frequency part of the feature parameters of the first microphone with a low-frequency part of the feature parameters of the second microphone.