JP2006010809A

JP2006010809A - Personal identification system

Info

Publication number: JP2006010809A
Application number: JP2004184664A
Authority: JP
Inventors: Shogo Kameyama; 昌吾亀山
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2004-06-23
Filing date: 2004-06-23
Publication date: 2006-01-12
Anticipated expiration: 2024-06-23
Also published as: JP4359887B2

Abstract

PROBLEM TO BE SOLVED: To provide a personal identification system that can further raise the security level and whose security can hardly be broken unless the living person himself or herself operates it directly.

SOLUTION: Speech information of the person to be authenticated is detected by both a bone conduction sound detection part 340 and an air conduction sound detection part 304. Identification processing is then performed on the basis of both the bone conduction speech information and the air conduction speech information. This captures feature information that arises from the differences between the two waveforms and that cannot be obtained from bone conduction sound or air conduction sound alone, which greatly raises the security level of personal identification. Bone conduction speech information and air conduction speech information are the same kind of information (speech). Hardware and software processing can therefore easily be shared, and feature information arising from the waveform differences can easily be extracted by computation.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a personal authentication system using voice.

Patent documents: JP 2000-259828 A; JP 2004-80080 A; JP 2003-58190 A

So-called speaker recognition is widely used as a personal authentication method. It uses the personal characteristics contained in the voice of the person to be authenticated. For example, as disclosed in Patent Documents 1 to 4, various personal authentication methods that include speaker recognition have recently been proposed to raise the security level of mobile phones.

The number of mobile phones in use has grown rapidly, and competition to develop new models has intensified. As a result, users replace their handsets more often. Mobile phones store personal data such as telephone directories and mail address lists. It has been pointed out that discarded handsets still holding such data are resold as junk, which leaks personal information.

In addition, mobile phones with information-terminal functions such as Internet access are becoming standard. They are widely used to pay for information services and shopping, and for mobile banking. Their use as lock-operation terminals for houses and other buildings and for automobiles is also being considered. A still higher security level is therefore required.

Patent Documents 1 and 2 disclose techniques that raise the security level by combining voice authentication with authentication by other means, such as face image matching or fingerprint matching.

Another problem is that the accuracy of speaker recognition drops in noisy places. Patent Document 3 therefore discloses a mobile phone with an authentication function based on speaker recognition using a bone conduction microphone. A bone conduction microphone picks up sound through the human skeleton and tissue, so it has the advantage of being largely unaffected by airborne noise.

In recent years, as security systems have become more sophisticated, the criminal techniques used to break them illegally have also become more sophisticated and bolder. For example, Patent Documents 1 and 2 combine authentication by images such as fingerprints or faces with voice authentication. At first glance, breaking through such security looks very difficult.

With the following method, however, it is not impossible to get past every one of the multiple security stages. The presence of the legitimate user is reproduced virtually by using a separate substitute for each stage:

- a photograph or video for the face;
- a recording tape for the voice;
- a photoengraved stamp, or even an arm or finger cut from the person to be authenticated, for the fingerprint.

Acceptance is then obtained at each stage in turn.

This method can break the security even when the living person is not present. It does not necessarily require high-risk acts such as kidnapping or abduction. Even when a violent crime such as kidnapping is involved, the criminals need the person only until they have obtained the information needed for authentication. From then on, copies or acquired items (such as a finger) suffice. There is therefore a danger that criminals will no longer hesitate to kill the person, for example to silence them.

The technique of Patent Document 3 does make speaker recognition more robust against noise. However, it uses only bone conduction sound. It therefore cannot always obtain a rich set of feature information for reliably identifying and authenticating an individual, and it contributes little to raising the security level itself.

It is an object of the present invention to provide a personal authentication system that can further raise the security level and whose security is difficult to break unless the living person directly operates it.

Means for Solving the Problems, and Their Action and Effects

The present invention relates to a personal authentication system that authenticates a person subject to authentication processing on the basis of the voice uttered by that person. To solve the above problem, the system comprises:
a bone conduction sound detection unit that detects voice information of the person to be authenticated as bone conduction sound;
a bone conduction speech information storage unit that stores the detected bone conduction speech information;
an air conduction sound detection unit that detects voice information of the person to be authenticated as air conduction sound;
an air conduction speech information storage unit that stores the detected air conduction speech information; and
authentication processing means that performs authentication processing on the basis of both the bone conduction speech information and the air conduction speech information.

As is clear from Patent Documents 1 to 3, authentication methods based on speaker recognition have treated the voice detection step only in terms of detection accuracy under noise and similar conditions. The designer simply chose one of two options according to the acoustic environment in which the system would be used:

- detect, with an ordinary microphone, the sound radiated from the vocal cords through the airway into the air (referred to in the present invention as "air conduction sound"); or
- detect bone conduction sound with a dedicated bone conduction microphone.

There was no idea whatsoever of using both together.

However, the two sounds travel through different media. Air conduction sound travels through air. Bone conduction sound travels through the body tissue and skeleton between the vocal cords and the bone conduction sound detection unit (specifically, a bone conduction microphone), and this medium has an entirely different acoustic impedance structure. The detected speech waveforms are affected accordingly. Although both sounds come from the same vocal cords, the detected air conduction and bone conduction waveforms differ considerably.

Because body tissue and skeleton lie in its path, the propagation path of bone conduction sound is more complex and less homogeneous than air. Parameters that affect sound propagation, such as propagation speed, amplitude and acoustic resonance frequency, also vary along this path. The original waveform from the vocal cords is therefore altered far more while it propagates as bone conduction sound than as air conduction sound. Naturally, the body tissue and skeleton forming the propagation path differ between individuals. The difference between the air conduction and bone conduction waveforms is therefore also specific to each person.

The inventor focused on this difference between bone conduction speech information and air conduction speech information. The inventor found that combining the two yields a number of groundbreaking effects in personal authentication technology, and thereby completed the present invention. Specifically, the combination produces the following distinctive effects, which cannot be achieved with bone conduction speech information or air conduction speech information alone.
(1) Feature information derived from the difference between the two waveforms, which could not be obtained from bone conduction sound or air conduction sound alone, can now be captured. As a result, the security level of personal authentication can be raised significantly.
(2) Bone conduction speech information and air conduction speech information are both speech information, that is, the same type of information. Hardware and software processing can therefore easily be shared, and feature information derived from the difference between the waveforms can easily be extracted by computation.

More preferably, the bone conduction speech information and the air conduction speech information are generated by detecting the voice uttered by the person to be authenticated simultaneously with the bone conduction sound detection unit and the air conduction sound detection unit. This yields the following further effects.
(1) Detecting bone conduction sound involves contact with the human body, so bone conduction sound is relatively difficult to reproduce accurately from a recording. Moreover, it must be sampled simultaneously with the air conduction sound. Breaking through the security is therefore very difficult unless the living person operates the system directly.
(2) The bone conduction sound and the air conduction sound share the same waveform source. The correlation between the two speech waveforms is therefore stronger than when separately uttered sounds are detected individually as bone conduction sound or air conduction sound. As a result, the part of the waveform difference that is specific to the person being authenticated (that is, the feature information usable for authentication) can be captured more clearly, and authentication accuracy can be improved.

The authentication processing means can be configured with two components:

- a standard voice feature information storage unit, which stores standard voice feature information; and
- collation means, which collates the collation source voice feature information with the standard voice feature information.

The collation source voice feature information is based on both the bone conduction speech information and the air conduction speech information. The standard voice feature information is the reference it is collated against.

The standard voice feature information is created in advance from the air conduction and bone conduction sound information of the authentication-specified person (the person who should be accepted, that is, authenticated as "correct"). At authentication time it serves as the reference for the collation source voice feature information acquired from the person being authenticated. This simplifies the authentication processing and improves its accuracy.

In some cases, for example when authentication uses a phase difference as standard voice feature information (described later), the standard voice of the authentication-specified person can also be detected and created with a bone conduction sound detection unit and an air conduction sound detection unit provided outside the system.
However, characteristics differ between hardware units. To reduce the influence of these differences, it is more effective to create the standard voice feature information from sound detected by the bone conduction sound detection unit and the air conduction sound detection unit provided in the system itself. This naturally also simplifies the creation of the standard voice feature information.

The voice feature information can include the frequency spectrum of the bone conduction sound and the frequency spectrum of the air conduction sound. In this case, the collation means collates these frequency spectra with the respective standard frequency spectra of bone conduction sound and air conduction sound contained in the standard voice feature information. It accepts the person only when both spectra match.

Even for the same person's voice, the frequency spectrum of bone conduction sound differs from that of air conduction sound. Collating each of them with its corresponding standard frequency spectrum therefore enables highly accurate personal authentication. This effect is strongest when both the spectra under authentication and the standard spectra are created by simultaneous detection, that is, by detecting the person's voice with the bone conduction sound detection unit and the air conduction sound detection unit at the same time.

Because collation uses the frequency spectra of both bone conduction sound and air conduction sound, the resulting authentication method implicitly incorporates feature information derived from the difference between the two waveforms. This information cannot be identified from either waveform alone.
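The dual-spectrum check above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation. It makes several assumptions that do not come from the source:

- the frames are equal-length and simultaneously sampled;
- a naive DFT magnitude spectrum stands in for the frequency spectrum;
- cosine similarity with a hypothetical threshold of 0.9 serves as the "match" criterion.

All function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided DFT magnitude of a real frame (naive O(N^2) DFT for clarity)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length spectra; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def dual_spectrum_match(bone, air, std_bone_spec, std_air_spec, threshold=0.9):
    """Accept only if the bone AND the air conduction spectra each match their standard.

    `threshold` is an assumed similarity cut-off, not a value from the source.
    """
    bone_ok = cosine_similarity(magnitude_spectrum(bone), std_bone_spec) >= threshold
    air_ok = cosine_similarity(magnitude_spectrum(air), std_air_spec) >= threshold
    return bone_ok and air_ok
```

In this sketch, both comparisons must pass. Feeding an air-conduction-like signal into the bone channel therefore fails, because its spectrum lacks the bone conduction pattern stored in the standard.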

Alternatively, the personal authentication system of the present invention can include composite voice feature information calculation means. This means calculates composite voice feature information, which can be calculated only by using both of the following:

- the bone conduction sound waveform detected by the bone conduction sound detection unit; and
- the air conduction sound waveform detected by the air conduction sound detection unit.

The authentication processing means can then perform authentication processing based on this composite voice feature information.

In effect, the method extracts by computation the feature information derived from the difference between the two waveforms, which cannot be identified from either waveform alone. This further increases the gains in authentication accuracy and security level that come from combining the two kinds of speech information.

The composite voice feature information calculation means can calculate the phase difference between the air conduction sound waveform and the bone conduction sound waveform as the composite voice feature information. As described above, bone conduction sound propagates through body tissue and skeleton, and the distribution of acoustic impedance along this path directly reflects the individual's biological characteristics.

Specifically, each living body (that is, each individual to be authenticated) differs in two respects at impedance discontinuities such as tissue boundaries: how reflected waves form there, and how phase delays arise. The phase difference of the bone conduction waveform relative to the air conduction waveform therefore differs from one individual to another and can identify the individual.

If this phase difference is obtained by calculation, it can serve as effective and important information for personal authentication. To calculate it accurately, the bone conduction sound and the air conduction sound must be detected simultaneously from the same utterance.
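The link between a time delay and a phase difference can be stated explicitly. Suppose, for illustration only, that the bone conduction path acts on a frequency component as a pure delay Δt with gain g. This is an idealization, since the text also describes reflections and distributed delays. The phase difference at frequency f is then proportional to the delay. With sampling rate f_s, a lag of τ samples corresponds to:

```latex
x_b(t) \approx g\, x_a(t - \Delta t)
\;\Longrightarrow\;
\varphi(f) = 2\pi f\, \Delta t,
\qquad \Delta t = \frac{\tau}{f_s}
```

For example, τ = 4 samples at f_s = 8 kHz gives Δt = 0.5 ms. At f = 500 Hz this is a phase difference of π/2 (90°).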

In this case, the phase difference between the air conduction and bone conduction waveforms that is specific to the predetermined authentication-specified person is obtained in advance as a standard phase difference. The authentication processing means can then perform authentication based on whether the calculated phase difference matches this standard phase difference.

The waveform phase difference itself can be obtained by relatively simple waveform computation. For example, the phase difference between the two waveforms is set to various values and their difference or sum waveform is computed. The phase difference at which the integrated amplitude is minimized (for the difference) or maximized (for the sum) is then taken. This has the advantage of a lighter computational load than spectrum matching and similar methods.
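The lag search described above can be sketched minimally: shift one waveform against the other and keep the shift that minimizes the integrated amplitude of the difference waveform. This sketch makes assumptions that do not come from the source:

- the phase difference is expressed as an integer lag in samples;
- both waveforms are first normalized to zero mean and unit RMS, so that the weaker bone conduction level does not bias the search;
- the tolerance used to compare against the standard phase difference is hypothetical.

```python
import math
import random


def _normalize(x):
    """Zero-mean, unit-RMS copy of x (assumed pre-processing, not from the source)."""
    mean = sum(x) / len(x)
    centred = [v - mean for v in x]
    rms = math.sqrt(sum(v * v for v in centred) / len(centred))
    return [v / rms for v in centred] if rms > 0 else centred


def estimate_lag(air, bone, max_lag):
    """Return the lag (in samples) by which `bone` trails `air`.

    The lag is chosen to minimize the mean absolute amplitude of the
    difference waveform air[t] - bone[t + lag].
    """
    a, b = _normalize(air), _normalize(bone)
    best_lag, best_cost = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        # Compare only the samples where the shifted sequences overlap.
        diffs = [abs(a[t] - b[t + lag]) for t in range(len(a)) if 0 <= t + lag < len(b)]
        if not diffs:
            continue
        cost = sum(diffs) / len(diffs)
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag


def phase_matches(measured_lag, standard_lag, tolerance=1):
    """Accept if the measured lag is within `tolerance` samples of the standard lag."""
    return abs(measured_lag - standard_lag) <= tolerance


if __name__ == "__main__":
    rng = random.Random(0)
    air = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    # Synthetic "bone" channel: an attenuated copy of air delayed by 4 samples.
    bone = [0.0] * 4 + [0.3 * v for v in air[:-4]]
    lag = estimate_lag(air, bone, max_lag=10)
    print(lag, phase_matches(lag, standard_lag=4))
```

A sum-waveform variant would instead maximize the mean absolute amplitude of air[t] + bone[t + lag]. The difference form was chosen here because, after normalization, it reaches its minimum near zero at the correct lag.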

The frequency spectra of air conduction sound and bone conduction sound also differ. The phase difference can therefore be calculated more accurately by first extracting the frequency components common to both waveforms and then obtaining the phase difference from those components. The components can be extracted using well-known digital filter techniques.
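Extracting the frequency band common to both channels could be sketched as a linear-phase FIR band-pass filter. The source says only that well-known digital filter techniques can be used. The following choices are assumptions made for illustration:

- a windowed-sinc design with a Hamming window;
- the tap count;
- the band edges, given as fractions of the sampling rate.

```python
import cmath
import math


def bandpass_fir(low, high, num_taps=101):
    """Windowed-sinc (Hamming) linear-phase band-pass FIR filter.

    `low` and `high` are cut-off frequencies as fractions of the sampling
    rate, with 0 < low < high < 0.5.
    """
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric filter")
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        m = i - mid
        # Ideal band-pass = ideal low-pass(high) minus ideal low-pass(low).
        if m == 0:
            ideal = 2 * (high - low)
        else:
            ideal = (math.sin(2 * math.pi * high * m) - math.sin(2 * math.pi * low * m)) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def apply_fir(taps, signal):
    """Direct-form convolution with zero initial state; output has the input's length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out.append(acc)
    return out


def gain_at(taps, freq):
    """Magnitude of the filter's frequency response at `freq` (fraction of sampling rate)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * freq * k) for k, h in enumerate(taps)))
```

Because the filter is linear-phase, it delays every frequency by the same (num_taps - 1) / 2 samples. Applying the same taps to both the air conduction and the bone conduction signal therefore leaves the lag between them unchanged. The filtered signals can then be passed to whatever phase difference calculation follows.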

The composite voice feature information is not limited to the phase difference between the two waveforms described above. For example, the difference spectrum between the frequency spectra of the air conduction sound and the bone conduction sound can also be used.

The acoustic characteristics of the body along the bone conduction path, such as attenuation and resonance, vary between individuals. The frequency components that are weakened or emphasized in bone conduction sound relative to air conduction sound therefore also vary from person to person. As a result, the difference spectrum between air conduction and bone conduction sound can identify the individual.

Naturally, other spectra obtained equivalently by mathematical operations on the individual frequency spectra and this difference spectrum can also be used as composite voice feature information. One example is the common spectrum of air conduction and bone conduction sound, obtained by subtracting the difference spectrum from each frequency spectrum.
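A difference spectrum and a simple comparison against a stored one could be computed as below. The following are assumptions made for this sketch and do not come from the source:

- levels are expressed in dB;
- the mean offset is removed before comparison, so that overall microphone gain or speaking level does not dominate;
- the score is the mean absolute deviation.

The function names are hypothetical.

```python
import math


def to_db(spectrum, floor=1e-12):
    """Convert linear magnitudes to dB, clamping at `floor` to avoid log10(0)."""
    return [20.0 * math.log10(max(v, floor)) for v in spectrum]


def difference_spectrum_db(air_spec, bone_spec, remove_offset=True):
    """Per-bin level of air conduction minus bone conduction sound, in dB.

    Positive values mark bins that the bone conduction path attenuates relative to air.
    With `remove_offset`, the mean level difference (an assumed gain term) is subtracted.
    """
    diff = [a - b for a, b in zip(to_db(air_spec), to_db(bone_spec))]
    if remove_offset:
        mean = sum(diff) / len(diff)
        diff = [d - mean for d in diff]
    return diff


def spectrum_distance_db(measured, standard):
    """Mean absolute deviation (dB) between a measured and a stored difference spectrum."""
    return sum(abs(m - s) for m, s in zip(measured, standard)) / len(standard)
```

With the offset removed, a uniformly louder bone conduction channel yields the same difference spectrum. A different per-bin attenuation pattern, which in this sketch stands for a different person, yields a large distance.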

The phase difference and difference spectrum described above arise mainly from the mechanical structure of the skeleton and body tissue that form the propagation path of bone conduction sound. They therefore have the advantage that misidentification is unlikely even if the voice being authenticated is somewhat altered, for example by the condition of the throat.

The authentication processing means can also perform authentication by combining two processes:

- a first authentication process, which collates at least one of the frequency spectra (of the bone conduction sound or of the air conduction sound) with a standard frequency spectrum; and
- a second authentication process, which is based on the composite voice feature information.

Conventional voice authentication relies on either the bone conduction or the air conduction frequency spectrum. Through spectrum matching it discriminates well between individuals, but it has a security hole: it can be deceived with recordings and similar means. Combining it with authentication based on composite voice feature information (in particular the easily calculated phase difference) effectively closes this security hole.
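The combination of the first (spectrum) and second (composite feature) authentication processes can be expressed as a small decision rule. This sketch assumes two inputs have already been computed:

- a spectrum similarity score for the first process; and
- a measured air/bone lag for the second process, or `None` when no usable bone conduction signal was captured.

The thresholds, class names and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    """Hypothetical enrolled data for the authentication-specified person."""
    standard_lag: int            # standard phase difference, in samples
    lag_tolerance: int = 1       # assumed tolerance for the second process
    min_similarity: float = 0.9  # assumed threshold for the first process


@dataclass
class Decision:
    accepted: bool
    first_ok: bool   # spectrum collation result
    second_ok: bool  # composite feature (phase difference) result


def two_stage_authenticate(spectrum_similarity: float,
                           measured_lag: Optional[int],
                           template: Template) -> Decision:
    """Accept only when both the spectrum check and the phase difference check pass."""
    first_ok = spectrum_similarity >= template.min_similarity
    second_ok = (measured_lag is not None
                 and abs(measured_lag - template.standard_lag) <= template.lag_tolerance)
    return Decision(first_ok and second_ok, first_ok, second_ok)
```

Consider a replayed recording. It may score well on the first process. However, it supplies no simultaneously captured bone conduction signal (`measured_lag=None`), so in this sketch the second process fails and the request is rejected.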

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
In this embodiment, the functions of the personal authentication system of the present invention are incorporated in a mobile phone. FIG. 1 is an external perspective view showing an example of a mobile phone 1. The mobile phone 1 has a receiver 303 toward the top of the main body and a transmitter 304 toward the bottom. Between them are:

- a liquid crystal monitor 308, composed of a liquid crystal display (for example, a color liquid crystal display);
- an input unit 305; and
- an on-hook/off-hook switch 306 that switches the mobile phone 1 between the on-hook and off-hook states.

In this embodiment, the mobile phone 1 can access not only the telephone network but also information networks such as the Internet. The input unit includes call dial keys 305a that double as an information-input keyboard, cursor movement keys 305b, and a mode switching key 305c. The mode switching key 305c switches between usage modes such as a call mode and an information search mode.

The transmitter 304 is a microphone that also serves as the air conduction sound detection unit. The receiver 303, on the other hand, is a bone conduction speaker in this embodiment. A bone conduction microphone 340, serving as the bone conduction sound detection unit, is arranged close to it.

The basic configurations of both devices are well known, so detailed description is omitted:

- bone conduction speakers: see, for example, Japanese Patent No. 2967777 and JP 2003-340370 A;
- bone conduction microphones: see, for example, Japanese Utility Model Laid-Open No. 55-146785, JP 58-182397 A, Japanese Utility Model Laid-Open No. 63-173991 and Japanese Patent No. 3488749.

Both are used by pressing them against the ear or the jawbone below the ear.

FIG. 2 is a block diagram showing an example of the electrical configuration of the mobile phone 1. The main part of the circuit is a control unit 310. It consists of an I/O port 311 and the following components connected to it:

- a CPU 312, which constitutes the authentication processing means, the collation means and the composite voice feature information calculation means;
- a ROM 313; and
- a RAM 314, which serves as the bone conduction speech information storage unit and the air conduction speech information storage unit.

The input unit 305 and the on-hook/off-hook switch 306 described above are connected to the I/O port 311. The receiver 303 is connected to the I/O port 311 via an amplifier 315 and a D/A converter 316. The transmitter 304 is connected via an amplifier 317 and an A/D converter 318. The bone conduction microphone 340 is connected via an amplifier 320 and an A/D converter 321.

A communication connection circuit 323 is also connected to the I/O port 311. The connection circuit 323 comprises a connection interface 331 for connecting to the control unit 310, together with a modulator 332, a transmitter 333, a frequency synthesizer 334, a receiver 335, a demodulator 336, a duplexer 337 and so on, all connected to that interface.
A data signal from the control unit 310 is modulated by the modulator 332 and then transmitted by the transmitter 333 from the antenna 339 via the duplexer 337. Conversely, a received radio wave is picked up by the receiver 335 via the antenna 339 and the duplexer 337, demodulated by the demodulator 336, and input to the I/O port 311 of the control unit 310.

During a call, for example, the audio signal from the transmitter 304 is amplified by the amplifier 317, digitized by the A/D converter 318 and input to the control unit 310. After processing by the control unit 310 as necessary, the signal is output from the receiver 303 via the D/A converter 316 and the amplifier 315.

A control radio wave transmitter 338, which emits a control radio wave P, is also connected to the connection interface 331. The control radio wave P is transmitted from the antenna 339 via the duplexer 337. When the mobile phone 1 moves into another communication zone 102, the network-side radio network controller 104 performs a well-known handover process based on the reception status of the control radio wave P.

The ROM 314 contains a communication program, which is the basic control program for wireless telephone communication, and a display program that controls the screen display of the liquid crystal monitor 308. As shown in FIG. 4, the ROM 314 also contains an authentication program that checks whether the user of the mobile phone 1 is the legitimate user. When executed by the CPU 312, this program implements the authentication processing means.

In this embodiment, authentication is performed by speaker recognition and collation that use both the air conduction speech waveform and the bone conduction speech waveform. The authentication program consists of a main program 201 and a group of submodules that it uses, specifically:

- an air conduction sound sampling module 202;
- a bone conduction sound sampling module 203;
- an air/bone conduction phase difference calculation and collation judgment module 204;
- an air/bone conduction difference spectrum calculation and collation judgment module 205;
- a waveform spectrum collation and judgment module 206;

and so on. All of these programs are executed by the CPU 312 using the RAM 313 of FIG. 2 as a work area.

In addition, as the authentication master data 322 (which serves as the reference voice feature information), master data of voice spectra are prepared for use when voice authentication is performed by spectrum matching (modules 205 and 206), specifically air conduction sound spectrum master data 221, bone conduction sound spectrum master data 222, and master data 223 of their difference spectrum. Before authentication is performed, these data are created by having the authorized user (the person to be authenticated) utter a sound, word, or sentence predetermined for matching, detecting its waveform with the transmitter 304 (air conduction sound) and the bone conduction microphone 340 (bone conduction sound), and converting the waveforms into spectra by a well-known Fourier transform. Because these data differ from user to user, and so that the reference voice feature information can be updated at any time (for example, to raise the security level), they are stored rewritably in a rewritable ROM, specifically the EEPROM (Electrically Erasable Programmable Read Only Memory) 322 of FIG. 2, and are loaded into the authentication data memory of the RAM 313 as needed.

In the following, several specific voice authentication methods are described. Because some modules and data are not used by a given method, the necessary modules and data are selected and used as appropriate (modules and data not used by the chosen method may, of course, be omitted).

Since the telephone functions of the mobile phone 1 are well known, their detailed description is omitted; the authentication processing that precedes use is described in detail below. FIG. 10 shows the flow of the main authentication processing performed by the main program 201 (FIG. 4). To perform authentication, an initialization process including registration of the matching data must first be carried out (S1). Once performed, this initialization is skipped thereafter, except when the matching data is to be updated. S3 is the voice authentication process, the core of the processing; according to its result, an authentication flag indicating whether use of the functions of the mobile phone 1 is permitted is set, for example, in the RAM 313 (FIG. 2). In S5, the authentication flag is read; if the prescribed condition is satisfied, the lock is released (S7: use permitted), and otherwise the lock is kept (S8: use not permitted).
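The main flow of FIG. 10 can be summarized in a minimal Python sketch. It assumes, for illustration only, that enrolment (S1) and voice authentication (S3) are supplied as callables; the class `Terminal` and the function `authentication_main` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Terminal:
    """Hypothetical model of the authentication terminal (e.g. mobile phone 1)."""
    master: Optional[Any] = None  # matching data registered in S1
    auth_flag: bool = False       # authentication flag (held in RAM 313 in the text)
    unlocked: bool = False


def authentication_main(term: Terminal,
                        enroll: Callable[[], Any],
                        verify: Callable[[Any], bool],
                        update_master: bool = False) -> bool:
    # S1: initialization; skipped once done, unless the matching data is being updated
    if term.master is None or update_master:
        term.master = enroll()
    # S3: voice authentication sets the authentication flag
    term.auth_flag = verify(term.master)
    # S5: read the flag -> S7 (release the lock) or S8 (keep the lock)
    term.unlocked = term.auth_flag
    return term.unlocked
```

Keeping enrolment and verification as parameters mirrors the text: the same main flow serves every matching method described below.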

The functions of the mobile phone 1 unlocked by authentication are not limited to well-known telephone functions (including connection to a telephone network or the Internet, e-mail, and the like). For example, the mobile phone 1 may also serve as a wireless remote-control unit for vehicle functions, such as locking and unlocking a car, starting the engine, or turning the headlights and interior lights on and off.

The flows of the initialization process and the voice authentication process are shown in FIG. 11 and FIGS. 15 to 18. In each of them, the main part of the processing is the voice data processing, which handles acquisition and processing of the voice data (S301 in the initialization process and S402 in the voice authentication process). This voice data processing is first described in detail with reference to FIG. 12. In S501, voice is input. In speaker authentication technology, various methods of having the person to be authenticated utter the authentication voice have been devised, for example to improve security, and the way the initial data is acquired differs between methods; since all of these methods are well known, only an outline is given here.

(1) Uttering a single character (or sound, e.g., a vowel)
The character to be uttered is specified, for example on the display, and the utterance is sampled.
(2) Uttering several characters in sequence
Basically the same as (1). The order of utterance is prompted, for example on the display, and the waveforms are sampled one after another. During actual matching, the utterance order may be fixed, or it may be changed every time using random numbers. In the latter case, the order of the characters requested at authentication changes randomly, which has the advantage that a recording of the characters spoken in a fixed order is of no use.
(3) Uttering a word
Only one word may be used (in which case this is the same as (2)), or the word may be selected from several candidates. In the latter case (see FIG. 1), a selection list of words to be matched is displayed on the screen 108, the user selects one with the input unit 305, and the selected word is then uttered and sampled. Alternatively, the number of characters (or the recording time) may be specified, and the user may freely enter a word of his or her choice with the input unit 305, which is then uttered and sampled; in this case the word clearly serves as a password. As a more elaborate method, a question that only the authorized user can answer may be output by voice, and the user inputs the corresponding registered answer by voice. In this case, the initialization process requires entering or selecting both the question to be output and its answer.
(4) Uttering a sentence
Basically the same as (3). When the question/answer format is adopted, several questions and answers may also be entered interactively.
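For method (2), the randomized prompt order can be sketched in a few lines of Python. This is an illustration under assumptions: the function name `utterance_order` is hypothetical, and the per-character matching that would follow is not shown.

```python
import random
from typing import List, Optional, Sequence


def utterance_order(characters: Sequence[str], randomize: bool = True,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Return the order in which the registered characters are to be uttered.

    With randomize=True the order changes on every call, so a recording made
    in a fixed order no longer matches the requested sequence.
    """
    order = list(characters)  # copy; the registered list is left unchanged
    if randomize:
        (rng if rng is not None else random.Random()).shuffle(order)
    return order
```

Passing an explicit `random.Random` instance makes the prompt reproducible for testing; in use, a fresh, unpredictable generator is what gives the protection against replayed recordings.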

Comparing bone conduction sound with air conduction sound, components derived from vocal-fold vibration, such as vowels, tend to be emphasized more in the bone conduction sound than in the air conduction sound, because the bone conduction path is closer to the vocal cords. Conversely, fricatives and plosives involve sound-producing elements other than the vocal cords, such as the tongue and lips, and therefore appear more strongly in the air conduction sound. Accordingly, when authentication is based on differences between the waveforms or spectra of the bone conduction and air conduction sounds (in particular, their difference spectrum), it is desirable to specify, as the voice waveform data (bone conduction and air conduction) to be authenticated, utterances containing vowels, fricatives, and plosives (preferably strings in which one of these sound types predominates, e.g., "sa shi su se so", "shishi shinchū no mushi", or "a i u e o"; a single sound from the sa-, ta-, or a-row is of course also acceptable). Even among vowels, sounds articulated with the front of the tongue, such as "i" and "e", are clearer in the air conduction sound, whereas sounds articulated with the back of the tongue, such as "u" and "o", are clearer in the bone conduction sound. It is therefore also effective to specify strings consisting mainly of one group or the other, such as "ie" (house) or "kōbo" (yeast).

FIG. 12 is a flowchart of the voice data processing. In S501, the specified voice is input through both the transmitter 304 and the bone conduction microphone 340. In S502, the input is sampled (by executing the air conduction sound sampling module 202 and the bone conduction sound sampling module 203 of FIG. 4). Because the user utters the requested sound string only once, the two channels must be sampled at the same time. When a single CPU is used, this is done as time-division parallel processing, as shown in FIG. 13. Specifically, the sampling counter is reset in S101; then, while the sampling counter is incremented, reading the microphone input port for air conduction sound (S102) and writing the read value into memory (RAM 313) (S103) alternate with reading the input port of the bone conduction microphone (S104) and writing that value into memory (S105). If the total sampling time is determined in advance according to the length of the voice data to be sampled (the sampling counter value can serve for this, although another timer means may be used) and sampling is terminated when this time expires (S107), the data of both sounds cannot be acquired correctly unless the bone conduction and air conduction waveforms are sampled simultaneously. This effectively prevents deception such as sequential voice input from a tape recorder or similar device.
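The time-division loop of FIG. 13 can be sketched in Python as follows. The port-reading functions are hypothetical stand-ins passed as callables; a real implementation would pace each read with a hardware timer at the sampling rate, which is not modelled here.

```python
from typing import Callable, List, Tuple


def sample_interleaved(read_air_port: Callable[[], int],
                       read_bone_port: Callable[[], int],
                       total_samples: int) -> Tuple[List[int], List[int]]:
    """Time-division sampling of both microphones on a single CPU (cf. FIG. 13)."""
    air: List[int] = []
    bone: List[int] = []
    counter = 0                        # S101: reset the sampling counter
    while counter < total_samples:     # S107: stop once the preset total is reached
        air.append(read_air_port())    # S102, S103: read air-conduction port and store
        bone.append(read_bone_port())  # S104, S105: read bone-conduction port and store
        counter += 1                   # increment the sampling counter
    return air, bone
```

Because the loop stops after a fixed number of samples, an utterance fed to the two channels one after the other cannot fit into both buffers, which is the anti-replay property described above.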

When voice data is input as words or sentences, a well-known speech recognition technique may be used to determine whether input of the voice with the prescribed content (meaning) has been completed, and sampling may be terminated once it has. In this case, the timer means is not necessarily required. Alternatively, although the hardware becomes somewhat more complex, the air conduction and bone conduction sounds can be sampled independently by separate (that is, two) CPUs; in this case both voice waveforms can be sampled in parallel without time-division processing.

Returning to FIG. 12, when sampling of the air conduction and bone conduction waveforms has been completed as described above, it is checked in S503 whether the two sounds were sampled simultaneously. Various check methods are conceivable. For example, if the air conduction sound and the bone conduction sound are deliberately input at different timings, part of one of them falls outside the sampling time, so a long blank period should appear in the acquired data; this can be exploited. Specifically, it is checked whether at least one of the acquired air conduction and bone conduction waveforms contains a period, lasting at least a certain length, during which the amplitude stays at or below a predetermined lower limit; if such a period exists, the sounds are judged not to be simultaneous. If simultaneity is judged absent in S503, the process proceeds to S511, where processing is aborted and an error or warning is output.
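A minimal Python sketch of the S503 check, assuming the waveforms are lists of amplitudes. The lower limit `floor` and the minimum blank length `min_run` are hypothetical tuning parameters; `min_run` would have to be longer than the natural pauses in the requested utterance.

```python
from typing import Sequence


def has_blank_period(samples: Sequence[float], floor: float, min_run: int) -> bool:
    """True if |amplitude| stays at or below `floor` for at least `min_run` consecutive samples."""
    run = 0
    for s in samples:
        run = run + 1 if abs(s) <= floor else 0  # length of the current quiet stretch
        if run >= min_run:
            return True
    return False


def sampled_simultaneously(air: Sequence[float], bone: Sequence[float],
                           floor: float, min_run: int) -> bool:
    # S503: judged not simultaneous if either channel contains a long blank period
    return not (has_blank_period(air, floor, min_run)
                or has_blank_period(bone, floor, min_run))
```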

If simultaneity is satisfied, the process proceeds to S505 and S506, where the detected air conduction and bone conduction waveform data are stored and registered in memory. What follows is the calculation of the composite voice feature information used for authentication (this realizes the function of the composite voice feature information calculation means). In S507, the phase difference between the air conduction waveform and the bone conduction waveform is calculated as composite voice feature information (by executing the air conduction/bone conduction sound phase difference calculation and matching determination module 204). As shown in FIG. 8, the air conduction and bone conduction waveforms are obtained by sampling the same utterance simultaneously with separate microphones; the relative phase of the two waveforms when superimposed with their sampling start timings aligned is taken as the reference superposition phase. Because the two waveforms stem from the same utterance and share many frequency components, as shown in FIG. 9, if the two waveforms are shifted relative to each other so as to cancel the phase difference φ inherently present at the reference superposition phase (that is, the phase difference to be found) and the difference waveform is calculated, the integrated amplitude (average amplitude) of the difference waveform is minimized at that superposition phase (see the bottom of FIG. 9). Therefore, by varying the superposition phase of the two waveform data while calculating the integrated amplitude of the difference waveform, and finding the superposition phase at which the integrated amplitude is minimized, the phase difference φ between the two waveforms can be obtained.
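Written as a formula, the search just described can be expressed as follows. The notation is introduced here for illustration: x_A[n] and x_B[n] are the first and second sampled waveforms, k is the relative shift in samples, I_k is the set of indices at which both shifted waveforms are defined, N_k is the size of I_k, and Δt is the sampling period; the integrated amplitude is the mean absolute deviation from the center line used in step S204 of FIG. 14.

```latex
\begin{aligned}
d_k[n] &= x_A[n] - x_B[n+k], \qquad n \in I_k \\
\bar{d}_k &= \frac{1}{N_k} \sum_{n \in I_k} d_k[n] \\
A(k) &= \frac{1}{N_k} \sum_{n \in I_k} \bigl| d_k[n] - \bar{d}_k \bigr| \\
k^{*} &= \arg\min_{-K \le k \le K} A(k), \qquad \varphi = k^{*}\,\Delta t, \qquad K = \Sigma t_{\max} / \Delta t
\end{aligned}
```

If the second waveform is an exact copy of the first delayed by d samples, i.e. x_B[n] = x_A[n − d], then d_d[n] = x_A[n] − x_A[n] = 0 for every n, so A(d) = 0 is the minimum and φ = d·Δt. With real recordings the two waveforms differ in spectrum, so the minimum is positive but, per the text, still located at the inherent phase difference.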

Considering that the result is used as personal feature information for authentication, it suffices to obtain a parameter that corresponds uniquely to the phase difference φ to be found. The composite voice feature information is therefore not limited to the phase difference that minimizes the integrated amplitude of the difference waveform; any of the following may be used instead:
(1) the phase difference that maximizes the integrated amplitude of the difference waveform
(2) the phase difference that minimizes the integrated amplitude of the sum waveform
(3) the phase difference that maximizes the integrated amplitude of the sum waveform

The process of finding the phase difference φ that minimizes the integrated amplitude of the difference waveform is described below as an example, with reference to the flowchart of FIG. 14. In S201, the superposition phase difference Σt is reset (because the waveforms are superpositions of many sinusoids, the phase difference is measured in units of time rather than angle). Next, with one of the air conduction and bone conduction waveforms taken as the first waveform and the other as the second, the second waveform is shifted by a predetermined small time Δt in S202, and the difference waveform is calculated in S203 with the first waveform held fixed. In S204, the integrated amplitude A of the difference waveform is calculated. The calculation of integrated amplitude is well known; for example, it can be computed as follows. First, with the waveform denoted f(t), the values of f(t) at all sampling timings t are summed and divided by the number of samples N to obtain the waveform center line f0. Then |f(t) − f0| is calculated for each t, summed over all t, and divided by N to obtain the integrated amplitude. In S205, the current value of Σt is taken as the phase difference φ and stored in association with the integrated amplitude A.
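The integrated amplitude of S204 is the mean absolute deviation of the waveform from its center line. A direct Python transcription follows; the function name is hypothetical.

```python
from typing import Sequence


def integrated_amplitude(f: Sequence[float]) -> float:
    """Integrated (average) amplitude as described for S204."""
    n = len(f)
    if n == 0:
        raise ValueError("empty waveform")
    f0 = sum(f) / n                         # waveform center line f0
    return sum(abs(v - f0) for v in f) / n  # (1/N) * sum over t of |f(t) - f0|
```

For example, [3.0, 1.0, 3.0, 1.0] has center line 2.0 and integrated amplitude 1.0, the same as [1.0, -1.0, 1.0, -1.0], because the center line is subtracted before averaging.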

Next, in S206, Σt is incremented by Δt, and S202 to S206 are repeated until Σt reaches a predetermined maximum value Σtmax. Considering that the user should be able to utter the authentication voice naturally, the voice sample should preferably be, for example, 1 second or longer. A waveform shift of 0.5 to 2 wavelengths is sufficient to find the phase difference; since the frequency of the human voice is on average about 1 to 2 kHz, Σtmax is preferably set to about 0.5 to 2 ms. The sampling period (shift step) Δt is preferably about 1/1000 to 1/10 of Σtmax. The shift range of the second waveform may be set in only one direction (positive or negative) from the reference superposition phase as the origin, or in both directions.

When the above calculation is finished, the process proceeds to S208, where the minimum value A0 of the stored integrated amplitudes A is found, and in S209 the phase difference φ corresponding to A0 is determined as the phase difference φ0 to be obtained. As shown in FIG. 6, the spectra of the bone conduction and air conduction sounds differ considerably, and each contains frequency components not shared by the other (for example, bone conduction sound tends to lack spectral intensity in the high-frequency range). It may therefore be preferable to extract, by filtering, a frequency range in which common components dominate before performing the waveform calculation for the phase difference. This concludes the description of the phase difference calculation.
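A minimal Python sketch of the search in S201 to S209, assuming both waveforms are already sampled at the same rate into lists of floats. The shift is stepped one sample at a time, so Δt equals the sampling period, and both positive and negative shifts are searched (one of the two options mentioned above). The band-limiting filter suggested for the phase calculation is not included, and the function names are hypothetical. The integrated-amplitude helper is repeated so that the block runs on its own.

```python
import math
from typing import Sequence, Tuple


def integrated_amplitude(f: Sequence[float]) -> float:
    n = len(f)
    f0 = sum(f) / n                          # waveform center line f0
    return sum(abs(v - f0) for v in f) / n   # mean |f(t) - f0|


def phase_difference(first: Sequence[float], second: Sequence[float],
                     max_shift: int) -> Tuple[int, float]:
    """Shift (in samples) of `second` that minimizes the integrated amplitude of
    the difference waveform, together with that minimum A0 (cf. S201 to S209).
    A positive result means that `second` lags `first`."""
    best_shift, a0 = 0, math.inf
    for k in range(-max_shift, max_shift + 1):      # S202, S206: step the shift
        lo = max(0, -k)                             # overlap of first[n] and second[n + k]
        hi = min(len(first), len(second) - k)
        if hi - lo < 1:
            continue                                # no overlap at this shift
        diff = [first[n] - second[n + k] for n in range(lo, hi)]  # S203
        a = integrated_amplitude(diff)              # S204
        if a < a0:                                  # S205, S208: keep the minimum
            best_shift, a0 = k, a
    return best_shift, a0                           # S209: shift giving A0
```

The shift converts to time as φ = shift × Δt. For example, at an assumed sampling rate of 8 kHz, Σtmax = 2 ms corresponds to max_shift = 16.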

Returning to FIG. 12, in S508 and S509 the frequency spectra of the air conduction and bone conduction waveforms are calculated and the results stored. As already mentioned, this can be done by applying a well-known Fourier transform to the original waveforms. In speaker recognition, however, it is known that the spectral outline shown in the lower part of FIG. 5 (information that mainly reflects voice quality) has better measurement reproducibility than the spectral waveform with fine structure shown in the upper part, is sufficiently effective as personal identification information, and is easier to match. This spectral outline, also called the spectral envelope, can be extracted and calculated by various well-known speech analysis algorithms: nonparametric methods such as short-time autocorrelation analysis, short-time spectrum analysis, cepstrum analysis, band-pass filter bank analysis, and zero-crossing analysis, or parametric methods such as linear predictive analysis, maximum likelihood spectrum estimation, the covariance method, PARCOR analysis, and LSP analysis.
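As one concrete instance of the nonparametric methods listed, cepstrum analysis obtains the envelope by keeping only the low-quefrency part of the real cepstrum. The Python sketch below is illustrative only: it uses a naive O(N²) DFT to stay within the standard library, and the Hamming window, single-frame processing, and lifter order `n_lifter` are assumptions rather than values from the text. A practical implementation would use an FFT and average over several frames.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def cepstral_envelope(frame: Sequence[float], n_lifter: int) -> List[float]:
    """Smoothed log-magnitude spectrum (natural log) of one frame, bins 0..N/2."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]  # Hamming
    spectrum = dft([frame[t] * window[t] for t in range(n)])
    log_mag = [math.log(abs(s) + 1e-12) for s in spectrum]
    cepstrum = idft(log_mag)                  # real cepstrum (imaginary parts ~ 0)
    # Low-quefrency liftering: keep c[0..L] and the mirrored tail c[N-L..N-1]
    lifted = [c if q <= n_lifter or q >= n - n_lifter else 0j
              for q, c in enumerate(cepstrum)]
    envelope = dft(lifted)                    # back to the log-spectral domain
    return [e.real for e in envelope[: n // 2 + 1]]
```

A smaller `n_lifter` gives a smoother envelope; keeping all quefrencies (n_lifter = N/2) simply reproduces the original log spectrum.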

In S510, as shown in FIG. 6, the difference between the frequency spectra of the air conduction and bone conduction sounds obtained as above is calculated and stored as difference spectrum data. The above processing is performed by executing the air conduction/bone conduction sound difference spectrum calculation and matching determination module 205 and the waveform spectrum matching and determination module 206 of FIG. 4. This concludes the description of the voice data processing.

Returning to FIG. 11, the flow of the initialization process is described.
In the voice data processing of S301, voice is input in the authorized user's (the person to be authenticated) own voice, and the phase difference and the frequency spectra or difference spectrum of the air conduction and bone conduction sounds are generated as described above. In S302, these are registered in the EEPROM 322 (FIG. 4) as master data (standard voice feature information: standard phase difference, standard frequency spectra, or standard difference spectrum) for use in the subsequent voice authentication process.

FIG. 15 shows an example of the voice authentication process. In S401, the user inputs the voice specified for authentication. In S402, the voice data processing described above is executed and the phase difference φ is calculated. In S403, φ is compared with the standard phase difference φ0 stored as master data; here the difference φ − φ0 is calculated. In S406, it is checked whether the deviation between φ and φ0 is within the allowable range; if so, the authentication flag is set to "permitted" (S407), and otherwise to "not permitted" (S408). Instead of registering the standard phase difference φ0 as the master, an allowable phase difference range containing φ0 (bounded by a maximum φmax and a minimum φmin) may be registered, and authentication performed according to whether φ falls within that range.
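The two variants of the S406 decision can be sketched in Python as below. The symmetric tolerance `tol` is an assumption (the text only speaks of an allowable range), and the values are taken to be in seconds; the function names are hypothetical.

```python
def phase_ok_tolerance(phi: float, phi0: float, tol: float) -> bool:
    """S403/S406: accept if |phi - phi0| is within the allowed deviation tol."""
    return abs(phi - phi0) <= tol


def phase_ok_range(phi: float, phi_min: float, phi_max: float) -> bool:
    """Variant: accept if phi lies in the registered range [phi_min, phi_max]."""
    return phi_min <= phi <= phi_max
```

The range variant also allows an asymmetric window around φ0, which the tolerance form cannot express.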

FIG. 16 shows an example of the voice authentication process that uses the difference spectrum instead of the phase difference (steps common to FIG. 15 have the same step numbers, and their description is omitted). The voice data processing is executed in S402; in S410, the calculated difference spectrum between the air conduction and bone conduction sounds, shown in FIG. 6, is read out, and in S411 it is compared with the difference spectrum master data (FIG. 4: reference numeral 223). If the two are judged to match in S412, the authentication flag is set to "permitted" (S413); otherwise it is set to "not permitted" (S414).

As shown in FIG. 6, the air conduction and bone conduction spectra share their main features, but a marked difference in spectral intensity appears in particular frequency bands (for example, high-frequency components appear more strongly in the air conduction spectrum than in the bone conduction spectrum). Matching can therefore be performed by comparing the shape of the difference spectrum in those bands with the master. In particular, when a spectral envelope peak exists in one of the two spectra but not in the other (such as those marked "×" in FIG. 6) and its position varies from individual to individual, the peak can be detected in the difference spectrum, and comparing its position (frequency) with the master allows highly accurate authentication to be performed simply.
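The peak-based variant can be sketched in Python as follows, assuming both spectra are log-magnitude envelopes sampled on the same frequency grid, so that their difference is a log ratio. The simple local-maximum peak picker, the height threshold, and the bin tolerance are illustrative assumptions, not details from the text.

```python
from typing import List, Sequence


def difference_spectrum(air: Sequence[float], bone: Sequence[float]) -> List[float]:
    """Air minus bone, bin by bin; both log spectra must share one frequency grid."""
    if len(air) != len(bone):
        raise ValueError("spectra must share one frequency grid")
    return [a - b for a, b in zip(air, bone)]


def peak_bins(spec: Sequence[float], min_height: float) -> List[int]:
    """Indices of local maxima whose value is at least min_height."""
    return [i for i in range(1, len(spec) - 1)
            if spec[i] >= min_height and spec[i - 1] < spec[i] >= spec[i + 1]]


def peak_positions_match(measured: Sequence[int], master: Sequence[int],
                         tol_bins: int) -> bool:
    """Same number of peaks, each within tol_bins of the registered position."""
    return len(measured) == len(master) and all(
        abs(m - r) <= tol_bins for m, r in zip(sorted(measured), sorted(master)))
```

A peak present in both spectra cancels in the difference, so only peaks unique to one channel (the "×" peaks of FIG. 6) survive the threshold.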

FIG. 17 shows an example of the voice authentication process in which the spectra of the bone conduction and air conduction sounds are each matched individually against the master (steps common to FIG. 15 have the same step numbers, and their description is omitted). The voice data processing is executed in S402, and the calculated frequency spectra of the air conduction and bone conduction sounds are read out (S420, S423) and compared individually with the master data (FIG. 4: reference numerals 221, 222). Only when both the bone conduction and air conduction sounds are judged to match in S422 and S425 is the authentication flag set to "permitted" (S426); otherwise it is set to "not permitted" (S427).

As shown in FIG. 6, the spectral envelopes of both the air conduction and bone conduction sounds have characteristic peak positions that depend on the uttered sound. The number and positions of these peaks can therefore be used to identify whether the input sound (for example, a word or character) is the same as the sound represented by the master (that is, speech recognition). If the content is the same, the peak positions and intensities (or the intensity ratios between peaks) are compared with the master, and whether the speaker is the authorized user is authenticated according to whether they match (that is, speaker recognition).
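The two-stage comparison described above (content first, then speaker) can be sketched in Python for a single log-magnitude envelope. The master representation (peak bins plus peak levels relative to the first peak, in the log domain) and both tolerances are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def envelope_peaks(env: Sequence[float]) -> List[int]:
    """Local maxima of a log-magnitude spectral envelope."""
    return [i for i in range(1, len(env) - 1) if env[i - 1] < env[i] >= env[i + 1]]


def match_envelope(env: Sequence[float], master_peaks: Sequence[int],
                   master_levels: Sequence[float], pos_tol: int, level_tol: float) -> bool:
    peaks = envelope_peaks(env)
    if not peaks:
        return False                      # nothing to compare
    # Content check (speech recognition): same number of peaks at about the same positions
    if len(peaks) != len(master_peaks) or any(
            abs(p - q) > pos_tol for p, q in zip(peaks, master_peaks)):
        return False
    # Speaker check: peak levels relative to the first peak (log-domain intensity ratios)
    levels = [env[p] - env[peaks[0]] for p in peaks]
    return all(abs(a - b) <= level_tol for a, b in zip(levels, master_levels))
```

Using levels relative to the first peak makes the check insensitive to overall loudness, which is one way to realize the "intensity ratio between peaks" option in the text.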

The voice authentication process of FIG. 18 combines the phase-difference authentication of FIG. 15 (second authentication process: S401 to S406) with the spectrum-matching authentication of FIG. 17 (first authentication process: S420 to S422); only when both judge a match is the authentication flag set to "permitted" (S426), and otherwise it is set to "not permitted" (S427). This spectral matching uses only the air conduction sound, but the bone conduction sound, or both, may be used instead. Because the phase-difference calculation is simpler than the spectrum calculation, performing spectral matching on only one of the air conduction and bone conduction sounds (omitting the spectrum calculation for the other) and using the phase-difference authentication as a supplementary check both lightens the processing load and improves authentication accuracy.
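A minimal Python sketch of the FIG. 18 decision, assuming each check is supplied as a zero-argument callable. Running the cheaper phase-difference check first and skipping the spectral matching when it fails is an illustrative design choice in the spirit of the load reduction described, not an ordering stated in the text.

```python
from typing import Callable


def fig18_decision(phase_check: Callable[[], bool],
                   spectrum_check: Callable[[], bool]) -> bool:
    """Permit only if both checks pass (FIG. 18).

    The cheaper phase-difference check runs first; if it fails, the spectral
    matching (for one channel only) is not computed at all.
    """
    return phase_check() and spectrum_check()
```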

In the above embodiment, acquisition of the data needed for authentication and the authentication processing using that data are all completed inside the mobile phone (more generally, the authentication terminal), but all or part of the authentication processing can also be assigned to a device outside the mobile phone. For example, the mobile phone may only acquire the voice waveform data and transfer it, either directly or after processing into spectra or the like, by communication to an authentication data processing device composed of another computer (in this case, the master data for matching must be transferred to the authentication data processing device in advance). The authentication data processing device receives the transferred data, performs authentication by matching in the same way as already described, and returns the result (which may have the same format as the authentication flag) to the mobile phone. According to the received result, the mobile phone releases the lock (use permitted) or keeps it locked (use not permitted) as already described.
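The division of labour between terminal and host can be sketched in Python as three functions exchanging JSON strings. The message layout, field names, and the `verify` callable are hypothetical; the transport path (radio base station 350, network 351) and any protection of the data in transit are not modelled.

```python
import json
from typing import Callable, Dict, List


def build_request(user_id: str, air: List[float], bone: List[float]) -> str:
    """Terminal side: package the sampled waveforms (hypothetical message layout)."""
    return json.dumps({"user": user_id, "air": air, "bone": bone})


def host_authenticate(request: str, masters: Dict[str, dict],
                      verify: Callable[[List[float], List[float], dict], bool]) -> str:
    """Host side: match against master data registered in advance, return the flag."""
    msg = json.loads(request)
    master = masters.get(msg["user"])
    ok = master is not None and verify(msg["air"], msg["bone"], master)
    return json.dumps({"auth_flag": bool(ok)})


def terminal_apply(response: str) -> bool:
    """Terminal side: release the lock only if the host returned a positive flag."""
    return json.loads(response)["auth_flag"] is True
```

Returning only a flag keeps the response in the same form as the on-device authentication flag, so the terminal's lock handling is unchanged.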

In FIG. 2, the authentication data processing apparatus is an authentication host computer 352 connected to a communication network 351 such as the Internet. The mobile phone 1 connects to the authentication host computer 352 through a radio base station 350, using radio communication via the communication connection circuit 323. Alternatively, the mobile phone may connect to the authentication host computer 352 over a short-range wireless network such as a wireless LAN or Bluetooth, or by a wired connection through a connector and cable.

Although the above embodiments have been described using a mobile phone as a concrete example, the personal authentication system of the present invention is not limited to mobile phones. For example, as shown in FIG. 3, it can also be applied to an intercom-type personal authentication system 100. Such a system may be installed at the entrance of a building, or at an entry gate to a high-security zone within the same building (or premises). In this example, the air conduction microphone 304 is provided in the main body of the intercom. The bone conduction microphone 340, and a bone conduction speaker 303 serving as the receiver, are provided on the hand unit 101, which is connected to the main body by a curl cord 339. The user speaks while holding the hand unit 101 against the jawbone or a similar part, and authentication then follows exactly the same flow as already described. The air conduction microphone 304 may also be provided on the hand unit side. The electrical configuration is almost the same as that of FIG. 2, except that the parts involved in communication (for example, the communication connection circuit 323) can naturally be omitted.

Brief description of drawings

FIG. 1 is an external perspective view showing an example of a mobile phone equipped with the personal authentication system of the present invention.
FIG. 2 is a block diagram showing an example of the electrical configuration of the mobile phone equipped with the personal authentication system of FIG. 1.
FIG. 3 is an external perspective view showing an example in which the personal authentication system of the present invention is applied to an intercom.
FIG. 4 is a schematic diagram showing the contents stored in the ROM and the EEPROM of FIG. 2.
FIG. 5 is a graph showing an example of a voice spectrum and its spectral envelope.
FIG. 6 is a conceptual diagram of the individual frequency spectra of the air conduction sound and the bone conduction sound, and of their difference spectrum.
FIG. 7 is a schematic waveform diagram showing the concept of filtering a voice waveform before use.
FIG. 8 is a schematic waveform diagram explaining the phase difference between the air conduction sound and the bone conduction sound.
FIG. 9 is an explanatory diagram of a method of obtaining the phase difference between the air conduction sound and the bone conduction sound from a waveform difference.
FIG. 10 is a flowchart showing the flow of the main authentication process.
FIG. 11 is a flowchart showing the flow of the initialization process.
FIG. 12 is a flowchart showing the flow of the voice data processing.
FIG. 13 is a flowchart showing the flow of the air conduction / bone conduction sound waveform sampling process.
FIG. 14 is a flowchart showing the flow of the air conduction / bone conduction sound phase difference calculation process.
FIG. 15 is a flowchart showing the flow of a first example of the voice recognition process.
FIG. 16 is a flowchart showing the flow of a second example of the same.
FIG. 17 is a flowchart showing the flow of a third example of the same.
FIG. 18 is a flowchart showing the flow of a fourth example of the same.

Explanation of symbols

1 Mobile phone (personal authentication system)
100 intercom (personal authentication system)
304 Transmitter (microphone: air conduction sound detector)
340 Bone conduction microphone (bone conduction sound detector)
312 CPU (authentication processing means, collation means, composite voice feature information calculation means)
313 RAM (bone conduction voice information storage unit, air conduction voice information storage unit)
322 EEPROM (standard voice feature information storage unit)

Claims

A personal authentication system that authenticates a person to be authenticated based on a voice uttered by that person, comprising:
a bone conduction sound detection unit that detects voice information of the person to be authenticated as bone conduction sound;
a bone conduction voice information storage unit that stores the detected bone conduction voice information;
an air conduction sound detection unit that detects voice information of the person to be authenticated as air conduction sound;
an air conduction voice information storage unit that stores the detected air conduction voice information; and
authentication processing means for performing authentication processing based on both the bone conduction voice information and the air conduction voice information.

The personal authentication system according to claim 1, wherein the bone conduction voice information and the air conduction voice information are generated by the bone conduction sound detection unit and the air conduction sound detection unit simultaneously detecting the voice uttered by the person to be authenticated.

The personal authentication system according to claim 2, wherein the authentication processing means includes: a standard voice feature information storage unit that stores standard voice feature information serving as the collation target for collation-source voice feature information based on both the bone conduction voice information and the air conduction voice information; and collation means for collating the collation-source voice feature information with the standard voice feature information.

The personal authentication system according to claim 3, wherein the standard voice feature information is created by detecting a standard voice of a person to be authenticated by the bone conduction sound detection unit and the air conduction sound detection unit.

The personal authentication system according to claim 4, wherein the voice feature information includes a frequency spectrum of the bone conduction sound and a frequency spectrum of the air conduction sound. The collation means is configured to collate each of these frequency spectra with the corresponding standard frequency spectrum of the bone conduction sound or the air conduction sound included in the standard voice feature information, and to accept the person as authenticated only when a collation match is obtained for both.

The personal authentication system according to claim 1, further comprising composite voice feature information calculation means. This means calculates composite voice feature information that can be calculated only by using both the bone conduction sound waveform detected by the bone conduction sound detection unit and the air conduction sound waveform detected by the air conduction sound detection unit. The authentication processing means performs the authentication processing based on the composite voice feature information.

The personal authentication system according to claim 6, wherein the composite voice feature information calculation means calculates, as the composite voice feature information, a phase difference between the air conduction sound waveform and the bone conduction sound waveform.

The personal authentication system according to claim 7, wherein a phase difference between the air conduction sound waveform and the bone conduction sound waveform that is specific to a predetermined authentication target is obtained in advance as a standard phase difference. The authentication processing means performs the authentication processing based on whether the calculated phase difference matches the standard phase difference.

The personal authentication system according to any one of claims 6 to 8, wherein the authentication processing means performs two authentication processes in combination. The first authentication process collates at least one of the frequency spectrum of the bone conduction sound and the frequency spectrum of the air conduction sound with a standard frequency spectrum. The second authentication process is based on the composite voice feature information.
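The collation rules recited in claims 5, 7 and 8 can be sketched together as below. This is an illustration under stated assumptions, not the claimed implementation. The following are all assumptions, and every function name is hypothetical:

- The RMS spectral distance in dB, with a 6 dB threshold.
- Peak normalisation of each waveform.
- A minimum mean-absolute-difference shift search, which is only one reading of the waveform-difference method named in the caption of FIG. 9.
- Taking the median of several enrollment measurements as the standard phase difference.

```python
import math
import statistics
from typing import Sequence

# --- Claim 5: collate each channel against its own standard spectrum ----------


def log_spectral_distance(spec: Sequence[float], ref: Sequence[float],
                          eps: float = 1e-12) -> float:
    """RMS difference of two magnitude spectra in dB (an assumed distance measure)."""
    if len(spec) != len(ref) or not spec:
        raise ValueError("spectra must be non-empty and have the same number of bins")
    diffs = [20 * math.log10((s + eps) / (r + eps)) for s, r in zip(spec, ref)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def collate_both_spectra(bone_spec, air_spec, std_bone_spec, std_air_spec,
                         max_db: float = 6.0) -> bool:
    """Accept only if BOTH the bone and the air spectrum match their standards."""
    return (log_spectral_distance(bone_spec, std_bone_spec) <= max_db
            and log_spectral_distance(air_spec, std_air_spec) <= max_db)


# --- Claim 7: phase difference by waveform difference (assumed method) --------


def _normalise(x: Sequence[float]) -> list[float]:
    peak = max((abs(v) for v in x), default=0.0)
    return [v / peak for v in x] if peak else list(x)


def phase_difference(air: Sequence[float], bone: Sequence[float], max_shift: int) -> int:
    """Shift s (samples) minimising mean |air[t] - bone[t - s]| after normalisation.

    A positive result means the air conduction waveform lags the bone conduction one."""
    if not 0 <= max_shift < min(len(air), len(bone)):
        raise ValueError("max_shift must be smaller than both waveform lengths")
    a, b = _normalise(air), _normalise(bone)

    def mean_abs_diff(s: int) -> float:
        pairs = [(a[t], b[t - s]) for t in range(len(a)) if 0 <= t - s < len(b)]
        return sum(abs(p - q) for p, q in pairs) / len(pairs)

    return min(range(-max_shift, max_shift + 1), key=mean_abs_diff)


# --- Claim 8: standard phase difference obtained in advance -------------------


def enroll_standard_phase(enrollment_shifts: Sequence[int]) -> float:
    """Standard phase difference taken as the median of enrollment measurements."""
    return statistics.median(enrollment_shifts)


def phase_matches(shift: int, standard: float, tolerance: float = 1.0) -> bool:
    return abs(shift - standard) <= tolerance
```

Because claim 5 requires both comparisons to pass, an input that matches only the stored air conduction spectrum is rejected. Peak normalisation is used so that the differing signal levels of the two microphones do not dominate the shift search.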