JPH09305196A - Speech input device

Info

Publication number: JPH09305196A
Application number: JP8115898A
Authority: JP
Inventors: Kenichi Hirayama; 健一平山
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-05-10
Filing date: 1996-05-10
Publication date: 1997-11-28

Abstract

PROBLEM TO BE SOLVED: To provide a speech input device that allows speech to be input without being overheard in situations where a third party might hear it, such as at an automatic cash dispenser. SOLUTION: A video camera 11 captures the pattern of the eye 2a of a speaker 2. An iris code generation part 12 extracts the features of the iris pattern and generates iris data. A collation part 13 collates this against the iris data of registered speakers in a registration database 20 to identify the speaker 2. A musical sound signal generation part 30 retrieves the speech data of the speaker 2 from the registration database 20, generates from it a musical sound signal that masks the speech of the speaker 2, and outputs it toward a third party 3 from an acoustic output part 40. This makes it difficult for the third party 3 to make out the speech of the speaker 2. The speech of the speaker 2 reaches an acoustic input part 50 mixed with the musical sound from the acoustic output part 40, but the musical sound component is suppressed by a musical sound signal suppression part 60 and therefore does not affect speech recognition.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a voice input device used in automatic deposit/withdrawal machines and the like, and in particular to a voice input device for inputting speech so that it is not overheard in situations where a third party might hear it.

[0002]

[Prior Art] Conventionally, voice input devices have been used as an input means in situations where manual input from a keyboard or the like is difficult, for example in parcel sorting and visual inspection of products. In parcel sorting, for example, an operator reads the destination on the invoice attached to each parcel arriving on a belt conveyor and speaks the name of the destination region. The voice input device recognizes the input speech and outputs the corresponding region code to the sorting machine. Systems have also been put into practical use that answer spoken deposit-balance inquiries over a telephone line, using telephones on which manual input was never possible in the first place.

Furthermore, with advances in voice input technology, voice input is being applied not only in such special cases but also as a data entry method that replaces conventional manual input from keyboards, such as form entry on office computers and input to word processors. For example, the automatic deposit/withdrawal machines currently used in banks and elsewhere rely on manual input: a card is used to enter the account number, and a personal identification number is entered from a keypad to verify identity.

For people unaccustomed to operating machines, this method has the problem that input is cumbersome and that operation does not go smoothly when complex input is required. Voice input is a conceivable way of solving such problems. In that case the user would input a name, an amount, and so on by voice, following the guidance of an automatic deposit/withdrawal machine installed in a bank that unspecified people can freely enter and leave.

[0003]

[Problems to Be Solved by the Invention] However, conventional voice input devices have been developed with the main focus on how to capture, as clear speech, speech uttered in noisy places or speech transmitted over a channel with limited frequency bandwidth. Almost no consideration has been given to privacy protection, namely that the speech being input may be overheard by others. Consequently, when voice input is used at, for example, an automatic deposit/withdrawal machine installed in a bank, others may overhear what is spoken, which raises a privacy problem. Simply providing soundproofing, such as surrounding the speaker with soundproof walls, is problematic in terms of cost. Generating noise around the speaker to make the speech hard for a third party to hear is problematic because it lowers the speech recognition rate. The present invention provides a voice input device that solves the privacy protection problem of the prior art described above.

[0004]

[Means for Solving the Problems] To solve the above problem, the first invention is a voice input device comprising: acoustic input means that receives the speech uttered by a speaker at a predetermined position, together with sound from around that position including musical sound, and converts it into an electrical signal; musical sound signal generating means that generates a musical sound signal which consists mainly of the formant frequency components contained in the vowels of speech and which can cancel the speech uttered by the speaker; musical sound output means whose sound radiation range is set near the predetermined position and in a direction away from the acoustic input means, and which converts the musical sound signal into sound and outputs the musical sound into that radiation range at an average power greater than the average level of the speaker's speech; musical sound signal suppression means that subtracts the musical sound signal generated by the musical sound signal generating means from the output signal of the acoustic input means, thereby suppressing the power of the musical sound signal in that output signal to or below a threshold used for speech recognition; and voice detection means that detects the output signal of the musical sound signal suppression means as a speech signal when its power exceeds the threshold.

[0005]

According to the first invention, because the voice input device is configured as described above, it operates as follows. The musical sound signal generating means generates a musical sound signal that consists mainly of the formant frequency components contained in the vowels of speech and that cancels the speech uttered by the speaker. This signal is converted into sound by the musical sound output means and output at a power greater than that of the speaker's speech. As a result, it becomes difficult for a third party near the speaker to make out the speaker's voice. Meanwhile, because the musical sound from the musical sound output means is set to radiate in a direction away from the acoustic input means, the musical sound reaching the acoustic input means is at a lower level than the speaker's speech. The sound consisting of the speaker's speech with this low-level musical sound superimposed is converted into an electrical signal by the acoustic input means. The musical sound signal component in this electrical signal is suppressed to or below the speech recognition threshold by the musical sound signal suppression means. The output signal of the suppression means is detected as a speech signal by the voice detection means when it exceeds the threshold. The second invention employs, in a voice input device, a speaker identification method that uses the iris, as shown in FIG. 2, to identify the speaker who is inputting speech.

[0006]

FIG. 2 is a structural diagram of the iris for explaining the principle of the present invention. The iris B of the eye A is the muscle surrounding the pupil C that adjusts how far the pupil C opens, and its pattern is made up of furrows D, the collarette E, crypts F, and so on. The human eye A is largely formed by the sixth month of gestation, at which point an opening appears at the pupil, and the furrows D are known to develop chaotically outward from that opening (the pupil). The growth of the furrows D stops by about the second year after birth, and from then on they form a pattern unique to the individual that remains unchanged throughout life. An iris identification method that identifies individuals by focusing on these characteristics of the iris B has the following advantages over other identification methods such as fingerprint matching, signature matching, and personal identification numbers. (a) It is difficult to forge. (b) Recognition accuracy is high. (c) Recognition is possible without contact. (d) It does not change with age.

[0007]

In the second invention, the following are added to the voice input device of the first invention: registration means in which voice data corresponding to the vowel formant frequency components of the speech of a plurality of registered speakers, and iris data corresponding to the pattern of the iris B of each registered speaker, are registered; and speaker identification means that detects the pattern of the speaker's iris B at the predetermined position, extracts the features of the iris B, collates the extraction result with the iris data registered in the registration means, and identifies which of the registered speakers the speaker is. In addition, the musical sound signal generating means is modified so that it retrieves from the registration means the voice data of the speaker identified by the speaker identification means and, based on that voice data, generates a musical sound signal that can cancel the speech uttered by that speaker.

In the third invention, the musical sound signal generating means of the voice input device of the second invention consists of a storage unit that stores a plurality of items of musical sound data prepared in advance, and a selection unit that selects the musical sound data whose frequency components are most similar to the speaker's voice data and reads it from the storage unit.

According to the second and third inventions, the speaker identification means detects the pattern of the speaker's iris B and collates it with the iris data of the registered speakers held in the registration means, thereby identifying the speaker. The musical sound signal generating means then retrieves the identified speaker's voice data from the registration means and, based on it, generates a musical sound signal that cancels the speaker's speech. With this musical sound being output, voice input is performed in the same way as in the first invention.

[0008]

[Embodiments of the Invention] FIG. 1 is a block diagram of a voice input device showing an embodiment of the present invention. This voice input device is built into, for example, an automatic deposit/withdrawal machine 1 at a bank or the like, is used for inputting names, amounts, and so on, and has speaker identification means 10. The speaker identification means 10 consists of a video camera 11 that captures an image of the eye 2a of a speaker 2 standing at a predetermined position, an iris code generation unit 12 that extracts the features of the iris pattern from the image of the eye 2a captured by the video camera 11 and generates an iris code IR, and a collation unit 13 that checks the generated iris code IR against a database to identify the speaker 2. Registration means (for example, a registration database) 20 is connected to the collation unit 13. The registration database 20 holds personal information for a plurality of registered speakers who have been registered as persons performing voice input, including voice data VD corresponding to their vowel formant frequency components and the iris data ID of each registered speaker.
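
The source does not say how the collation unit 13 compares the iris code IR with the stored iris data ID. As a rough sketch only, the example below assumes a binary iris code compared by fractional Hamming distance, a technique commonly used for iris codes. The record layout, the 8-bit codes, the `identify_speaker` name, and the 0.32 acceptance limit are all illustrative assumptions, not values from the patent.

```python
def hamming_fraction(a, b):
    """Fraction of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify_speaker(iris_code, registry, max_distance=0.32):
    """Return the ID of the closest registered speaker, or None if no
    registered iris code is within max_distance (assumed limit)."""
    best_id, best_distance = None, 1.0
    for speaker_id, record in registry.items():
        d = hamming_fraction(iris_code, record["iris"])
        if d < best_distance:
            best_id, best_distance = speaker_id, d
    return best_id if best_distance <= max_distance else None


# Hypothetical registration database: iris data ID plus voice data VD
# (here VD is a pair of formant frequencies in Hz).
registry = {
    "A": {"iris": [1, 0, 1, 1, 0, 0, 1, 0], "voice": [700.0, 1200.0]},
    "B": {"iris": [0, 1, 0, 0, 1, 1, 0, 1], "voice": [300.0, 2300.0]},
}

known = identify_speaker([1, 0, 1, 1, 0, 0, 1, 1], registry)    # one bit differs from "A"
unknown = identify_speaker([1, 1, 0, 0, 0, 0, 1, 1], registry)  # far from both records
print(known, unknown)
```

A real iris code would be far longer than 8 bits, and the acceptance limit would have to be tuned on measured data.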

[0009]

The output side of the collation unit 13 is connected to musical sound signal generating means (for example, a musical sound signal generation unit) 30. The musical sound signal generation unit 30 retrieves from the registration database 20 the voice data VD of the speaker 2 identified by the collation unit 13 and, based on that voice data VD, generates a musical sound signal MS with a masking effect that cancels the speech of the speaker 2. The output side of the musical sound signal generation unit 30 is connected to musical sound output means (for example, a musical sound output unit) 40. The musical sound output unit 40 converts the musical sound signal MS into musical sound MT and outputs it. The radiation range 40a of the musical sound MT is directed toward, for example, a third party 3 waiting nearby for a turn, such as behind or on either side of the speaker 2. The level of the output musical sound MT is set so that its power is greater than that of the speech of the speaker 2.

[0010]

Meanwhile, this voice input device has acoustic input means (for example, an acoustic input unit) 50 for inputting the speech uttered by the speaker 2. The acoustic input unit 50 has a microphone 51 placed at a position outside the sound radiation range 40a of the musical sound output unit 40. The microphone 51 picks up the speech of the speaker 2 at the predetermined position with maximum sensitivity, also picks up the musical sound MT and other sound that leaks around to it, and converts them into an electrical signal. The output side of the acoustic input unit 50 is connected to musical sound signal suppression means (for example, a musical sound signal suppression unit) 60. The output side of the musical sound signal generation unit 30 is also connected to the musical sound signal suppression unit 60. The musical sound signal suppression unit 60 suppresses the musical sound signal component contained in the output signal of the acoustic input unit 50 to a power that does not affect speech recognition, and outputs the result as a musical-sound-suppressed signal MR. The output side of the musical sound signal suppression unit 60 is connected to voice detection means (for example, a speech recognition unit) 70. The speech recognition unit 70 judges the output signal of the musical sound signal suppression unit 60 to be a speech signal when its power exceeds a threshold TH, and then recognizes its content.
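
The threshold test on the suppressed signal MR can be illustrated with a minimal sketch. It assumes that power is measured as the mean square over fixed-length frames; the frame length, the sample values, and the value of TH are made up for illustration, and the recognition step that follows detection is not modelled.

```python
def frame_powers(signal, frame_len):
    """Mean-square power of each complete frame of the signal."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def detect_voice_frames(suppressed, frame_len, th):
    """Flag each frame whose power exceeds the threshold TH."""
    return [p > th for p in frame_powers(suppressed, frame_len)]


# Residual musical sound left after suppression (low power) ...
residual_music = [0.02, -0.02] * 4
# ... and a frame containing speech (much higher power).
speech = [0.4, -0.3, 0.5, -0.4, 0.3, -0.5, 0.4, -0.3]

flags = detect_voice_frames(residual_music + speech + residual_music,
                            frame_len=8, th=0.01)
print(flags)
```

Only the middle frame is flagged, because the residual musical sound stays below TH while the speech frame exceeds it.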

[0011]

FIG. 3 is a block diagram showing details of the musical sound signal generation unit 30, musical sound output unit 40, acoustic input unit 50, and musical sound signal suppression unit 60 of FIG. 1. The musical sound signal generation unit 30 has a plurality of storage units (for example, memories) 31-1, 31-2, ..., 31-m, each storing as musical sound data MD a musical sound signal MS composed mainly of different frequency components. Each of the memories 31-1 to 31-m is connected to a selection unit 32. The selection unit 32 searches the registration database 20 based on the identification information for the speaker 2 supplied by the collation unit 13 of FIG. 1, and selects the musical sound data MD most similar to the voice data VD of the speaker 2. It then reads the selected musical sound data MD from the corresponding memory 31-i (i = 1 to m) and outputs it. The output side of the selection unit 32 is connected to a digital-to-analog (hereinafter D/A) conversion unit 33. The D/A conversion unit 33 converts the musical sound data MD, supplied as digital data, into an analog musical sound signal MS. The output side of the D/A conversion unit 33 is connected to the musical sound output unit 40 and the musical sound signal suppression unit 60.
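
The selection step can be sketched as a nearest-neighbour search. The patent does not define the similarity measure, so the example assumes that both the voice data VD and each stored musical sound entry are summarized by a short vector of dominant frequencies in Hz, and that similarity is Euclidean distance. The function name and all numbers are hypothetical.

```python
import math


def select_music_index(voice_formants, music_bank):
    """Return the index i of the stored entry (memory 31-i in the text)
    whose frequency profile is closest to the speaker's formants."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(range(len(music_bank)),
               key=lambda i: distance(voice_formants, music_bank[i]))


# Hypothetical frequency profiles of three stored musical sound entries.
music_bank = [
    [300.0, 2300.0],
    [700.0, 1200.0],
    [500.0, 1500.0],
]

selected = select_music_index([680.0, 1250.0], music_bank)
print(selected)
```

In this sketch the second entry (index 1) is chosen, since its profile lies closest to the speaker's formants.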

[0012]

The musical sound output unit 40 consists of an amplifier 41 that amplifies the musical sound signal MS output from the D/A conversion unit 33 to the required power, and a loudspeaker 42 that converts the output of the amplifier 41 into musical sound MT and radiates it. The acoustic input unit 50 consists of the microphone 51, positioned so that its direction of maximum sensitivity roughly coincides with the direction of the speech from the speaker 2, and an amplifier 52 that amplifies the electrical signal from the microphone 51 to a predetermined level.

The musical sound signal suppression unit 60 has a plurality of delay circuits 61-1, 61-2, ..., 61-n connected in common to the output side of the musical sound signal generation unit 30. The delay circuits 61-1 to 61-n generate delays corresponding to the time taken for the musical sound MT output from the loudspeaker 42 to travel through space and reach the microphone 51. Each delay circuit is set to a different delay time so as to correspond to the various propagation paths that arise when the musical sound MT is reflected by surrounding walls and the like. The output sides of the delay circuits 61-1 to 61-n are connected to attenuators 62-1, 62-2, ..., 62-n, respectively. The attenuators 62-1 to 62-n match the levels of the delay circuit output signals to the level of the musical sound signal MS as picked up by the acoustic input unit 50 after travelling through space. The output sides of the attenuators 62-1 to 62-n are connected to the input side of an adder 63. The adder 63 sums the instantaneous values of its input signals and outputs the result, and its output side is connected to one input terminal of a subtractor 64. The output side of the acoustic input unit 50 is connected to the other input terminal of the subtractor 64. The subtractor 64 subtracts the output signal of the adder 63 from the output signal of the acoustic input unit 50 and outputs the result as the musical-sound-suppressed signal MR. The output side of the subtractor 64 is connected to the speech recognition unit 70 of FIG. 1.
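
The delay, attenuate, add, and subtract chain described above can be sketched in discrete time. The example assumes that each propagation path is an integer sample delay with a fixed gain and that these values are already known exactly; in a real installation they would have to be measured or adapted. The function names, path values, and sample data are illustrative only.

```python
def leaked_music(music, paths, n):
    """Adder 63: sum of delayed (61-k) and attenuated (62-k) copies of MS."""
    out = [0.0] * n
    for delay, gain in paths:
        for t in range(delay, n):
            if t - delay < len(music):
                out[t] += gain * music[t - delay]
    return out


def suppress_music(mic, music, paths):
    """Subtractor 64: microphone signal minus the estimated leaked music."""
    estimate = leaked_music(music, paths, len(mic))
    return [m - e for m, e in zip(mic, estimate)]


music = [1.0, -1.0, 0.5, -0.5, 0.25, 0.0, 0.0, 0.0]   # reference signal MS
voice = [0.0, 0.1, 0.3, 0.2, -0.1, -0.2, 0.0, 0.1]    # speech of the speaker
paths = [(1, 0.30), (3, 0.10)]                        # (delay in samples, gain)

# Simulate the microphone: speech plus music arriving over both paths.
leak = leaked_music(music, paths, len(voice))
mic = [v + l for v, l in zip(voice, leak)]

mr = suppress_music(mic, music, paths)                # suppressed signal MR
max_error = max(abs(r - v) for r, v in zip(mr, voice))
print(max_error)
```

When the modelled paths match the simulated ones, MR equals the speech up to floating-point rounding. Any mismatch in delay or gain leaves residual musical sound, which is why the text only requires that the residue stay below the recognition threshold.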

[0013]

Next, (1) the method of registering the voice data VD and iris data ID of a registered speaker, carried out before the voice input device of this embodiment is used, and (2) the operation of the voice input device of this embodiment shown in FIGS. 1 and 3 are described.

(1) Method of registering voice data and iris data

FIG. 4 is a block diagram showing an example of a registration device for registering in advance, in the registration database 20 of FIG. 1, personal information such as the voice data VD and iris data ID of a registered speaker 4. This registration device has a video camera 81 that captures the eye 4a of the registered speaker 4. The video camera 81 outputs image data of the eye 4a of the registered speaker 4 to an iris data generation unit 82. The iris data generation unit 82 extracts the iris features from the input image data of the eye 4a and generates the iris data ID of the registered speaker 4. The output side of the iris data generation unit 82 is connected to a control unit 83 that controls the entire registration device. A guide message display unit 84 is connected to the control unit 83 and displays instructions for the registration procedure to the registered speaker 4.

[0014]

Meanwhile, this registration device has a microphone 85 for inputting the speech that the registered speaker 4 utters in accordance with the guide message. The output side of the microphone 85 is connected via an amplifier 86 to an analog-to-digital (hereinafter A/D) converter 87. The A/D converter 87 samples the input analog speech signal at a fixed period and converts it into a digital signal. The output side of the A/D converter 87 is connected to a voice analysis unit 88. The voice analysis unit 88 judges the input to be valid speech when the power of the speech uttered by the registered speaker 4 in accordance with the guide message is at or above a predetermined level and continues for at least a predetermined time. It then performs processing such as analysis of the formant frequencies contained in the vowels of the input speech and outputs the analysis result to the control unit 83 as voice data VD. The registration database 20 is connected to the control unit 83 via a communication network 90. The control unit 83 has the function of registering in the registration database 20 the personal information of the registered speaker 4, including the iris data ID generated by the iris data generation unit 82 and the voice data VD produced by the voice analysis unit 88.
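
The validity check performed by the voice analysis unit 88 (power at or above a level, sustained for a minimum time) can be sketched as follows. Measuring the duration in consecutive fixed-length frames is an assumption made for this example, as are the function name and every numeric value.

```python
def is_valid_voice_input(samples, frame_len, power_level, min_frames):
    """True if frame power stays at or above power_level for at least
    min_frames consecutive frames (the 'predetermined time')."""
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        if power >= power_level:
            run += 1
            if run >= min_frames:
                return True
        else:
            run = 0   # the loud stretch was interrupted; start counting again
    return False


quiet = [0.01] * 40
sustained = is_valid_voice_input(quiet + [0.5] * 40, 10, 0.1, 3)
too_short = is_valid_voice_input(quiet + [0.5] * 20 + quiet, 10, 0.1, 3)
print(sustained, too_short)
```

An utterance that fails this check would lead to the speaker being prompted to speak again before any formant analysis is attempted.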

[0015]

Next, an example of the procedure for registering personal information using the registration device of FIG. 4 is described. The registration device is assumed to be installed in a place with little noise so that voice input goes smoothly. First, following the display on the guide message display unit 84, the registered speaker 4 verifies their identity at an operation unit (not shown), for example by inserting a personal registration card and entering a personal identification number. When the control unit 83 confirms from the input information that the person is who they claim to be, light is directed at the eye 4a of the registered speaker 4 from a light source (not shown). In conjunction with this, the video camera 81 captures the eye 4a of the registered speaker 4, and the image of the eye 4a is input to the iris data generation unit 82. The iris data generation unit 82 extracts the iris features from the input image data of the eye 4a, generates the iris data ID of the registered speaker 4, and outputs it to the control unit 83.

[0016]

Next, to obtain the voice data VD of the registered speaker 4, the control unit 83 displays on the guide message display unit 84 a guide message such as: Please say "アイラブ沖銀行" ("I love Oki Bank"). The phrase to be spoken is chosen so that it contains all the vowels, in order to analyze the characteristics of the registered speaker's voice accurately. The uttered speech is converted into an electrical signal by the microphone 85, amplified to a predetermined level by the amplifier 86, and converted into a digital signal by the A/D converter 87. The digitized speech is input to the voice analysis unit 88 and is judged to be valid input speech when its power is at or above the predetermined level and continues for at least the predetermined time. If it is not judged valid, a guide message prompting the registered speaker 4 to speak again is displayed and voice input is repeated. The valid voice input obtained in this way undergoes processing such as formant frequency analysis in the voice analysis unit 88, and the analysis result is output to the control unit 83 as voice data VD. The iris data ID and voice data VD of the registered speaker 4 obtained in this way are registered, together with other personal information of the registered speaker 4, in the registration database 20 by the control unit 83 via the communication network 90.

【００１７】[0017]

(2) Operation of the voice input device

When the speaker 2 stands at a predetermined position in front of the voice input device of FIG. 1, a sensor (not shown) detects the speaker 2, and a light source (not shown) illuminates the eye 2a of the speaker 2. In conjunction with this, the video camera 11 captures the eye 2a of the speaker 2, and the image of the eye 2a is input to the iris code generation unit 12. The iris code generation unit 12 generates an iris code IR representing the features of the iris of the speaker 2. The collation unit 13 compares this iris code IR with the iris data ID of the registered speakers stored in the registration database 20 and identifies who the speaker 2 is. The identification result is given to the selection unit 32 of the musical sound signal generation unit 30. The selection unit 32 searches the registration database 20 and reads out the voice data VD of the speaker 2. The selection unit 32 then reads out the contents of the memory 31-i that stores the musical sound data MD most similar to that voice data VD and outputs them to the D/A conversion unit 33. The musical sound MT obtained by conversion to analog in the D/A conversion unit 33 is amplified by the amplifier 41 of the musical sound output unit 40 and output from the loudspeaker 42 toward the third party 3 and others waiting in line behind the speaker 2. The volume of the musical sound MT output toward the third party 3 is adjusted to be greater than the volume of the voice uttered by the speaker 2. Furthermore, because the frequency components of the musical sound MT are similar to the formant frequency components of the voice of the speaker 2, it is difficult for the third party 3 to pick out the voice of the speaker 2. In this state, the speaker 2 speaks into the microphone 51 of the sound input unit 50 in accordance with instructions from a guidance output unit (not shown). In addition to the voice uttered by the speaker 2, the musical sound MT output from the loudspeaker 42 also leaks into the microphone 51. These sounds are converted into an electric signal by the microphone 51, amplified to a predetermined level by the amplifier 52, and supplied to the subtractor 64 of the musical sound signal suppression unit 60.
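
The choice made by the selection unit 32 (picking the memory 31-i whose musical sound data MD is most similar to the voice data VD) can be illustrated with a nearest-neighbour search. This is a minimal sketch under assumptions not stated in the document: the voice data VD and each stored musical sound are represented as equal-length tuples of frequencies in hertz, similarity is measured by Euclidean distance, and the name `select_tone_index` is hypothetical.

```python
def select_tone_index(voice_formants, tone_profiles):
    """Return the index i of the stored tone profile closest to voice_formants.

    voice_formants: tuple of formant frequencies (Hz) taken from the voice data VD.
    tone_profiles:  list of tuples, one per memory 31-1 .. 31-m, giving the
                    dominant frequencies (Hz) of each stored musical sound.
    Assumption: "most similar" means smallest squared Euclidean distance.
    """
    def squared_distance(profile):
        return sum((v - p) ** 2 for v, p in zip(voice_formants, profile))

    return min(range(len(tone_profiles)),
               key=lambda i: squared_distance(tone_profiles[i]))
```

With three stored profiles `[(300, 2300), (700, 1200), (500, 1500)]` and voice data `(680, 1250)`, the sketch selects the second profile, since it lies closest in both frequencies.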

【００１８】[0018]

Meanwhile, the output signal of the musical sound signal generation unit 30 is supplied to the delay circuits 61-1 to 61-n of the musical sound signal suppression unit 60, and each delay circuit 61-1 to 61-n applies a delay corresponding to the propagation time along one of the multiple acoustic propagation paths from the loudspeaker 42 to the microphone 51. Each attenuator 62-1 to 62-n then applies an attenuation corresponding to the attenuation along the respective acoustic propagation path. The output signals of the attenuators 62-1 to 62-n are summed by the adder 63 and supplied to the subtractor 64. The subtractor 64 subtracts the signal supplied by the adder 63 from the signal supplied by the sound input unit 50 and outputs the result as the musical sound suppression signal MR. Because the positional relationship between the loudspeaker 42 and the microphone 51 is fixed when the voice input device is installed, the acoustic propagation paths from the loudspeaker 42 to the microphone 51 are also nearly constant. Accordingly, the delays of the delay circuits 61-1 to 61-n and the attenuations of the attenuators 62-1 to 62-n are adjusted before the voice input device is put into operation, and are assumed to be set so that the power of the musical sound signal MS remaining in the musical sound suppression signal MR is smaller than the threshold TH for voice recognition. The musical sound suppression signal MR, in which the musical sound signal suppression unit 60 has suppressed the musical sound component to the threshold TH or below so that almost only the voice of the speaker 2 remains, is output to the voice recognition unit 70. The voice recognition unit 70 detects the voice signal using the threshold TH and performs voice recognition.
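
The delay, attenuate, sum, and subtract chain described above can be sketched in discrete time as follows. This is an illustration only, assuming sampled signals, integer sample delays, and fixed linear gains for each propagation path; the function names and the use of mean power for the threshold comparison are hypothetical.

```python
def suppress_musical_sound(mic, tone, paths):
    """Subtract a multipath estimate of the musical sound from the mic signal.

    mic:   samples from the sound input unit (voice plus leaked musical sound).
    tone:  the reference musical sound signal MS.
    paths: list of (delay_in_samples, gain) pairs, one per delay circuit and
           attenuator pair; the values are assumed to be calibrated at
           installation.
    Returns the musical sound suppression signal MR.
    """
    mr = []
    for n, x in enumerate(mic):
        estimate = 0.0
        for delay, gain in paths:       # delay circuit + attenuator per path
            k = n - delay
            if 0 <= k < len(tone):
                estimate += gain * tone[k]   # adder 63
        mr.append(x - estimate)              # subtractor 64
    return mr


def mean_power(signal):
    """Mean squared amplitude of a non-empty signal."""
    return sum(s * s for s in signal) / len(signal)


def voice_detected(mr, threshold):
    """Assumed detection rule: voice is present when the power of MR exceeds TH."""
    return mean_power(mr) > threshold
```

When the path parameters match the real acoustic paths, the residual musical sound in MR falls below the threshold, so `voice_detected` responds only to the speaker's voice. In practice the match is approximate, which is why the document requires only that the residual power stay below TH.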

【００１９】[0019]

As described above, in the voice input device of this embodiment, the speaker identification means 10 identifies the speaker 2 by iris recognition, and the musical sound signal generation unit 30 retrieves the voice data VD of that speaker 2 from the registration database 20 and outputs the musical sound signal MS most similar to the voice of the speaker 2. Because the musical sound output unit 40 outputs the musical sound MT toward the third party 3, the voice of the speaker 2 is masked by the musical sound MT and is difficult for the third party 3 to make out. In addition, the musical sound signal suppression unit 60 uses the musical sound signal MS output from the musical sound signal generation unit 30 to suppress the musical sound MT that leaks from the musical sound output unit 40 into the sound input unit 50. Using the musical sound MT in this way makes it possible to protect the privacy of the voice input of the speaker 2 without affecting voice recognition. Moreover, because identity is verified by iris recognition, the device is also advantageous in applications that require security, such as the automatic depositing/dispensing apparatus of this embodiment.

【００２０】[0020]

The present invention is not limited to the above embodiment, and various modifications are possible. Examples of such modifications include the following.
(a) The speaker identification means 10 and the registration database 20 may be omitted. In that case, personal information such as the voice data VD of the speaker 2 is not available, so the musical sound signal generation unit 30 may generate a musical sound signal MS that has a masking effect on the voice of a standard speaker rather than of the individual speaker 2. This configuration may slightly reduce the privacy protection effect, but it simplifies the device.
(b) The above embodiment and modification (a) describe the voice input device as incorporated in the automatic depositing/dispensing apparatus 1, but it can equally be applied to other devices that require privacy protection, such as a betting-ticket purchase terminal that uses voice recognition.
(c) The musical sound signal generation unit 30 uses the memories 31-1 to 31-m, in which various musical sound data are stored in advance, and selects the musical sound most similar to the voice of the speaker 2; alternatively, it may synthesize and output the musical sound MT based on the voice data VD of the speaker 2. This makes it possible to generate a musical sound MT with a still greater masking effect.
(d) The registration database 20 of FIG. 1 is built into the voice input device, but it may instead be accessed via a communication network. This allows personal information such as the iris data ID and the voice data VD to be managed in one place, making for an expandable and flexible system.
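
Modification (c) does not say how the musical sound MT would be synthesized from the voice data VD. One possible approach, shown purely as an assumption and not as the method of this document, is to sum sinusoids at the speaker's formant frequencies. The function name `synthesize_musical_sound` and all of its parameters are hypothetical.

```python
import math


def synthesize_musical_sound(formants_hz, sample_rate, duration_s, amplitude=0.5):
    """Return samples of a tone built from sinusoids at the given formant
    frequencies.

    Assumption: equal weight per formant; the sum is scaled so that its peak
    magnitude never exceeds `amplitude`.
    """
    n_samples = round(sample_rate * duration_s)
    scale = amplitude / len(formants_hz)
    return [
        scale * sum(math.sin(2.0 * math.pi * f * t / sample_rate)
                    for f in formants_hz)
        for t in range(n_samples)
    ]
```

A real device would presumably shape the result into something more musical than a plain sum of sinusoids; the sketch only shows how frequency components taken from the voice data VD could drive the generated signal.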

【００２１】[0021]

[Effects of the Invention] As described in detail above, according to the first invention, the musical sound signal generating means and the musical sound output means output a musical sound that cancels out the voice uttered by the speaker. This makes it difficult for a third party near the speaker to make out the speaker's voice. Meanwhile, because the musical sound is set to be radiated in a direction deviating from the sound input means, it reaches the sound input means at a lower level than the voice uttered by the speaker. In addition, the musical sound signal suppressing means suppresses the musical sound signal component to or below the threshold for voice recognition. It is therefore possible to provide a voice input device that protects privacy with respect to third parties without affecting voice recognition. According to the second and third inventions, the speaker identification means identifies the speaker. The musical sound signal generating means then retrieves the voice data of the identified speaker from the registration means and generates, based on that voice data, a musical sound signal that cancels out the speaker's voice. Because this allows a musical sound to be output that cancels the speaker's voice more effectively than in the first invention, a voice input device with an even greater privacy protection effect can be provided.

[Brief description of drawings]

FIG. 1 is a configuration diagram of a voice input device showing an embodiment of the present invention.

FIG. 2 is a structural diagram of an iris for explaining the principle of the present invention.

FIG. 3 is a configuration diagram of the musical sound signal generation unit 30, the musical sound output unit 40, the sound input unit 50, and the musical sound signal suppression unit 60 of FIG. 1.

FIG. 4 is a configuration diagram of a registration device for the registration database 20 according to the embodiment of the present invention.

[Explanation of symbols]

2 Speaker
10 Speaker identification means
20 Registration database
30 Musical sound signal generation unit
31-1 to 31-m Memories
32 Selection unit
40 Musical sound output unit
50 Sound input unit
60 Musical sound signal suppression unit
70 Voice recognition unit
A Eye
B Iris
C Pupil

Claims

[Claims]

1. A voice input device comprising: sound input means for receiving sound including a voice uttered by a speaker at a predetermined position and a musical sound from the surroundings of the predetermined position, and converting the sound into an electric signal; musical sound signal generating means for generating a musical sound signal that consists mainly of the formant frequency components contained in the vowels of a voice and that can cancel out the voice uttered by the speaker; musical sound output means, whose sound radiation range is set in the vicinity of the predetermined position and in a direction deviating from the sound input means, for converting the musical sound signal into sound and outputting, into the radiation range, a musical sound having an average power greater than the average level of the voice uttered by the speaker; musical sound signal suppressing means for subtracting the musical sound signal generated by the musical sound signal generating means from the output signal of the sound input means, thereby suppressing the power of the musical sound signal in the output signal of the sound input means to or below a threshold for voice recognition; and voice detection means for detecting a voice signal when the power of the output signal of the musical sound signal suppressing means exceeds the threshold.

2. A voice input device comprising: registration means in which voice data corresponding to the formant frequency components of the vowels in the voices of a plurality of registered speakers and iris data corresponding to the iris pattern of each registered speaker are registered; speaker identification means for detecting the iris pattern of a speaker at a predetermined position, extracting features of the iris, comparing the extraction result with the iris data registered in the registration means, and identifying which of the plurality of registered speakers the speaker is; sound input means for receiving sound including a voice uttered by the speaker at the predetermined position and a musical sound from the surroundings of the predetermined position, and converting the sound into an electric signal; musical sound signal generating means for retrieving from the registration means the voice data of the speaker identified by the speaker identification means and generating, based on that voice data, a musical sound signal that can cancel out the voice uttered by the speaker; musical sound output means, whose sound radiation range is set in the vicinity of the predetermined position and in a direction deviating from the sound input means, for converting the musical sound signal into sound and outputting, into the radiation range, a musical sound having an average power greater than the average level of the voice uttered by the speaker; musical sound signal suppressing means for subtracting the musical sound signal generated by the musical sound signal generating means from the output signal of the sound input means, thereby suppressing the power of the musical sound signal in the output signal of the sound input means to or below a threshold for voice recognition; and voice detection means for detecting a voice signal when the power of the output signal of the musical sound signal suppressing means exceeds the threshold.

3. The voice input device according to claim 2, wherein the musical sound signal generating means comprises a storage unit that stores a plurality of pieces of musical sound data created in advance, and a selection unit that selects, and reads from the storage unit, the musical sound data whose frequency components are most similar to the voice data of the speaker.