JPH05265482A - Information processor

Info

Publication number: JPH05265482A
Application number: JP4059933A
Authority: JP
Inventors: Kazuo Fujimoto; 和生藤本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-03-17
Filing date: 1992-03-17
Publication date: 1993-10-15

Abstract

PURPOSE: To improve the speech recognition rate for input speech from unspecified speakers. CONSTITUTION: When a recognizing means 5 recognizes speech input from an input means 4, it uses both the standard speech information stored in an information storage means 6 and the information in a difference information storage means 10, which stores the difference between that standard speech information and the speech characteristics of each individual. Recognition thus makes use of the vocal features of each unspecified speaker, providing a speech recognition device that is effective for services requiring a high recognition rate.

Description

Detailed Description of the Invention

[0001]

【Field of Industrial Application】The present invention relates to an information processing apparatus that recognizes spoken input instructions from unspecified speakers and provides the corresponding service.

[0002]

【Prior Art】In recent years, speech recognition devices that accept voice input over telephone lines and the like have been introduced in the financial and distribution industries. These devices provide users with services such as bank deposit balance inquiries and order entry for various home-delivery services. More recently, speech recognition devices have been developed for home appliances and in-vehicle products and applied to car audio, video recording reservations, telephones, and the like.

[0003] An information processing apparatus with a conventional speech recognition function is described below. FIG. 4 shows the configuration of a conventional information processing apparatus. In FIG. 4, reference numeral 51 denotes the information processing apparatus described here. Among its components, 54 is an input means that accepts input including voice input; 55 is a recognition means that performs speech recognition; 56 is an information storage means that stores the standard speech information needed for speech recognition; and 58 is an output means that outputs messages prompting voice input, speech recognition results, and the like.

[0004] The interaction of these components is described below. First, the user performs voice input following an input prompt message (voice guidance or guidance shown on a display device) from the output means 58 of the information processing apparatus 51. The input means 54 accepts this voice input and performs the processing required for recognition by the subsequent recognition means 55.

[0005] First, the input analog signal is converted into a digital signal. A sampling frequency of 8 to 10 kHz is normally used, and each sample is quantized to a value of 8 to 16 bits. The recognition means 55 pattern-matches (hereinafter simply "matches") the input signal against the speech information in the information storage means 56 and selects the most similar entry whose similarity is at or above a certain threshold. It then returns the matched entry as the speech recognition result for the uttered word. From the speech recognition result, the next input candidates to request and the content to output from the output means 58 are determined.
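
The sampling and quantization step can be illustrated with a minimal Python sketch. The function names `quantize` and `sample_tone`, the sine tone used as a stand-in for speech, and the symmetric rounding scheme are assumptions made for illustration; the text specifies only the 8 to 10 kHz sampling frequency and the 8 to 16 bit word length.

```python
import math

def quantize(sample: float, bits: int) -> int:
    """Map an analog sample in [-1.0, 1.0] to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    clipped = max(-1.0, min(1.0, sample))   # clip out-of-range input
    return round(clipped * max_code)

def sample_tone(freq_hz: float, duration_s: float,
                rate_hz: int = 8000, bits: int = 8) -> list:
    """Sample a sine tone (a stand-in for speech) and quantize each sample."""
    n_samples = round(duration_s * rate_hz)
    return [quantize(math.sin(2 * math.pi * freq_hz * i / rate_hz), bits)
            for i in range(n_samples)]
```

At 8 kHz, 10 ms of input yields 80 samples, each stored as one signed 8-bit code in this sketch.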

[0006] When this information processing apparatus 51 is applied to a telephone, entering a telephone number including the area code requires uttering about 10 digits. The user utters digits, at least 0 (pronounced "zero") through 9 (pronounced "kyu"), one after another; the apparatus recognizes them, converts them into dial pulses or tone signals, and places the call. The user can check the recognition result on the display device or voice output device of the output means 58 attached to the telephone (see, for example, Japanese Patent Laid-Open No. 63-33796). The same applies to entering a personal identification number.

[0007] To shorten the time needed to enter a 10-digit number by voice as much as possible, the apparatus is configured so that, when one digit is uttered, it recognizes the digit and displays the result while already waiting for the next input. If recognition succeeds, the utterance is converted into the corresponding digit information and the apparatus waits for the next input. If recognition fails, for example because the utterance was too quiet, the apparatus requests that the digit be repeated and waits for the next input. Once users become accustomed to voice input, they can enter digits without looking at the display device. They can therefore enter a telephone number by uttering the digits in order without taking their eyes off the number in the telephone directory. The input result need only be checked after all digits have been uttered; there is no need to look at the display device after each digit.
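
The digit-by-digit entry loop described above might be sketched as follows. This is a hypothetical illustration: `enter_digits`, `toy_recognize`, the string inputs, and the pronunciations "ichi" and "ni" are assumptions, and an empty string stands in for an utterance too quiet to recognize.

```python
def enter_digits(utterances, recognize, n_digits=10):
    """Collect n_digits recognized digits, asking for a repeat on each failure.

    utterances: iterable of raw inputs (hypothetical; here plain strings)
    recognize:  callable returning a digit string, or None if recognition fails
    Returns (digits_so_far, number_of_repeat_requests).
    """
    digits = []
    repeats = 0
    for utterance in utterances:
        digit = recognize(utterance)
        if digit is None:
            repeats += 1          # request a repeat, then wait for the next input
            continue
        digits.append(digit)      # result is shown while waiting for the next input
        if len(digits) == n_digits:
            break
    return "".join(digits), repeats

# Toy recognizer: an empty string stands for an utterance that was too quiet.
TOY_DIGITS = {"zero": "0", "ichi": "1", "ni": "2", "kyu": "9"}

def toy_recognize(utterance):
    return TOY_DIGITS.get(utterance)
```

The caller only inspects the returned digit string once at the end, which mirrors the point that the user need not check the display after every digit.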

[0008] For speech recognition in equipment that anyone may use, such as a public telephone, several conditions must be met. One is that anyone's voice must be recognizable without requiring the user to register their voice. If the vocabulary differed from device to device, users would have to register their voice every time, which could seriously hinder adoption of such devices. Furthermore, even the same person's speech varies with their manner of speaking and with the environment in which the device is installed, and loudness and voice frequency differ greatly between individuals. Recognition of anyone's voice without identifying the user is therefore defined as unspecified speaker recognition.

[0009] In unspecified speaker recognition, to create the standard speech information used for recognition, a number of people (from several to several hundred) are asked to utter a predetermined vocabulary, and their speech is recorded and processed statistically. Information suitable for speech recognition is then extracted and stored in the information storage means 56 as the standard speech information. The uttered vocabulary is recognized by matching the voice input from the input means 54 against the standard speech information stored in the information storage means 56. To improve the recognition rate for unspecified speakers, utterances must be recorded from more speakers with more varied attributes (gender, age, place of residence) and then processed.
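
As a rough illustration of the statistical processing, the sketch below reduces many speakers' feature vectors for each word to a per-parameter mean and variance. The function name `build_standard`, the dictionary layout, and the choice of population variance are assumptions; the text does not specify the exact statistics or data format.

```python
from statistics import fmean, pvariance

def build_standard(recordings):
    """Reduce many speakers' feature vectors to per-word means and variances.

    recordings: {word: [feature_vector_from_speaker_1, feature_vector_from_speaker_2, ...]}
    Returns {word: (means, variances)}, one value per feature parameter.
    """
    standard = {}
    for word, vectors in recordings.items():
        columns = list(zip(*vectors))            # group values by feature parameter
        means = [fmean(column) for column in columns]
        variances = [pvariance(column) for column in columns]
        standard[word] = (means, variances)
    return standard
```

Only the summary statistics are kept, which is why a speaker far from the mean is poorly represented however many recordings are collected.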

[0010] The recognition rate is defined here as follows. Speech input within the range of thresholds for which recognition is possible is matched against the specified word group; the recognition rate is the probability that the recognition result indicates the same word as the one actually input. Inputs that cannot be matched, for example because the input means 54 is faulty or the input speech is too quiet for recognition, are therefore excluded. It is strictly the probability that no incorrect match was made.
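
The definition can be written out as a short calculation. In this sketch, which assumes a hypothetical list of `(spoken_word, result)` trials with `None` marking an input that could not be matched, unmatched inputs are dropped from the denominator before the correct matches are counted.

```python
def recognition_rate(trials):
    """Recognition rate in the narrow sense defined above.

    trials: list of (spoken_word, result); result is None when the input
            could not be matched (too quiet, faulty input means, and so on).
    Returns correct / matched, or None if nothing was matched at all.
    """
    matched = [(spoken, result) for spoken, result in trials if result is not None]
    if not matched:
        return None
    correct = sum(1 for spoken, result in matched if spoken == result)
    return correct / len(matched)
```

With two correct matches, one wrong match, and one unmatched input, the rate is 2/3 rather than 2/4.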

[0011] In unspecified speaker recognition, the recognition rate is about 95 percent, so not every person's utterances can be recognized. This is because speech information processing is statistical: the mean and variance of the recorded speech are computed, and values such as the standard deviation are used to select speech information within a certain range. In addition, the amount of information must be limited so that recognition can be computed quickly. It is therefore difficult to hold enough information to achieve a 100 percent recognition rate. At the research level, the computation time for matching matters little, but at the practical level a system that takes tens of seconds or more after the end of an utterance is unusable. Unless the recognition result is output within a few seconds at most after the utterance ends, users are likely to find the system inconvenient.
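
To see why keeping only a fixed range around the mean leaves some speakers out, consider a single feature parameter and assume, purely for illustration, that it is normally distributed across speakers. The sketch below computes the fraction of speakers inside a band of k standard deviations. It is not a claim about how the 95 percent figure above was obtained.

```python
from statistics import NormalDist

def coverage(k: float) -> float:
    """Fraction of a normal population lying within k standard deviations of the mean."""
    unit = NormalDist()  # mean 0, standard deviation 1
    return unit.cdf(k) - unit.cdf(-k)
```

Under this assumption a band of about 1.96 standard deviations covers roughly 95 percent of speakers; widening the band covers more speakers but makes templates less selective.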

[0012] Nevertheless, because unspecified speaker recognition works simply by having the user speak, its application to home appliances is being studied, and voice reservation remote controls for video equipment have been put to practical use. The main components for speech signal processing in such a remote control are an LSI for speech synthesis, an IC for A/D conversion, a DSP (digital signal processor) that performs high-speed computation for speech recognition, and a ROM storing the standard speech information. This device outputs the recognition result within one second after the end of the utterance.

[0013] For example, with a voice reservation remote control, the user specifies by voice the channel number, the day of the week, and the start and end times of the reservation, in response to guidance output as synthesized speech. The recognition result is output to an LCD display device serving as the output means 58; if it matches what was said, the user presses a confirmation button, and if it is wrong, the user repeats the utterance (until it is recognized correctly).

[0014] There is also a speech recognition method in which the speaker is limited to a specific individual: the speaker registers in the information storage means 56 the utterances to be used for operation, and recognition compares the registered content with the speaker's own utterance. This is defined as specific speaker recognition. Specific speaker recognition does not require collecting utterances from many people. However, the user must always begin by uttering each word several times to register their voice. Another problem is that no one other than the registered user can use the device. Specific speaker recognition is therefore being studied for fields where the users are limited in advance, such as car audio and car telephones.

[0015] In general, the recognition rate of specific speaker recognition for a given person and vocabulary is higher than that of unspecified speaker recognition for the same vocabulary. This is because specific speaker recognition can make better use of the characteristics of the individual's speech. Moreover, since only the parameters characteristic of the individual speaker need to be computed, the amount of computation is often smaller than for unspecified speakers.

[0016]

【Problems to be Solved by the Invention】However, with the conventional configuration, when speech recognition is applied to equipment whose users are not restricted (public telephones, ticket vending machines, rental equipment, etc.), that is, in unspecified speaker recognition, the recognition rate is limited, and the utterances of particular people whose speech differs considerably from the standard speech cannot be recognized (they are misrecognized).

[0017] On the other hand, specific speaker recognition requires voice registration, which makes it difficult to apply to such equipment whose users are not restricted.

[0018] In view of the above problems, an object of the present invention is to provide an information processing apparatus with an improved speech recognition rate for unspecified speakers.

[0019]

【Means for Solving the Problems】To achieve this object, the information processing apparatus of the present invention comprises an input means that accepts voice input, a recognition means that performs speech recognition, an information storage means that is referred to during speech recognition, a difference information storage means that stores difference information relative to the information stored in the information storage means, and a control means that performs speech recognition using the information stored in the information storage means and in the difference information storage means.

[0020]

【Operation】In the information processing apparatus of the present invention configured as above, the difference information storage means stores difference information specific to each individual on a storage medium. During speech recognition, the control means performs unspecified speaker recognition while referring to both the standard speech information in the information storage means and the difference information in the difference information storage means. This makes it possible to provide an apparatus that is effective for services requiring a high recognition rate.

[0021]

【Example】

(Embodiment 1) An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the configuration of the information processing apparatus according to the first embodiment of the present invention. As shown in FIG. 1, 1 is the information processing apparatus described here, and 2 is a speech recognition unit that performs speech information processing. 4 is an input means that accepts input including voice. 6 is an information storage means that stores speech recognition vocabulary information and the like. 5 is a recognition means that performs speech recognition on the speech signal input from the input means 4 by referring to the information in the information storage means 6. 7 is an information control means that controls information other than that stored in the information storage means 6. 10 is a difference information storage means that stores difference information relative to the information stored in the information storage means 6, and 3 is a storage medium serving as a storage device that contains the difference information storage means 10.

[0022] The interaction of the components of the information processing apparatus 1 configured as above is now described. As a premise, the example described here uses a storage medium 3 that is portable and detachable from the speech recognition unit 2. An embodiment with a non-detachable storage medium 3 is described in Embodiment 2.

[0023] First, the user inserts the storage medium 3 into the information processing apparatus 1. The information processing apparatus 1 detects that the storage medium 3 has been inserted and couples it to the speech recognition unit 2. Next, the information control means 7 determines whether the coupled storage medium 3 is a usable medium. If it is, the difference information in the storage medium 3 is acquired from the difference information storage means 10.

[0024] Next, an output means (not shown) outputs a message prompting the user to enter the information needed to start a service. For example, if the user wants a service such as cash withdrawal, a prompt such as "Please enter the name of the service" is output from the display device or the voice output device. Guided by the message, the user makes the input by voice.

[0025] Input speech such as a service name or personal identification number entered through the input means 4 is recognized by matching based on the standard speech information stored in the information storage means 6 and the difference information acquired by the information control means 7. After recognition, the speech signal is converted into the appropriate digit or character information, and the various kinds of information processing are carried out. The output means consists of a display device and a voice output device, alone or in combination, and is used not only to output input requests but also when providing the various services.

[0026] The speech information to be matched is stored as follows. The recognizable vocabulary is divided by type of use into groups. For example, the first group contains service names (deposit, withdrawal, balance, credit, etc.), the second group contains the digits ("zero" to "kyu") used to enter personal identification numbers and the like, and the third group contains vocabulary for finalizing a service (cancel, correct, finalize, confirm, etc.).

[0027] Because the vocabulary is grouped by purpose, the search for the entry closest to the uttered speech is carried out only within the specified group. Each entry is of course encoded in a form that makes matching fast. The recognition means 5 also contains a threshold judgment means: a threshold is set for matching, and unless the match score is at or above a certain value, the recognition result reports that no matching word exists.
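
Group-restricted matching with a threshold judgment might look like the following sketch. The two-dimensional templates, the similarity measure derived from Euclidean distance, the default threshold of 0.5, and all names (`VOCAB_GROUPS`, `similarity`, `recognize_in_group`) are assumptions for illustration; the actual encoding of the stored entries is not described.

```python
import math

# Hypothetical two-dimensional templates; real entries would be encoded feature data.
VOCAB_GROUPS = {
    1: {"deposit": [1.0, 0.0], "withdrawal": [0.0, 1.0], "balance": [1.0, 1.0]},
    2: {"zero": [0.0, 0.0], "kyu": [2.0, 2.0]},
    3: {"cancel": [3.0, 0.0], "confirm": [0.0, 3.0]},
}

def similarity(a, b):
    """Similarity in (0, 1]; equals 1.0 for identical vectors."""
    return 1.0 / (1.0 + math.dist(a, b))

def recognize_in_group(features, group_no, threshold=0.5):
    """Return the closest word in the given group, or None if below the threshold."""
    group = VOCAB_GROUPS[group_no]
    best = max(group, key=lambda word: similarity(features, group[word]))
    if similarity(features, group[best]) >= threshold:
        return best
    return None  # threshold judgment: no matching word
```

Restricting the search to one group keeps the number of comparisons small and prevents, say, a digit from being returned while a service name is expected.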

[0028] For the input speech, the start of an utterance carrying vocabulary information is detected when the input level exceeds a specific threshold, and the end of the utterance is recognized when the level falls below the threshold. The input speech in this utterance interval is sampled at several kHz (about 8 kHz to 10 kHz), and each sample is quantized to about 8 to 12 bits to digitize it.
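
A minimal sketch of this level-based detection of the utterance interval follows. The function name `find_utterance` and the representation of the input as a list of per-frame levels are assumptions; a practical detector would usually add smoothing or hold times, which the text does not describe.

```python
def find_utterance(levels, threshold):
    """Locate the utterance interval in a sequence of input levels.

    Returns (start, end) with start inclusive and end exclusive, or None
    if the level never exceeds the threshold.
    """
    start = None
    for i, level in enumerate(levels):
        if start is None and level > threshold:
            start = i                 # level rose above the threshold: utterance begins
        elif start is not None and level < threshold:
            return start, i           # level fell below the threshold: utterance ends
    if start is None:
        return None
    return start, len(levels)         # input ended while the utterance was still active
```

Only the samples inside the returned interval would then be passed on to feature extraction.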

[0029] Speech recognition proceeds, for example, by computing LPC cepstrum coefficients from these digital values and using them as feature parameters; they are compared with the corresponding parameters of the standard vocabulary to find the closest entry among the words of the given group number. Matching is carried out throughout the utterance interval, and the entry that is closest overall is recognized as the uttered word. Many methods other than LPC cepstrum coefficients exist, but the general approach is to apply signal processing to the sampled digital values to extract feature parameters and then perform matching.
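
For readers unfamiliar with LPC cepstrum coefficients, the sketch below shows one common way to compute them for a single frame: autocorrelation, the Levinson-Durbin recursion for the predictor coefficients, and the standard recursion from predictor coefficients to cepstrum. This is a textbook formulation offered as an assumption; the text does not state the analysis order, windowing, or exact algorithm used.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order], where s[n] is approximated by sum(a[k] * s[n-k])."""
    a = [0.0] * (order + 1)     # a[0] is unused
    err = r[0]                  # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_cepstrum(a):
    """Cepstrum c[1..p] of the all-pole model 1 / (1 - sum(a[k] * z**-k))."""
    c = []
    for n in range(1, len(a) + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

A frame's feature vector would be `lpc_cepstrum(levinson_durbin(autocorrelation(frame, p), p))` for some order `p`, and matching then compares such vectors frame by frame.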

[0030] The information storage means 6 stores data obtained by statistical processing of utterances from several to several hundred people: the feature parameters (LPC cepstrum coefficients), speech power values (for example, the speech level in specific frequency bands), utterance durations, matching search coefficients, and so on. However, as explained for the conventional example, constraints such as processing performance mean that values derived from the mean, variance, standard deviation, and the like are used, so it is difficult to accurately recognize idiosyncratic utterances (those far from the standard).

[0031] Therefore, difference information describing how the user's utterances differ from the average information is stored in the difference information storage means 10 in the storage medium 3, and recognition is performed after subtracting the difference information from the standard information for each feature parameter. The information specific to the individual is thereby reflected, and the recognition rate improves.
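
The effect of subtracting the difference information can be shown with a toy example. Everything here is hypothetical: the two-dimensional templates, the sign convention (difference = standard value minus the user's typical value), and the names `personalize` and `nearest`.

```python
import math

def personalize(standard, difference):
    """Subtract the stored difference from the standard value of each parameter."""
    return [s - d for s, d in zip(standard, difference)]

def nearest(features, templates):
    """Return the word whose template is closest to the input features."""
    return min(templates, key=lambda word: math.dist(features, templates[word]))

# Hypothetical standard templates and one user's differences (standard minus user).
STANDARD = {"zero": [0.0, 0.0], "kyu": [4.0, 0.0]}
USER_DIFF = {"zero": [-2.5, 0.0], "kyu": [-2.5, 0.0]}
PERSONAL = {word: personalize(STANDARD[word], USER_DIFF[word]) for word in STANDARD}
```

For this user, an utterance of "zero" measured as `[2.4, 0.1]` is closer to the standard template for "kyu" than to the one for "zero", so matching against `STANDARD` misrecognizes it, while matching against `PERSONAL` returns "zero".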

[0032] Because only the difference from the standard speech information needs to be held on the storage medium 3, the medium need not hold all the speech information required for recognition. Likewise, the information storage means 6 of the speech recognition unit 2 need only hold the standard speech information. Subtracting the correction given by the difference is enough to achieve a sufficient recognition rate. This also has the advantage that the standard speech information can be created from the speech of fewer people than before.

[0033] Typical units of recognition are phonemes, words, and sentences. Phoneme recognition performs recognition mainly in units of consonants and vowels. Word recognition performs recognition in units of a single word (a few seconds at most). Sentence recognition performs recognition in units of phrases or sentences made up of several words, taking grammar and the like into account.

[0034] For phoneme recognition, standard speech information is created for each phoneme, and difference information is created as well; the difference information in this case consists of values for each phoneme. For word recognition, standard speech information is created for each word, and the difference information likewise relates to the utterance of each word. Alternatively, word recognition can be performed using the phoneme information that makes up each word: the storage medium 3 holds difference information for phonemes, and the information storage means 6 holds phoneme-level standard speech information together with vocabulary information that combines phonemes. Sentence-level recognition is a combination of multiple word recognitions.
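
The phoneme-based variant of word recognition can be sketched as assembling each word template from phoneme-level standard values corrected by the user's phoneme-level differences. The phoneme inventory, the feature values, and the names `PHONEME_STANDARD`, `LEXICON`, and `word_template` are hypothetical, as is the convention of subtracting the difference.

```python
# Hypothetical phoneme-level standard values (one short feature vector per phoneme).
PHONEME_STANDARD = {"k": [1.0, 0.0], "y": [0.5, 0.5], "u": [0.0, 1.0]}
# Vocabulary information: each word is a sequence of phonemes.
LEXICON = {"kyu": ["k", "y", "u"]}

def word_template(word, phoneme_diff):
    """Build a word template from phoneme standards corrected by the user's differences."""
    frames = []
    for phoneme in LEXICON[word]:
        standard = PHONEME_STANDARD[phoneme]
        diff = phoneme_diff.get(phoneme, [0.0] * len(standard))  # no entry: no correction
        frames.append([s - d for s, d in zip(standard, diff)])
    return frames
```

With this arrangement the storage medium only needs one difference entry per phoneme, regardless of how many words the vocabulary contains.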

[0035] Difference information relative to the standard speech information is registered as follows. The user is asked to utter the applicable vocabulary, which covers the phoneme information, and a service provider, for example, records the utterances on a high-performance recording device such as a DAT (digital audio tape) recorder. The service provider processes the DAT recording to obtain the difference information and writes it to the difference information storage means 10 of the storage medium 3. A telephone or the like may also be used. The voice band available on present telephones is from a few tens of Hz to about 3.4 kHz, but even this band carries enough information for speech recognition. However, since it also contains many noise components, high-quality voice collection cannot be expected even with correction processing. If digital telephones using ISDN (Integrated Services Digital Network) become widespread, the amount of information that can be transmitted will increase dramatically and a wider band will be available, making it possible to collect high-quality voice from home telephones. Collecting voice with a home recorder is also effective.
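
The processing that turns a user's recordings into difference information is not detailed in the text. One simple possibility, shown here as an assumption, is to average the user's feature vectors for a word or phoneme and subtract that average from the standard value, parameter by parameter.

```python
from statistics import fmean

def compute_difference(standard, user_vectors):
    """Difference = standard value minus the user's mean value, per feature parameter.

    standard:     list of standard values, one per feature parameter
    user_vectors: feature vectors extracted from the user's recorded utterances
    """
    user_mean = [fmean(column) for column in zip(*user_vectors)]
    return [s - u for s, u in zip(standard, user_mean)]
```

The resulting list is what would be written to the storage medium; subtracting it from the standard value later reproduces the user's typical value.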

[0036] The detachable storage medium 3 may be any of the following: card-type media such as magnetic cards, optical cards, IC memory cards, and IC cards, or disc-type media such as CD-ROMs and magneto-optical discs. For a storage medium 3 that has only storage capability, such as a magnetic card, optical card, or CD-ROM, the contents of the difference information storage means 10 are read into a memory in the information control means 7 and referred to by the recognition means 5. Non-contact versions of these storage media 3 also exist; the information in the difference information storage means 10 stored on them is acquired by communication via electromagnetic induction, radio waves, light, or the like.

[0037] On the other hand, some storage media 3, such as IC memory cards and IC cards, provide, in addition to storage, memory that the speech recognition unit 2 can use as its own memory means. In that case, the information control means 7 has a function for communicating with the storage medium 3, and the apparatus can be configured to perform speech recognition using the memory means of the difference information storage means 10.

【００３８】差分情報の例としては以下の項目がある。
例えば、音声区間が短めであるという差分情報が格納さ
れている場合がある。このような例は、各サンプリング
時間での標準値との差情報、発声区間の長短情報、全体
及び部分的な発声パワー量の大小情報、語頭、語尾の発
声の強弱の特徴情報、声の高低情報、各サンプリング値
とのマッチングするときの時間設定の設定パス情報等が
ある。これらの情報を差分情報の中から獲得して、利用
者の特徴を知った上で音声認識を行うことによって、音
声認識率を上昇させるものである。Examples of the difference information include the following. For example, difference information indicating that the user's utterance sections tend to be short may be stored. Other examples include the difference from the standard value at each sampling time, the length of the utterance section, the magnitude of the overall and partial utterance power, the characteristic strength of utterance at the beginning and end of words, the pitch of the voice, and the path settings used for time alignment when matching against each sampled value. The voice recognition rate is increased by acquiring these items from the difference information and performing voice recognition with knowledge of the user's characteristics.

【００３９】差分情報を用いた計算方法は、以下の例が
ある。例えば標準値から離れている差分値分を、入力音
声から差し引いて、標準値とマッチングする方法、標準
値による演算の制限を差分値によって変更する方法（閾
値の変更）、標準値を差分値入力により予め補正してお
き、補正した標準値と入力音声をマッチングする方法、
標準音声情報でマッチングした後で、差分情報を用いて
さらにマッチングする方法等がある。Calculation methods that use the difference information include the following: subtracting the difference value (the speaker's deviation from the standard value) from the input voice and then matching against the standard value; using the difference value to change the limits that the standard value imposes on the calculation (that is, changing the threshold); correcting the standard value in advance with the difference value and then matching the input voice against the corrected standard value; and matching first with the standard voice information and then matching further using the difference information.
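The four calculation methods listed above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: it assumes that each utterance has already been reduced to a fixed-length feature vector and that the difference information is a per-element offset of the same length. The function names, the absolute-difference distance, and the `shortlist` parameter are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def distance(a: Vector, b: Vector) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize_subtracting_difference(inp: Vector, diff: Vector,
                                     templates: Dict[str, Vector]) -> str:
    """Method 1: subtract the speaker's offset from the input, then match the standard templates."""
    normalized = [x - d for x, d in zip(inp, diff)]
    return min(templates, key=lambda w: distance(normalized, templates[w]))


def accept(inp: Vector, template: Vector,
           base_threshold: float, threshold_delta: float) -> bool:
    """Method 2: the difference information shifts the acceptance threshold."""
    return distance(inp, template) <= base_threshold + threshold_delta


def recognize_with_corrected_templates(inp: Vector, diff: Vector,
                                       templates: Dict[str, Vector]) -> str:
    """Method 3: correct the standard templates in advance, then match the raw input."""
    corrected = {w: [t + d for t, d in zip(v, diff)] for w, v in templates.items()}
    return min(corrected, key=lambda w: distance(inp, corrected[w]))


def recognize_two_stage(inp: Vector, diff: Vector,
                        templates: Dict[str, Vector], shortlist: int = 2) -> str:
    """Method 4: shortlist with the standard templates, then re-match using the difference."""
    ranked = sorted(templates, key=lambda w: distance(inp, templates[w]))[:shortlist]
    normalized = [x - d for x, d in zip(inp, diff)]
    return min(ranked, key=lambda w: distance(normalized, templates[w]))
```

With a plain offset, methods 1 and 3 give the same distances; they differ in whether the correction is applied on every input or once, ahead of time, to the stored templates.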

【００４０】また認識率とは、従来例の中の定義通り、
閾値内の入力に対して音声認識処理を行った中で、正し
い語彙を選択する確率である。発声が小さすぎる等で、
閾値以外の入力に対しては、認識率の対象にはなってい
ない。しかし利用者から見れば、入力が小さすぎる等で
認識できなかった時も、認識誤りが発生したと考えられ
やすい。従って発声したうちで（声が小さくても）、正
しい認識結果が得られる確率という広義な意味の認識率
を上昇させる必要もある。As defined in the description of the conventional example, the recognition rate is the probability that the correct vocabulary item is selected among the inputs within the threshold on which voice recognition processing was performed. Inputs outside the threshold, for example utterances that are too quiet, are not counted toward the recognition rate. From the user's point of view, however, a failure to recognize an input because it was too quiet is also likely to be perceived as a recognition error. It is therefore also necessary to raise the recognition rate in the broad sense, that is, the probability that a correct recognition result is obtained for any utterance, even a quiet one.
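The distinction between the two recognition rates can be made concrete with a small sketch. The `Attempt` record and both function names are hypothetical and are introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    within_threshold: bool  # input level was acceptable, so matching was attempted
    correct: bool           # the recognizer returned the word that was actually spoken


def narrow_rate(attempts: List[Attempt]) -> float:
    """Correct results divided by the attempts whose input was within the threshold."""
    scored = [a for a in attempts if a.within_threshold]
    return sum(a.correct for a in scored) / len(scored) if scored else 0.0


def broad_rate(attempts: List[Attempt]) -> float:
    """Correct results divided by all utterances, including rejected ones."""
    return sum(a.correct for a in attempts) / len(attempts) if attempts else 0.0
```

For instance, if 10 of 20 utterances are rejected as too quiet and 8 of the remaining 10 are recognized correctly, the narrow rate is 0.8 while the broad rate, which is closer to what the user experiences, is 0.4.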

【００４１】記憶媒体３の使用を開始するに当たって
は、情報制御手段７の記憶媒体３の正当性を確認する必
要がある。磁気カード等の記憶媒体３では、その記憶媒
体３に格納されている差分情報以外に誤り検出、誤り訂
正等の検査情報と登録番号情報を格納し、その各情報を
情報制御手段７が読みだして、差分情報と検査情報等の
各情報の関連性を確認し、正当性を確認する。必要に応
じて各情報を暗号化する手段を用いて、各情報を格納す
る構成としても良い。Before use of the storage medium 3 begins, the information control means 7 must confirm the validity of the storage medium 3. A storage medium 3 such as a magnetic card stores, in addition to the difference information, check information for error detection and error correction and registration number information. The information control means 7 reads out each item and confirms validity by checking that the difference information, the check information, and the other items are consistent with one another. If necessary, each item may be encrypted before it is stored.
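A minimal sketch of such a consistency check is shown below. The record layout (one length byte, the registration number, the difference data, and a trailing CRC-32) is an assumption made for illustration; the patent does not specify a format. CRC-32 only detects accidental corruption and is not a security measure on its own.

```python
import zlib


def build_record(registration_number: str, difference_info: bytes) -> bytes:
    """Pack the registration number, the difference data, and a CRC-32 over both."""
    header = registration_number.encode("ascii")
    payload = len(header).to_bytes(1, "big") + header + difference_info
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def read_record(record: bytes, allowed_numbers: set) -> bytes:
    """Return the difference data if the record is intact and registered, else raise ValueError."""
    if len(record) < 5:
        raise ValueError("record too short")
    payload, stored_crc = record[:-4], int.from_bytes(record[-4:], "big")
    if zlib.crc32(payload) != stored_crc:
        raise ValueError("check information does not match the stored data")
    n = payload[0]
    number = payload[1:1 + n].decode("ascii")
    if number not in allowed_numbers:
        raise ValueError("unknown registration number")
    return payload[1 + n:]
```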

【００４２】一方記憶媒体３をＩＣカードで構成した場
合には、ＩＣカードの機能（暗証番号等の該当パスワー
ドが照合されない限り内部は参照できない）を利用する
正当性確認手段とする。ＩＣカードの場合は、利用者の
正当性を暗証番号等の入力により行うことが可能である
が、同様にして音声認識部２の正当性を確認することも
可能である。ＩＣカード内に利用可能な音声認識部２及
び情報処理装置１の型番情報を保有し、許された型番情
報を得た時のみ利用可能とすることもできる。On the other hand, when the storage medium 3 is an IC card, the validity check uses the IC card's own function, namely that its contents cannot be read unless the corresponding password, such as a personal identification number, has been verified. With an IC card, the validity of the user can be confirmed by entry of a personal identification number or the like, and the validity of the voice recognition unit 2 can be confirmed in the same way. The IC card may also hold the model numbers of the voice recognition units 2 and information processing apparatuses 1 with which it may be used, and permit use only when it receives a permitted model number.

【００４３】さらにＩＣカード側の正当性を確認するた
めに、ＩＣカードに必要情報を与え、その情報によって
演算を行い、その結果を情報制御手段７に返す構成とす
る。一方情報制御手段７側でも演算を行って、ＩＣカー
ド側から得られた情報と比較を行うことにより、ＩＣカ
ードの正当性を確認することができる。このＩＣカード
に与える情報、演算手段はその都度変更する構成をとれ
ば、不正使用が行いにくい構成となり、高セキュリティ
が要求される用途においても、有効な手段となりうる。Further, to confirm the validity of the IC card itself, the necessary information is given to the IC card, the card performs a calculation on that information, and the result is returned to the information control means 7. The information control means 7 performs the same calculation and compares its result with the one returned by the IC card, thereby confirming the validity of the IC card. If the information given to the IC card and the calculation are changed each time, unauthorized use becomes difficult, so this can be an effective means even in applications that require high security.
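The exchange described above resembles what is now usually called challenge-response authentication. The sketch below uses an HMAC over a random challenge as a modern stand-in; the patent does not name a specific calculation, and the `Card` class, the shared key, and the function name are hypothetical.

```python
import hashlib
import hmac
import secrets


class Card:
    """Stands in for the IC card; the key is held inside the card."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def respond(self, challenge: bytes) -> bytes:
        # The card computes its result from the information it was given.
        return hmac.new(self._key, challenge, hashlib.sha256).digest()


def card_is_genuine(card: Card, key: bytes) -> bool:
    """Device side: send a fresh challenge, compute the same result, and compare."""
    challenge = secrets.token_bytes(16)  # different information on every attempt
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(card.respond(challenge), expected)
```

Because the challenge changes on every attempt, a recorded response from an earlier session cannot simply be replayed.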

【００４４】利用内容にもよるが、特にセキュリティの
必要性を伴わない内容に関しては、この差分情報の利用
にあたって、格納されている情報が内容を他人に複写さ
れても、利用者個人の声の情報であるから、他人にはあ
まり価値がないものである。一般的に使用するものにお
いては、この差分情報はあくまで利用者の便宜をはかる
目的を達成するものである。また差分情報は、その利用
者のみに有効なものであるため、その記憶媒体３が正し
いものかだけを判定する機能を持てば良い。しかし高セ
キュリティを要するサービスに関するものはこの限りで
はない。Although it depends on the application, for uses that do not particularly require security, the stored difference information is of little value to others even if it is copied, because it describes only the individual user's voice. In general use, the difference information serves solely the convenience of the user. Moreover, because the difference information is effective only for that user, a function that merely determines whether the storage medium 3 is the correct one is sufficient. This does not apply, however, to services that require high security.

【００４５】周囲雑音等の情報が混在して発声の区間を
誤った場合は、正しく音声認識が行われない可能性があ
る。そのために入力手段４は、周囲雑音レベルを定期的
に獲得し、入力レベルの補正を行う機能を有し、また利
用者に対して、少し大きめの声で発声を希望する等の旨
を表示手段や拡声手段等による出力手段（図示せず）に
より出力する。逆に前回発声が大きすぎて、適した語彙
とマッチングできず該当語彙が選択できなかった場合
は、入力レベルの利得を調節し、少し小さな声での発声
を希望する旨の出力を行う。連続して複数の語彙の発声
を行うような場合では、上記のような利得の調整は有効
である。If ambient noise or the like is mixed into the input and the utterance section is detected incorrectly, voice recognition may not be performed correctly. The input means 4 therefore has a function of periodically acquiring the ambient noise level and correcting the input level, and it asks the user, through an output means (not shown) such as a display or a loudspeaker, to speak a little louder, for example. Conversely, if the previous utterance was too loud to be matched with a suitable vocabulary item, so that no item could be selected, the input gain is adjusted and a request to speak a little more quietly is output. This kind of gain adjustment is effective when several vocabulary items are uttered in succession.
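The decision described above can be sketched as a small function that looks at the measured noise level and the level of the previous utterance. The RMS measure, the thresholds `min_ratio` and `clip_level`, the halving of the gain, and the message wording are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5 if samples else 0.0


def plan_next_utterance(noise_level: float, speech_level: float, gain: float,
                        min_ratio: float = 2.0,
                        clip_level: float = 0.9) -> Tuple[float, Optional[str]]:
    """Return the gain to use next and an optional request to show to the user."""
    if speech_level >= clip_level:
        # Previous utterance was too loud: lower the gain and ask for a quieter voice.
        return gain * 0.5, "Please speak a little more quietly."
    if speech_level < min_ratio * noise_level:
        # Utterance is hard to separate from the ambient noise: ask for a louder voice.
        return gain, "Please speak a little louder."
    return gain, None
```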

【００４６】適用サービスの内容によって、発声語彙は
メニューの中から選択させる形をとるか、入力促進メッ
セージだけを示して選択語彙を表示しない形をとるかが
決められる。また認識語彙は情報格納手段６という形
で、認識手段５とは分離した構成をとっているので、情
報格納手段６を各種メモリで構成することができる。い
ろいろな語彙を納めたメモリを構成することにより、用
途に応じた語彙情報を選択して用いることによって、多
種多様な情報処理装置１を提供することができる。Depending on the service, the vocabulary to be uttered is either offered as a menu to choose from, or only an input prompt is shown and the selectable vocabulary is not displayed. Because the recognition vocabulary is held in the information storage means 6, separately from the recognition means 5, the information storage means 6 can be implemented with various kinds of memory. By preparing memories that hold different vocabularies and selecting the vocabulary information suited to the application, a wide variety of information processing apparatuses 1 can be provided.

【００４７】さらに使用形態によっては、少人数（家族
等）の音声の差分情報をとり、家庭内で用いる情報処理
装置の音声認識に適用することもできる。この記憶媒体
３に格納する音声情報は、個人とは限らず、ある特定の
グループであっても良い。このような場合、複数の人の
差分情報格納手段１０を、人数分だけ用意する構成をと
る場合、複数の人を平均し、総合した差分情報格納手段
１０とする構成がある。後者は、前者に比べて認識率が
低くなる傾向が予想されるが、家族等の場合には、互い
に発声の仕方、抑揚が似ている点があるため、この特徴
を差分情報に取り入れた音声認識を行う情報処理装置１
を構成することができる。Depending on the form of use, difference information can also be collected for the voices of a small number of people, such as a family, and applied to voice recognition in an information processing apparatus used at home. The voice information stored in the storage medium 3 need not belong to one individual; it may belong to a specific group. In that case, either a separate difference information storage means 10 is prepared for each person, or the people are averaged into a single combined difference information storage means 10. The latter is expected to give a lower recognition rate than the former, but family members tend to resemble one another in manner of speaking and intonation, so an information processing apparatus 1 can be configured that performs voice recognition with this shared characteristic reflected in the difference information.
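The second option, a single combined set of difference information for a group, can be sketched as an element-wise average. The vector representation and the function name are assumptions for illustration.

```python
from typing import Dict, List


def average_difference(per_person: Dict[str, List[float]]) -> List[float]:
    """Combine per-person difference vectors into one group vector by averaging."""
    vectors = list(per_person.values())
    if not vectors:
        raise ValueError("no difference information to average")
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Keeping the per-person dictionary corresponds to the first option (one store per person); calling `average_difference` on it yields the second.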

【００４８】本発明は、音声入力を用い、音声の中に含
まれる個人の特徴を取り入れた不特定話者音声認識を行
うことで、利用者を限定しない機器でも高い認識率を得
られるため、各種情報機器、家電製品など幅広い機器の
音声認識に有効な装置を提供することができる。By using voice input and performing unspecified-speaker voice recognition that takes into account the individual characteristics contained in the voice, the present invention achieves a high recognition rate even in equipment whose users are not restricted, and can therefore provide an apparatus effective for voice recognition in a wide range of equipment, including various information devices and home appliances.

【００４９】（実施例２）以下本発明の第２の実施例に
ついて説明する。構成は第１の実施例と同一であるが、
記憶媒体３が着脱可能でなく、固定である例について説
明する。固定の手段には情報処理装置１にはじめから固
定してある場合と、利用者が目的に応じて装着した後
は、解体修理でもしない限り、取り外しが行われないよ
うな固定の仕方をする場合の２例について説明する。(Embodiment 2) A second embodiment of the present invention is described below. The configuration is the same as that of the first embodiment, but this embodiment covers the case in which the storage medium 3 is fixed rather than removable. Two cases are described: one in which the medium is fixed to the information processing apparatus 1 from the start, and one in which the user installs it for a particular purpose, after which it is not removed short of disassembly for repair.

【００５０】まず第１の最初から固定されている場合に
ついて説明する。情報格納手段６をＩＣメモリカード等
の記憶手段で実現する。次にサービス提供者が利用者の
音声情報を収録し、差分情報を記憶媒体３に格納する。
この両方の情報を納めた媒体をサービス提供者が、利用
者に送り、利用者がその媒体をもっている人に限り情報
処理装置１の利用が行えるような構成とすることができ
る。First, the case in which the medium is fixed from the start is described. The information storage means 6 is implemented with a storage means such as an IC memory card. The service provider then records the user's voice and stores the difference information in the storage medium 3. The service provider sends the medium containing both kinds of information to the user, and the apparatus can be configured so that only a person who holds that medium can use the information processing apparatus 1.

【００５１】この時の音声情報は、単語情報に限らず、
子音や母音の音素の差分情報を格納すれば、標準音声情
報にどのような単語情報があろうと、それらの単語に含
まれる子音、母音の音素情報に照らし合わせることによ
って有効な音声認識を行うことができる。もちろん標準
音声情報に音素情報を用い、各音素情報との直接の差分
情報を持つように構成しても良い。The voice information here is not limited to word information. If difference information is stored for consonant and vowel phonemes, then whatever words the standard voice information contains, effective voice recognition can be performed by checking the input against the phoneme information of the consonants and vowels in those words. Of course, phoneme information may also be used as the standard voice information, with the difference information held directly relative to each phoneme.

【００５２】標準用の音声情報も、サービスの種別がか
われば、利用発声語彙も異なるので、異なった標準音声
情報を用意する必要がある。しかしながら差分情報は、
サービスの種類が異なっても変更しなくてもよいものを
構成できる。ここでの例は音声情報を含む媒体とその他
に部分が分離されている場合について説明した。しかし
これは製造、販売会社内での形態で、実際利用者にわた
る場合では、情報処理装置１にはじめから記憶媒体３の
固定する構成とする。利用者の手に渡る段階では、音声
を登録せずとも、個人の特徴を反映した音声認識を行う
情報処理装置１を構成することができる。When the type of service changes, the vocabulary to be uttered also changes, so different standard voice information must be prepared. The difference information, however, can be constructed so that it need not be changed even when the type of service differs. The example here has described the medium containing the voice information as separate from the other parts. This, however, is the form within the manufacturer or sales company; by the time the apparatus reaches the user, the storage medium 3 is already fixed to the information processing apparatus 1. Thus, when the apparatus reaches the user, it can perform voice recognition that reflects the individual's characteristics without the user having to register his or her voice.

【００５３】次に第２の利用者が記憶媒体３に固定する
例について説明する。従来例の中で説明した家電製品等
で不特定話者認識を行うような機器を想定する。通常の
場合は、はじめから準備されている標準用の音声情報で
認識可能である。しかし声帯に損傷がある人や、老人で
歯がしっかりしていないため、音声認識が行いにくい特
定の人に対しても、障害がない人と同様なサービスを提
供できれば好ましい。Next, the second case, in which the user fixes the storage medium 3 in place, is described. Assume equipment that performs unspecified-speaker recognition, such as the home appliances mentioned in the description of the conventional example. Normally, recognition is possible with the standard voice information provided from the start. However, it is desirable to offer the same service to particular people whose speech is hard to recognize, such as people with damaged vocal cords or elderly people with unsound teeth, as to people without such impairments.

【００５４】このような場合、本体機器のオプションと
して、その人の差分情報を格納した記憶媒体３を取り付
けることにより、平均的な発声をする人と同様のサービ
スの提供できる情報処理装置を構成することができる。In such a case, attaching a storage medium 3 that holds the person's difference information, as an option for the main unit, yields an information processing apparatus that can provide that person with the same service as a person with average speech.

【００５５】サービスが異なり発声語彙が異なるような
情報処理装置にも、音声認識装置の記憶媒体３を複数個
用意し、基本情報となる差分情報格納手段１０の内容を
複写して各装置に取り付ける。取り付けたどの装置でも
音声認識が利用できるため、各装置毎に音声を登録する
特定話者音声認識装置に比べて、使い勝手はかなり良
い。Even for information processing apparatuses that offer different services with different vocabularies, several storage media 3 for the voice recognition device can be prepared, the contents of the difference information storage means 10, which serve as the basic information, copied onto each, and one medium attached to each apparatus. Voice recognition is then available on every apparatus so equipped, which is considerably more convenient than a specific-speaker voice recognition device that requires voice registration on each apparatus.

【００５６】特定話者認識を用いる場合は、登録作業を
伴うという不都合な点もあるが、家族等の複数の人数の
音声認識には向かないので、あくまで利用者一人に対す
るサービスとなり、誰もが使用できる機器に採用しよう
とすると、割高になる。従って本発明の構成をとれば、
複数の人の音声情報に答えられるため、家族単位で購入
するような情報処理装置として使用することができる。Specific-speaker recognition has the drawback of requiring registration work; moreover, it is not suited to recognizing the voices of several people such as a family, so it serves only a single user and becomes relatively expensive if adopted in equipment that anyone may use. The configuration of the present invention, by contrast, can respond to the voices of several people and can therefore be used in information processing apparatuses of the kind purchased per household.

【００５７】以上のように本実施例によれば、差分情報
を含む記憶媒体３を固定した構成としても、不特定話者
音声認識の特徴を損なうことなく、利用者に便利でかつ
認識率が高い情報処理装置を提供することができる。As described above, according to this embodiment, even when the storage medium 3 containing the difference information is fixed in place, an information processing apparatus that is convenient for the user and has a high recognition rate can be provided without losing the advantages of unspecified-speaker voice recognition.

【００５８】（実施例３）以下本発明の第３の実施例に
ついて、図面を参照しながら説明する。図２は、第３の
実施例における情報処理装置の構成を示すものである。
図２に示すように構成要素として、音声認識部２、記憶
媒体３、入力手段４、認識手段５、情報格納手段６、情
報制御手段７、差分情報格納手段１０は第１の実施例と
同一である。８は出力手段である。(Embodiment 3) A third embodiment of the present invention is described below with reference to the drawings. FIG. 2 shows the configuration of the information processing apparatus in the third embodiment. As shown in FIG. 2, the voice recognition unit 2, storage medium 3, input means 4, recognition means 5, information storage means 6, information control means 7, and difference information storage means 10 are the same as in the first embodiment. Reference numeral 8 denotes an output means.

【００５９】以上のように構成された情報処理装置１に
ついて、その構成要素のお互いの関連動作を説明する。
まず利用者が記憶媒体３を情報処理装置１に挿入する。
情報処理装置１は記憶媒体３が挿入されたことを検知
し、情報制御手段７が、結合された記憶媒体３が利用可
能な媒体か否かを判定する。利用可能と判定すれば、記
憶媒体３内の差分情報を差分情報格納手段１０から獲得
する。The operation of the information processing apparatus 1 configured as above is described in terms of how its components interact. First, the user inserts the storage medium 3 into the information processing apparatus 1. The information processing apparatus 1 detects that the storage medium 3 has been inserted, and the information control means 7 determines whether the connected storage medium 3 is a usable medium. If it is, the difference information in the storage medium 3 is acquired from the difference information storage means 10.

【００６０】次に出力手段８から、利用者に対しサービ
スを開始するために必要な情報の入力を促進するメッセ
ージ等を出力する。利用者は、メッセージ出力内容を頼
りにして、音声を用いて入力を行う。入力手段４から入
力された音声は、情報格納手段６内に格納されている音
声情報と、情報制御手段７に獲得された差分情報を元に
マッチングされて、音声認識が行われる。音声認識を行
い、該当情報を音声信号から適当な数字や文字情報等の
情報に変換して、各種の情報処理を行う。出力手段８
は、表示装置や音声出力装置を単独または組み合わせて
構成し、入力依頼を出力するだけでなく、各種のサービ
スを適用するときにも用いられる。Next, the output means 8 outputs a message or the like prompting the user to input the information needed to start the service. Guided by the message, the user makes the input by voice. The voice entered through the input means 4 is matched on the basis of the voice information stored in the information storage means 6 and the difference information acquired by the information control means 7, and voice recognition is performed. Through voice recognition, the voice signal is converted into the corresponding information, such as digits or characters, and the various kinds of information processing are carried out. The output means 8 consists of a display device, a voice output device, or a combination of the two, and is used not only to output input requests but also when the various services are provided.

【００６１】この情報制御手段７は、差分情報を獲得し
たときに、その獲得した情報に従って、利用者に対して
メッセージを出力する。例えば、音声区間が短めである
という差分情報が格納されている場合、「ゆっくりと発
声してください」等のメッセージを出力する。このよう
な例は、発声区間の長短、発声パワー量の大小、語頭、
語尾の発声の強弱の特徴、声の高低等がある。When the information control means 7 acquires the difference information, it outputs a message to the user according to that information. For example, if difference information indicating that the user's utterance sections tend to be short is stored, it outputs a message such as "Please speak slowly." Characteristics that can be handled this way include the length of the utterance section, the magnitude of the utterance power, the strength of utterance at the beginning and end of words, and the pitch of the voice.

【００６２】これらの情報を差分情報の中から獲得し、
利用者へのコメントメッセージとして出力手段８から出
力することによって、利用者が意識して発声すれば、標
準音声との差を少しでも少なくすることができ、音声認
識率の向上をはかることができる。差分情報は、標準値
との差を示すものであるが、差分値で補正した値よりも
少しでも標準値に近い発声を行う付加情報として用いる
ことで、誤る確率を少しでも低下させる効果をもつ。These items are acquired from the difference information and output from the output means 8 as advisory messages to the user. If the user then speaks with them in mind, the difference from the standard voice can be reduced, even if only slightly, and the voice recognition rate improved. The difference information represents the deviation from the standard value, but using it additionally as guidance that brings the user's utterance closer to the standard value than correction by the difference value alone would reduces the probability of error further.
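The mapping from stored difference information to advisory messages can be sketched as a small rule table. The keys `duration` and `power`, the sign convention (negative means shorter or quieter than the standard), the tolerance, and the message texts other than the slow-speech prompt quoted in this embodiment are assumptions.

```python
from typing import Dict, List


def advice_messages(diff: Dict[str, float], tolerance: float = 0.2) -> List[str]:
    """Pick the messages whose rule fires for the stored difference information."""
    rules = [
        ("duration", -1, "Please speak slowly."),
        ("duration", +1, "Please speak a little faster."),
        ("power", -1, "Please speak a little louder."),
        ("power", +1, "Please speak a little more quietly."),
    ]
    # A rule fires when the difference exceeds the tolerance in the rule's direction.
    return [msg for key, sign, msg in rules if sign * diff.get(key, 0.0) > tolerance]
```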

【００６３】（実施例４）以下本発明の第４の実施例に
ついて、図面を参照しながら説明する。図２は、第４の
実施例における情報処理装置の構成を示すものである。
図２に示すように構成要素として、音声認識部２、記憶
媒体３、認識手段５、情報格納手段６、情報制御手段
７、差分情報格納手段１０は第１の実施例と同一であ
る。８は出力手段である。また４は入力手段であるが、
入力手段４内に、入力利得変更手段（図示せず）を有す
る。(Embodiment 4) A fourth embodiment of the present invention is described below with reference to the drawings. FIG. 2 shows the configuration of the information processing apparatus in the fourth embodiment. As shown in FIG. 2, the voice recognition unit 2, storage medium 3, recognition means 5, information storage means 6, information control means 7, and difference information storage means 10 are the same as in the first embodiment. Reference numeral 8 denotes an output means and 4 an input means; in this embodiment the input means 4 contains an input gain changing means (not shown).

【００６４】以上のように構成された情報処理装置１に
ついて、その構成要素のお互いの関連動作を説明する。
情報制御手段７が、記憶媒体３が利用可能な媒体か否か
を判定し、利用可能と判定すれば、記憶媒体３内の差分
情報を差分情報格納手段１０から獲得する。The operation of the information processing apparatus 1 configured as above is described in terms of how its components interact. The information control means 7 determines whether the storage medium 3 is a usable medium and, if so, acquires the difference information in the storage medium 3 from the difference information storage means 10.

【００６５】次に出力手段８から、利用者に対しサービ
スを開始するために必要な情報の入力を促進するメッセ
ージ等を出力する。入力手段４から入力されたサービス
名や、暗証番号等の入力音声は、情報格納手段６内に格
納されている音声情報と情報制御手段７に獲得された差
分情報を元にマッチングされて、音声認識が行われる。
音声認識を行い、該当情報を音声信号から適当な数字や
文字情報等の情報に変換して、各種の情報処理を行う。Next, the output means 8 outputs a message or the like prompting the user to input the information needed to start the service. Voice entered through the input means 4, such as a service name or a personal identification number, is matched on the basis of the voice information stored in the information storage means 6 and the difference information acquired by the information control means 7, and voice recognition is performed. Through voice recognition, the voice signal is converted into the corresponding information, such as digits or characters, and the various kinds of information processing are carried out.

【００６６】この情報制御手段７は、差分情報を獲得し
たときに、その獲得した情報に従って、入力利得を変更
する要求を入力利得変更手段に発する。例えば、音声パ
ワー量が少ないという差分情報が格納されている場合、
音声入力利得を上げて、入力感度を良くする。このよう
な例は、発声区間の長短、発声パワー量の大小、語頭、
語尾の発声の強弱等がある。これらの情報を差分情報の
中から獲得し、利用者の音声入力に適した入力手段４と
するために、音声入力利得の調節を行うことによって、
標準音声との差を少しでも少なくすることができ、音声
認識率の向上をはかることができる。差分情報は、標準
値との差を示すものであるが、差分値で補正した値より
も少しでも標準値に近い発声を行う付加情報として用い
ることで、第３の実施例と同様に誤る確率を少しでも低
下させる効果をもつ。When the information control means 7 acquires the difference information, it issues a request to the input gain changing means to change the input gain according to that information. For example, if difference information indicating that the user's voice power is low is stored, the voice input gain is raised to improve input sensitivity. Characteristics that can be handled this way include the length of the utterance section, the magnitude of the utterance power, and the strength of utterance at the beginning and end of words. By acquiring these items from the difference information and adjusting the voice input gain so that the input means 4 suits the user's voice, the difference from the standard voice can be reduced, even if only slightly, and the voice recognition rate improved. The difference information represents the deviation from the standard value, but using it additionally to bring the input closer to the standard value than correction by the difference value alone would reduces the probability of error further, as in the third embodiment.
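One way to turn a stored power difference into an input gain is sketched below. Expressing the difference in decibels, the sign convention (negative means quieter than the standard), and the 12 dB limit are assumptions made for this sketch.

```python
def gain_from_power_difference(power_diff_db: float, max_boost_db: float = 12.0) -> float:
    """Return a linear amplitude gain that offsets the speaker's usual level difference."""
    # Offset the difference, but keep the correction within +/- max_boost_db.
    boost_db = max(-max_boost_db, min(max_boost_db, -power_diff_db))
    return 10 ** (boost_db / 20)
```

A speaker whose stored difference is -6 dB would get a gain of roughly 2, while a speaker at the standard level is left unchanged.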

【００６７】もちろん第３の実施例で説明したように、
出力手段８から、差分情報を用いた情報を出力する手段
を有し、出力手段８と、入力利得変更手段の両方を有す
れば、認識率はさらに上昇する。Of course, if the apparatus also has the means described in the third embodiment for outputting information based on the difference information from the output means 8, so that both the output means 8 and the input gain changing means are used, the recognition rate rises further.

【００６８】（実施例５）以下本発明の第５の実施例に
ついて、図面を参照しながら説明する。図３は、第５の
実施例における情報処理装置の構成を示すものである。
図３に示すように構成要素として、音声認識部２、記憶
媒体３、入力手段４、認識手段５、情報格納手段６、情
報制御手段７、差分情報格納手段１０は第１の実施例と
同一である。８は出力手段で、９は差分情報更新手段で
ある。(Embodiment 5) A fifth embodiment of the present invention is described below with reference to the drawings. FIG. 3 shows the configuration of the information processing apparatus in the fifth embodiment. As shown in FIG. 3, the voice recognition unit 2, storage medium 3, input means 4, recognition means 5, information storage means 6, information control means 7, and difference information storage means 10 are the same as in the first embodiment. Reference numeral 8 denotes an output means, and 9 a difference information updating means.

【００６９】以上のように構成された情報処理装置１に
ついて、その構成要素のお互いの関連動作を説明する。
情報制御手段７が、記憶媒体３が利用可能な媒体か否か
を判定し、利用可能と判定すれば、記憶媒体３内の差分
情報を差分情報格納手段１０から獲得する。The operation of the information processing apparatus 1 configured as above is described in terms of how its components interact. The information control means 7 determines whether the storage medium 3 is a usable medium and, if so, acquires the difference information in the storage medium 3 from the difference information storage means 10.

【００７０】次に出力手段８から、利用者に対しサービ
スを開始するために必要な情報の入力を促進するメッセ
ージ等を出力する。入力手段４から入力されたサービス
名や、暗証番号等の入力音声は、情報格納手段６内に格
納されている音声情報と情報制御手段７に獲得された差
分情報を元にマッチングされて、音声認識が行われる。
音声認識を行い、該当情報を音声信号から適当な数字や
文字情報等の情報に変換して、各種の情報処理を行う。Next, the output means 8 outputs a message or the like prompting the user to input the information needed to start the service. Voice entered through the input means 4, such as a service name or a personal identification number, is matched on the basis of the voice information stored in the information storage means 6 and the difference information acquired by the information control means 7, and voice recognition is performed. Through voice recognition, the voice signal is converted into the corresponding information, such as digits or characters, and the various kinds of information processing are carried out.

【００７１】この情報制御手段７は、差分情報を獲得し
たときに、その獲得した情報に従って、第３の実施例と
同様に出力手段８に該当情報を出力する。さらにこの出
力処理を行い、音声認識を行った結果、標準音声情報と
の差分情報を差分情報更新手段９が、差分情報格納手段
１０に書き込む。When the information control means 7 acquires the difference information, it outputs the corresponding information to the output means 8 according to that information, as in the third embodiment. After this output processing and the subsequent voice recognition, the difference information updating means 9 writes the resulting difference from the standard voice information into the difference information storage means 10.

【００７２】このように構成すると下記のような利点が
ある。記憶媒体３を新規に発行した場合、用途によって
は差分情報格納手段１０に利用者個人の差分情報が格納
されていない場合がある。また新しい語彙に対応した差
分情報が新たに必要な場合もある。このような場合、最
初は情報格納手段６内にある標準音声情報を用いて音声
認識を行う。そしてその時の標準値との差分情報を差分
情報更新手段９が、差分情報格納手段１０に書き込む。
次からは、この書き込まれた差分情報を元に、情報格納
手段６と差分情報格納手段１０の両方の情報を用いて、
音声認識を行うことができる。従って、最初１回目に個
人の特徴をつかみきれずに音声認識を誤ったとしても、
次からは個人の特徴を取り入れた音声認識を行うことが
できる。This configuration has the following advantages. When a storage medium 3 is newly issued, the difference information storage means 10 may, depending on the application, not yet contain the user's individual difference information. Difference information for new vocabulary may also become necessary. In such cases, voice recognition is first performed using the standard voice information in the information storage means 6, and the difference information updating means 9 writes the difference from the standard value observed at that time into the difference information storage means 10. From the next time on, voice recognition can use the information in both the information storage means 6 and the difference information storage means 10, based on the written difference information. Therefore, even if recognition fails the first time because the individual's characteristics have not yet been captured, subsequent recognition can take those characteristics into account.
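The update step can be sketched as follows. The first branch (no stored difference yet) follows the description above directly; blending a new measurement into an existing one with a `rate` factor is an added assumption, since the text only says that the difference is written.

```python
from typing import List, Optional


def update_difference(stored: Optional[List[float]], standard: List[float],
                      observed: List[float], rate: float = 0.5) -> List[float]:
    """Return the difference information to write back after one recognition."""
    measured = [o - s for o, s in zip(observed, standard)]
    if stored is None:
        # Newly issued medium: the first measurement becomes the stored difference.
        return measured
    # Otherwise move the stored difference part of the way toward the new measurement.
    return [(1 - rate) * old + rate * new for old, new in zip(stored, measured)]
```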

【００７３】また利用者が、長い間記憶媒体３を用いな
い場合もある。住所環境がかわったり、病気等で発声の
仕方が以前とかわる場合もある。このような場合におい
ても、最初は以前の差分情報を用いて音声認識し、音声
認識を行った差分情報を書き込む構成にすることによっ
て、より利用者の発声を確実にとらえる情報処理装置１
を構成することができる。The user may also go a long time without using the storage medium 3, and the user's manner of speaking may change because of a change in living environment, illness, or the like. Even then, by first recognizing the voice with the previous difference information and then writing the difference information obtained from that recognition, an information processing apparatus 1 that captures the user's utterances more reliably can be configured.

【００７４】用途によって同じ語彙情報でも、早く発声
したり、ゆっくり発声したりすることもある。例えば０
（ぜろ）から９（きゅう）までの音声数字情報を対象に
した場合、メニュー形式の数字選択のような場合には、
１つの数字をゆっくり発声するが、電話番号のように複
数桁の数字情報を発声する場合には、各数字を短く、ま
た続けて発声する傾向がある。従って用途に応じて同じ
語彙の差分情報を複数持ったり、その都度差分情報を書
き換える機能をもてば、さらに音声認識率は上がり、利
用者の便宜も向上する。Depending on the application, the same vocabulary item may be uttered quickly or slowly. Take the spoken digits 0 (zero) to 9 (kyuu) as an example: when selecting a number from a menu, a single digit is uttered slowly, but when uttering a multi-digit number such as a telephone number, each digit tends to be short and run together with the next. Therefore, holding several sets of difference information for the same vocabulary according to the application, or providing a function that rewrites the difference information each time, raises the voice recognition rate further and improves convenience for the user.
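Holding several sets of difference information for the same vocabulary item can be sketched as a store keyed by word and usage context. The class name, the context labels, and the fallback to a `"default"` entry are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class DifferenceStore:
    """Difference vectors keyed by (word, context); contexts are hypothetical labels."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], List[float]] = {}

    def put(self, word: str, context: str, diff: List[float]) -> None:
        self._entries[(word, context)] = diff

    def get(self, word: str, context: str) -> Optional[List[float]]:
        # Fall back to the general-purpose entry when no context-specific one exists.
        return self._entries.get((word, context), self._entries.get((word, "default")))
```

A digit spoken inside a telephone number could then be looked up under a context such as `"phone_number"`, while menu selection uses the default entry.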

【００７５】さらに第４の実施例と同様に入力利得変更
手段を有する構成をとり、差分情報として、例えば全体
に発声レベルが小さい傾向がある等の情報を用いて、入
力手段４の入力利得レベルを上げるとともに、出力手段
８に、大きめの声で発声することを希望する旨のメッセ
ージ等を出力する構成とすることができる。Furthermore, an input gain changing means may be provided as in the fourth embodiment. Using difference information indicating, for example, that the user's utterance level tends to be low overall, the input gain of the input means 4 is raised and, at the same time, a message asking the user to speak louder is output to the output means 8.

【００７６】また記憶媒体３に音声認識の結果情報を格
納する構成としてもよい。セキュリティが要求されるよ
うな用途に応じては次のような処理も可能となる。何度
も誤った入力が繰り返されている場合は、不正に使用さ
れている場合や、適用不可能なサービスが選択されてい
る場合と想定できるので、サービスの中止や、パターン
マッチングの閾値をさらに厳しくする等の処理を行う必
要がある。The storage medium 3 may also be configured to store the results of voice recognition. This enables the following processing in applications that require security. If incorrect input is repeated many times, it can be assumed that the medium is being used fraudulently or that a service that cannot be applied has been selected, so measures such as stopping the service or making the pattern-matching threshold stricter are needed.

【００７７】音声認識誤りが頻発して、標準音声情報と
かけ離れている発声が繰り返された場合には、音声認識
の結果を格納した、過去の結果情報を参照し、記憶媒体
３を用いた情報処理装置１を用いたサービスを中止する
機能を有する。When voice recognition errors occur frequently and utterances far removed from the standard voice information are repeated, the apparatus refers to the stored history of past recognition results and stops the service provided by the information processing apparatus 1 with the storage medium 3.
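A minimal sketch of such a suspension rule keeps a short history of recent results and stops the service once too many of them are errors. The window size, the error limit, and the class name are assumptions.

```python
from collections import deque


class ResultHistory:
    """Recent recognition results, as might be kept on the storage medium."""

    def __init__(self, window: int = 5, max_errors: int = 3) -> None:
        self._results = deque(maxlen=window)  # oldest results drop out automatically
        self._max_errors = max_errors

    def record(self, recognized_ok: bool) -> None:
        self._results.append(recognized_ok)

    def should_suspend(self) -> bool:
        """True when the number of recent errors reaches the limit."""
        return sum(1 for ok in self._results if not ok) >= self._max_errors
```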

【００７８】またあまりセキュリティを要求しないよう
な用途に関しては、前述の結果情報として音声認識の誤
り訂正傾向を記憶しておき、補正するような構成とする
こともできる。例えば７（しち）の発声が１（いち）に
近いために、良く１（いち）に誤る場合には、この誤り
傾向情報を取り込み、その傾向情報から自動的に７（し
ち）を選択するように構成することができる。For applications that do not require much security, the tendency of recognition errors and their corrections can be stored as the result information and used for correction. For example, if the user's utterance of 7 (shichi) is close to 1 (ichi) and is often misrecognized as 1, this error tendency is recorded and 7 (shichi) is selected automatically on the basis of it.
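The error-tendency correction can be sketched as a table of confirmed outcomes: for each recognized word, how often the user actually meant something else. The thresholds `min_count` and `min_share` and the way the intended word becomes known (for example, from the user's own correction) are assumptions.

```python
from collections import Counter
from typing import Dict


class ConfusionTendency:
    """Counts, per recognized word, which word the user actually intended."""

    def __init__(self, min_count: int = 3, min_share: float = 0.6) -> None:
        self._counts: Dict[str, Counter] = {}
        self._min_count = min_count
        self._min_share = min_share

    def record(self, recognized: str, intended: str) -> None:
        self._counts.setdefault(recognized, Counter())[intended] += 1

    def correct(self, recognized: str) -> str:
        """Replace the result when the history shows a strong, consistent tendency."""
        seen = self._counts.get(recognized)
        if not seen:
            return recognized
        intended, n = seen.most_common(1)[0]
        if n >= self._min_count and n / sum(seen.values()) >= self._min_share:
            return intended
        return recognized
```

As the paragraph above notes, a blanket substitution of this kind is only appropriate where security demands are low, because it overrides the recognizer's output for every later utterance.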

[0079] Thus, in this embodiment, voice recognition is performed using the input voice information and the standard voice information in the information storage means 6, and a means for writing the difference information relative to the standard is provided. This makes it possible to provide an information processing apparatus 1 with improved convenience that can adapt to changes in the user's utterance.
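Writing the difference information so that it follows changes in the user's voice could look like the sketch below. It assumes that utterances and standard templates are equal-length feature vectors and that the stored difference is blended with each new offset by an update rate; neither the representation nor the blending rule comes from the publication.

```python
from typing import List, Sequence


def update_difference(
    standard: Sequence[float],       # standard voice features for one word
    observed: Sequence[float],       # features of the user's latest utterance
    previous_diff: Sequence[float],  # difference information currently stored
    rate: float = 0.2,               # assumed blending rate, 0 < rate <= 1
) -> List[float]:
    """Blend the newly observed offset from the standard into the stored difference."""
    return [
        (1.0 - rate) * d + rate * (o - s)
        for s, o, d in zip(standard, observed, previous_diff)
    ]
```

A small rate makes the stored difference change slowly, so one unusual utterance has little effect; a rate of 1.0 simply overwrites it with the latest offset.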

【００８０】[0080]

[Effects of the invention]

As is apparent from the above description, the present invention comprises recognition means for recognizing input voice, information storage means for storing standard voice information referred to during recognition, difference information storage means for storing difference information representing the characteristics of each user's utterance, and control means for controlling the recognition means so that, during voice recognition, recognition is performed from the standard voice information and the difference information. Because voice recognition can thereby use the individual characteristics contained in the voice, an information processing apparatus with a high voice recognition rate that makes use of the characteristics of the user's utterance is realized. Further, by storing the difference information in a removable storage device, an information processing apparatus that is also excellent in terms of confidentiality can be provided.
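Recognition from standard voice information plus per-user difference information can be illustrated with a nearest-template match. Everything concrete here is an assumption: the feature vectors, the per-word offsets, and the Euclidean distance stand in for whatever pattern matching the apparatus actually uses.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def recognize(
    features: Sequence[float],
    standard: Dict[str, Sequence[float]],    # word -> standard template
    difference: Dict[str, Sequence[float]],  # word -> this user's offset
) -> Tuple[Optional[str], float]:
    """Return the word whose personalized template is nearest to `features`."""
    best_word, best_dist = None, math.inf
    for word, template in standard.items():
        # Words with no stored difference fall back to the standard template.
        offset = difference.get(word, [0.0] * len(template))
        personalized = [t + d for t, d in zip(template, offset)]
        dist = math.dist(features, personalized)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist
```

In this sketch only `difference` is specific to the user, so it alone would need to reside on the removable storage device, while the standard templates stay in the apparatus.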

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an information processing apparatus according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of an information processing apparatus according to third and fourth embodiments of the present invention.

FIG. 3 is a block diagram showing the configuration of an information processing apparatus according to fifth and sixth embodiments of the present invention.

FIG. 4 is a block diagram showing the configuration of a conventional information processing apparatus.

[Explanation of symbols]

1 Information processing apparatus
2 Voice recognition unit
3 Storage medium
4 Input means
5 Recognition means
6 Information storage means
7 Information control means
10 Difference information storage means

Claims

[Claims]

1. An information processing apparatus comprising: input means for inputting voice; recognition means for recognizing the content of the voice input through the input means; information storage means storing standard voice information referred to by the recognition means at the time of voice recognition; difference information storage means storing difference information between the standard voice information stored in the information storage means and voice information peculiar to each individual; and control means for controlling the recognition means so that, when the content of the voice input through the input means is recognized, voice recognition is performed by referring to the standard voice information stored in the information storage means and the difference information stored in the difference information storage means.

2. The information processing apparatus according to claim 1, wherein the difference information storage means is a removable storage device.

3. The information processing apparatus according to claim 1, further comprising input gain changing means for changing the input level of the input means, wherein the control means controls the input level to a predetermined level in accordance with the difference information stored in the difference information storage means.

4. The information processing apparatus according to claim 1, further comprising difference information updating means for changing the difference information stored in the difference information storage means, wherein the difference information updating means updates the difference information from the voice input through the input means and the standard voice information stored in the information storage means.

5. The information processing apparatus according to claim 4, further comprising input gain changing means for changing the input level of the input means, wherein the control means controls the input level to a predetermined level in accordance with the difference information stored in the difference information storage means, and the difference information updating means thereafter updates the difference information from the voice input through the input means and the standard voice information stored in the information storage means.

6. The information processing apparatus according to claim 1, further comprising difference information updating means for changing the difference information stored in the difference information storage means, wherein the difference information updating means adds, in addition to the difference information, error tendency information peculiar to each individual, obtained from the voice input through the input means and the standard voice information stored in the information storage means.

7. An information processing apparatus as described, further comprising output means for presenting a message to the user, wherein the control means causes the output means to present the message in accordance with the difference information stored in the difference information storage means.