JP3090344B2

JP3090344B2 - Voice recognition device

Info

Publication number: JP3090344B2
Application number: JP03152940A
Authority: JP
Inventors: 洋一貞本; 洋一竹林; 宏之坪井; 博史金澤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-06-25
Filing date: 1991-06-25
Publication date: 2000-09-18
Anticipated expiration: 2015-09-18
Also published as: JPH052399A

Abstract

PURPOSE: To obtain a speech recognition device that adapts to moment-to-moment changes in its environment and achieves a high recognition rate. CONSTITUTION: The speech recognition device has speech recognition dictionaries 5-1 to 5-n, each generated by superimposing a different kind of noise, and is equipped with an input part 1 for inputting speech; an analysis part 2 that analyzes the input speech; a storage part 6 that stores in advance temporal information on the occurrence of noise, for example its duration, time of day, and period; selection parts 6 and 4 that, according to the information in the storage part, select the speech recognition dictionary generated by superimposing the noise corresponding to the current time given by a timer 61; and a recognition part 3 that collates the analysis result of the analysis part with the speech recognition dictionary selected by the selection parts and recognizes the input speech.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device used for baggage sorting, ticket vending machines, and the like.

[0002]

[Prior Art] In recent years, speech recognition and synthesis technology has advanced remarkably as a man-machine interface, and speech recognition devices are now used for baggage sorting, ticket vending machines, and the like. In many current speech recognition devices, however, the recognition rate falls as noise in the operating environment increases. This has been addressed either by using a speech recognition dictionary created from training speech data on which the noise has been artificially superimposed, or by applying various noise removal techniques.

[0003] However, when a speech recognition dictionary created from training speech data with artificially superimposed noise is used, recognition performance does not improve unless the type of noise is restricted. Because the surrounding environment is not uniform and changes constantly, it has been difficult to use a speech recognition dictionary built for a restricted type of noise.

[0004] Likewise, when a noise removal method is used, noise is generally non-stationary and colored, so it is difficult to capture both the acoustic characteristics of the noise and the timing of its occurrence. Consequently, when the surrounding environment changed, noise removal suited to the noise actually occurring could not be performed adequately.

[0005]

[Problems to Be Solved by the Invention] As described above, even with these noise countermeasures, conventional speech recognition devices cannot respond accurately to noise that changes from moment to moment with the surrounding environment, and this has been a cause of reduced recognition rates.

[0006] The present invention has been made in view of these circumstances. Its object is to provide a highly practical speech recognition device that can sufficiently improve the recognition rate even for noise that changes from moment to moment as the surrounding environment changes.

[0007]

[Means for Solving the Problems] The speech recognition device according to the first invention has a plurality of speech recognition dictionaries, each created by superimposing one of a plurality of types of noise, and comprises: an input unit that inputs speech; an analysis unit that analyzes the input speech; a storage unit that stores in advance temporal information on the occurrence of noise, for example its duration, time of day, and period; a selection unit that, based on the information in the storage unit, selects the speech recognition dictionary created by superimposing the noise corresponding to the current time given by a timer; and a recognition unit that recognizes the input speech by collating the analysis result of the analysis unit with the speech recognition dictionary selected by the selection unit.

[0008] The speech recognition device according to the second invention holds a plurality of types of noise data and comprises: an input unit that inputs speech; a selection unit that selects the noise data corresponding to the current time using temporal information on the occurrence of noise stored in advance in the same manner as above; an analysis unit that removes noise from the input speech using the noise data selected by the selection unit and analyzes the noise-removed speech; and a recognition unit that recognizes the input speech by collating the analysis result of the analysis unit with a speech recognition dictionary.

[0009]

[Operation] According to the present invention, the time at which a given noise occurs and its duration are obtained by referring to the stored temporal information on noise occurrence, and a speech recognition dictionary created from speech data superimposed with noise similar to the noise occurring at that moment is selected. A speech recognition dictionary restricted to one type of noise can therefore be used even under environmental noise that changes over time. Similarly, by referring to the stored temporal information on noise occurrence, selecting the noise data that corresponds to the noise occurring at that moment, and performing noise removal with it, noise can be removed appropriately from the input speech even in an environment where the noise changes with time. Speech recognition performance can therefore be improved. These noise countermeasures can be carried out fully automatically by monitoring the time at which the target speech is input.

[0010]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

○ Example 1

[0011] FIG. 1 is a block diagram of a speech recognition device according to a first embodiment of the present invention. The device comprises a speech input unit 1, an acoustic analysis unit 2, a speech recognition unit 3, a recognition dictionary switch 4, a recognition dictionary unit 5, and a noise countermeasure unit 6.

[0012] The speech input unit 1 converts input speech, which has been converted into an electrical signal by a microphone or the like, into a digital signal, for example at a sampling frequency of 12 kHz with 16-bit quantization, and outputs it to the acoustic analysis unit 2. The acoustic analysis unit 2 obtains feature parameters, for example every 8 ms, using one of several analysis methods such as FFT analysis (frequency analysis by fast Fourier transform), LPC (linear prediction) analysis, cepstrum analysis, or filter analysis, and outputs the resulting time series to the speech recognition unit 3. The speech recognition unit 3 collates the feature parameters obtained from the acoustic analysis unit 2 with the reference feature parameters of a recognition dictionary in the recognition dictionary unit 5, calculates similarities, and recognizes the speech. For example, the frequency spectrum pattern obtained by FFT analysis in the acoustic analysis unit is collated with the reference pattern of each word in the recognition dictionary, the similarity is calculated by the composite similarity method, and the word with the highest similarity is output as the recognition result. The recognition dictionary used here is the one selected by the noise countermeasure unit 6, described later.
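The framing arithmetic and the dictionary-matching step described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is used as a stand-in for the composite similarity method, which the text does not spell out, and the function names and toy reference patterns are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 12_000   # sampling frequency given in the text
FRAME_PERIOD_MS = 8       # feature parameters are obtained every 8 ms

def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, period_ms=FRAME_PERIOD_MS):
    """Number of samples between successive feature vectors."""
    return sample_rate_hz * period_ms // 1000

def cosine_similarity(a, b):
    """Stand-in similarity measure (assumption; the text uses composite similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(pattern, dictionary):
    """Return the word whose reference pattern is most similar to the input pattern."""
    return max(dictionary, key=lambda word: cosine_similarity(pattern, dictionary[word]))

# Hypothetical reference patterns for two words.
toy_dictionary = {"up": [1.0, 0.2, 0.1], "down": [0.1, 0.3, 1.0]}
best = recognize([0.9, 0.25, 0.2], toy_dictionary)
```

At 12 kHz, an 8 ms period corresponds to 96 samples per feature vector. Swapping `toy_dictionary` for a different dictionary is all that the dictionary switch needs to do in this sketch.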

[0013] The recognition dictionary unit 5 holds a plurality of recognition dictionaries 1 to n. Each recognition dictionary is created from speech data whose patterns have been deformed by superimposing environmental sound characteristic of the environment in which the device is installed onto training speech collected in a quiet environment. For example, when the device is installed in a school elevator, recognition dictionary 1 is created from speech data in which the training speech is superimposed with environmental noise rich in footsteps and voices, as heard when the elevator is most crowded during morning arrival, and recognition dictionary 2 is created from speech data in which the training speech is superimposed with environmental noise containing only a slight amount of elevator drive sound, as heard during class when the elevator is relatively uncrowded. Alternatively, when the device is installed in a railway station, recognition dictionary 1 is created with environmental noise that includes the sound of people moving as trains arrive and depart, recognition dictionary 2 with environmental noise that includes the bell ringing on the platform, and recognition dictionary 3 with environmental noise that includes the drive sound of trains, each superimposed on the training speech. These recognition dictionaries may also be created from speech data obtained by inputting and analyzing speech in which the environmental sound of each situation and the training speech are acoustically mixed.
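The superimposition of environmental noise on clean training speech can be pictured with the sketch below. It assumes simple sample-wise addition in the time domain with an adjustable gain; the patent does not specify the mixing procedure, and the function names and sample values are hypothetical.

```python
def superimpose(speech, noise, noise_gain=1.0):
    """Add noise to clean speech sample by sample, looping the noise if it is shorter."""
    if not noise:
        return list(speech)
    return [s + noise_gain * noise[i % len(noise)] for i, s in enumerate(speech)]

def build_training_sets(clean_utterances, noises_by_dictionary, noise_gain=1.0):
    """Map each dictionary number to its noise-superimposed training utterances."""
    return {
        number: [superimpose(u, noise, noise_gain) for u in clean_utterances]
        for number, noise in noises_by_dictionary.items()
    }

# Hypothetical data: one clean utterance and one noise recording per dictionary number.
clean = [[0.0, 0.5, -0.5, 0.25]]
noises = {1: [0.1, -0.1], 2: [0.01]}
training = build_training_sets(clean, noises)
```

Each entry of `training` would then be analyzed and used to build the recognition dictionary with the same number.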

[0014] The noise countermeasure unit 6 consists of a timer (61), a time information management table 1 (62), and a countermeasure unit (63). As shown for example in Table 1, the time information management table 1 holds the correspondence between the time information for which each recognition dictionary should be used (day of the week, time of day, duration, and so on) and the recognition dictionary (dictionary number). In the latter of the examples above (the railway station), this table is created by referring to the train arrival and departure times in the timetable.

[0015]

[Table 1]

[0016] The operation of the countermeasure unit (63) is described with reference to the flowchart of FIG. 3. First, the time is monitored from the timer (61) (S1). Next, the time slot containing that time is looked up in the day-of-week, time-of-day, duration, and other fields of the time information management table 1 (62), and the number of the recognition dictionary to be used in that time slot is extracted (S2). For example, if the time from the timer is 8:20:00 on a Wednesday, it falls within Monday to Friday, 8:00:00 to 8:24:59 in Table 1, so recognition dictionary number 1 is extracted; if the time from the timer is 9:25:00 on a Saturday, it falls within Saturday to Sunday, 8:00:00 to 9:59:59 in Table 1, so recognition dictionary number 4 is extracted. Next, the recognition dictionary switch 4 is controlled so that the recognition dictionary with the extracted number is used by the speech recognition unit 3 (S3). The time from the timer is then monitored again and the same processing is repeated.
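The lookup in step S2 can be sketched as a scan over table rows. Only the two rows quoted in the text are reproduced, since Table 1 itself is not available here; the row layout, the function name, and the `default` fallback are assumptions made for illustration.

```python
from datetime import datetime, time

# Rows: (weekdays as datetime.weekday() values, start, end inclusive, dictionary number).
# Only the two rows quoted in the text are reproduced here.
TIME_TABLE_1 = [
    ({0, 1, 2, 3, 4}, time(8, 0, 0), time(8, 24, 59), 1),  # Mon to Fri, 8:00:00 to 8:24:59
    ({5, 6}, time(8, 0, 0), time(9, 59, 59), 4),           # Sat to Sun, 8:00:00 to 9:59:59
]

def select_dictionary(now, table=TIME_TABLE_1, default=None):
    """Step S2: find the time slot containing `now` and return its dictionary number."""
    current = now.time().replace(microsecond=0)
    for weekdays, start, end, number in table:
        if now.weekday() in weekdays and start <= current <= end:
            return number
    return default  # hypothetical fallback when no slot matches

wednesday = datetime(2024, 1, 3, 8, 20, 0)  # a Wednesday
saturday = datetime(2024, 1, 6, 9, 25, 0)   # a Saturday
```

With these rows, `select_dictionary(wednesday)` yields dictionary number 1 and `select_dictionary(saturday)` yields dictionary number 4, matching the two cases in the paragraph. Steps S1 and S3 would wrap this call in a loop that reads the clock and drives the dictionary switch.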

[0017] In this way, the device of this embodiment uses temporal information on the occurrence of environmental noise to selectively use a recognition dictionary created from training speech data superimposed with environmental noise that reflects the environment at that moment. Environmental information is thus used effectively to apply appropriate noise countermeasures, and recognition performance improves.

[0018] The "timer" here is a device that counts time in some unit (seconds, minutes, hours) within some period (for example one month, one week, or one day) measured from an arbitrary reference point, but it may be replaced by an ordinary clock (in some cases a clock with date or day-of-week display).

○ Example 2

[0019] FIG. 2 is a block diagram of a speech recognition device according to a second embodiment. This device comprises a speech input unit 1, an acoustic analysis unit 2, a speech recognition unit 3, a noise removal unit 8, a noise data unit 9, a noise data switch 10, a recognition dictionary 11, and a noise countermeasure unit 7.

[0020] The functions of the speech input unit 1, acoustic analysis unit 2, and speech recognition unit 3 are the same as in the first embodiment, except that the noise removal unit 8 inside the acoustic analysis unit 2 removes environmental noise contained in the input speech.

[0021] Here, the case of using a technique called spectral subtraction as the noise removal method is described. First, environmental noise is captured while no speech to be recognized is being input, and the power spectrum of that noise in each frequency band is obtained and used as noise data. That is, the sequence of per-band power spectra $|x_1|, |x_2|, \ldots, |x_n|$ (where $n$ is the number of bands and $|x_i|$ is the power spectrum of frequency band $i$) constitutes the noise data. The environmental noise captured to create this noise data is sound characteristic of the environment in which the device is installed. A plurality of noise data (1 to n) are held in the noise data unit 9. For example, if the device is installed in a railway station, noise data 1 is the power spectrum of the bell sound emitted from a bell at a fixed position when trains arrive and depart, captured by a microphone at a fixed position, and noise data 2 is the power spectrum of the bustle of people moving as trains arrive and depart.

[0022] The noise countermeasure unit 7 consists of a timer (71), a time information management table 2 (72), and a countermeasure unit (73). As shown for example in Table 2, the time information management table 2 holds the correspondence between the time information for which each noise data should be used (day of the week, time of day, duration, and so on) and the noise data. In the example above, this table is created by referring to the timetable.

[0023]

[Table 2]

[0024] As in the first embodiment, the countermeasure unit (73) monitors the current time with the timer (71) and extracts from the time information management table 2 (72) the number of the noise data to be used at that time. For example, when the current time is 8:00:00 on a Thursday, noise data number 1, which corresponds to Monday to Friday, 8:00:00 to 8:00:05 in Table 2, is extracted. Next, the noise data switch 10 is controlled so that the noise data with the extracted number is used by the noise removal unit 8.

[0025] In the noise removal unit 8, the $|x_i|$ ($i = 1, 2, \ldots, n$) of the noise data selected by the noise countermeasure unit 7 are subtracted from the power spectrum of the corresponding frequency band of the speech input to the speech input unit 1. That is, for each band the operation $|y_i| - |x_i|$ is performed, where $|y_i|$ is the power spectrum of the input speech in band $i$.
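The per-band subtraction can be sketched in a few lines. The clamp at a floor value is an assumption added so that the estimated power never becomes negative; the text itself only states the subtraction. The function name and sample values are hypothetical.

```python
def spectral_subtract(input_power, noise_power, floor=0.0):
    """Subtract the stored noise power |x_i| from the input power |y_i| in each band.

    The floor (an assumption, not stated in the text) keeps results non-negative.
    """
    if len(input_power) != len(noise_power):
        raise ValueError("input and noise data must have the same number of bands")
    return [max(y - x, floor) for y, x in zip(input_power, noise_power)]

# Hypothetical three-band example.
cleaned = spectral_subtract([5.0, 2.0, 1.0], [1.0, 0.5, 3.0])
```

In the third band the stored noise exceeds the input power, so the sketch returns the floor value for that band rather than a negative number.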

[0026] When speech to be recognized is input together with noise, the acoustic analysis unit removes the noise component as described above to estimate the power spectrum of the speech signal to be recognized, and the speech recognition unit 3 performs recognition by collating this power spectrum pattern with the reference patterns in the recognition dictionary 11.

[0027] Other noise removal methods include active noise control using an adaptive filter. This method removes noise using an input unit that receives the noise-contaminated speech, an input unit that receives only the noise, and an adaptive filter. In this case, the initial values of the adaptive filter corresponding to the noise of each time slot are held as noise data 1 to n.
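One common way to realize a two-input adaptive noise canceller is the LMS algorithm, sketched below. The patent does not name the adaptation rule, so LMS is an assumption here, as are the function name, the step size, and the stored initial weights per noise data number.

```python
def lms_cancel(primary, reference, initial_weights, step=0.1):
    """Subtract the filtered reference noise from the primary input using LMS.

    Returns (error_signal, final_weights); the error signal is the noise-reduced output.
    """
    weights = list(initial_weights)
    history = [0.0] * len(weights)  # most recent reference samples, newest first
    output = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        weights = [w + step * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output, weights

# Hypothetical stored initial weights, one set per noise data number.
INITIAL_WEIGHTS = {1: [0.4], 2: [0.0]}

reference = [1.0, -1.0] * 100              # noise-only input
primary = [0.5 * r for r in reference]     # noise as it appears at the speech microphone
residual, final = lms_cancel(primary, reference, INITIAL_WEIGHTS[1])
```

Starting from weights stored for the current time slot, rather than from zero, shortens the time the filter needs to converge, which is the reason the text gives for holding initial values as noise data.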

[0028] As described above, when the environmental sound changes with the time slot, noise removal is performed by selectively using noise data that reflects the environment at that moment, based on temporal information on the occurrence of environmental noise. The S/N ratio is thereby raised and recognition performance improved.

[0029] In the first and second embodiments, the environmental noise superimposed when creating recognition dictionaries and the environmental noise captured when creating noise data, respectively, may be collected per situation as described above and later associated with times when the time information management table is created. Alternatively, the temporal changes in the situation may first be written down in the form of a time information management table, and the environmental noise in each time slot then collected to create the recognition dictionary or noise data with the corresponding number.

○ Example 3

[0030] FIG. 4 is a block diagram of a speech recognition device according to a third embodiment. This device combines the first and second embodiments and has both the recognition dictionary unit 5 with the recognition dictionary switch 4 and the noise removal unit 8 with the noise data unit 9 and noise data switch 10. The noise countermeasure unit 67 holds both time information management tables 1 and 2.

[0031] The processing flow is described briefly. The noise countermeasure unit 67 obtains from the timer the time at which speech was input, selects noise data from the noise data unit 9 by referring to the time information management table 2, and selects a recognition dictionary from the recognition dictionary unit 5 by referring to the time information management table 1. The time information management tables 1 and 2 may instead be merged into a single table whose entries each consist of the time information fields, a noise data number, and a recognition dictionary number. Using the selected noise data, the acoustic analysis unit 2 and the noise removal unit 8 within it perform noise removal together with analysis and output the result to the speech recognition unit 3. The speech recognition unit 3 performs recognition using the selected recognition dictionary. Here, each recognition dictionary is created from speech data in which the training speech is superimposed with the noise that still remains after noise removal with the corresponding noise data.
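The merged table mentioned in this paragraph, with one entry per time slot carrying both a noise data number and a recognition dictionary number, might look like the sketch below. The time slot is the one quoted for Table 2 in the second embodiment; pairing it with dictionary number 1 is purely hypothetical, as are the type and function names.

```python
from datetime import datetime, time
from typing import NamedTuple

class Slot(NamedTuple):
    weekdays: frozenset        # datetime.weekday() values covered by the slot
    start: time
    end: time                  # inclusive
    noise_data_number: int
    dictionary_number: int

# One illustrative row; the dictionary number is a hypothetical pairing.
COMBINED_TABLE = [
    Slot(frozenset({0, 1, 2, 3, 4}), time(8, 0, 0), time(8, 0, 5), 1, 1),
]

def select_both(now, table=COMBINED_TABLE):
    """Return (noise data number, dictionary number) for the slot containing `now`."""
    for slot in table:
        if now.weekday() in slot.weekdays and slot.start <= now.time() <= slot.end:
            return slot.noise_data_number, slot.dictionary_number
    return None

thursday = datetime(2024, 1, 4, 8, 0, 0)  # a Thursday
```

A single lookup then drives both the noise data switch and the recognition dictionary switch, which keeps the two selections from drifting out of step.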

[0032] In this embodiment, even when the noise removal of the second embodiment is insufficient, recognition performance is further improved by selectively using a recognition dictionary matched to the noise that could not be removed.

○ Example 4

[0033] FIG. 5 is a block diagram of a speech recognition device according to a fourth embodiment. This device combines the first embodiment with a function for training the recognition dictionaries by the word spotting method. It adds a training speech data file 12, a training speech data processing unit 13, and a recognition dictionary creation unit 14 to the configuration of the first embodiment.

[0034] Training of the recognition dictionaries by the word spotting method can use the scheme described in Japanese Patent Application No. Hei 1-255270. While no speech to be recognized is being input from the speech input unit 1 (that is, only environmental noise is being input), the environmental noise input from the speech input unit 1 and analyzed by the acoustic analysis unit 2 is superimposed on the training speech data from the training speech data file 12 in the training speech data processing unit 13, and the result is sent through the acoustic analysis unit 2 to the recognition dictionary creation unit 14.

[0035] In the recognition dictionary creation unit 14, the decision as to which recognition dictionary in the recognition dictionary unit 5 to train is made from the information held in the noise countermeasure unit 6 on the correspondence between each recognition dictionary and the time at which it should be used. For example, the recognition dictionary whose day of the week, time of day, and duration in the time information management table 1 match the time from the timer 7 is selected, the recognition dictionary switch 4 is controlled, and the selected recognition dictionary is trained with the environmental noise at that moment.

[0036] In this embodiment, the recognition dictionaries are created using noise specific to exactly the same environment as the one in which the speech recognition device is actually used, so speech recognition performance can be improved beyond the effect of the first embodiment.

○ Example 5

[0037] FIG. 6 is a block diagram of a speech recognition device according to a fifth embodiment. This device combines the third and fourth embodiments, and its processing flow is the same as in the third and fourth embodiments.

○ Example 6

[0038] FIG. 7 is a block diagram of a speech recognition device according to a sixth embodiment. The embodiments up to the fifth improve recognition performance by applying noise countermeasures, whereas this embodiment aims to improve recognition performance by narrowing down the likely speakers. This device comprises a speech input unit 1, an acoustic analysis unit 2, a speech recognition unit 3, a recognition dictionary 11, a speaker prediction unit 15, and a vocabulary/speaker correspondence table 16. The functions of the speech input unit 1 and acoustic analysis unit 2 are the same as in the first embodiment.

[0039] The vocabulary/speaker correspondence table 16 associates each vocabulary item with the set of people who mainly utter it, as in the example vocabulary/speaker correspondence table (Table 3). For example, the vocabulary item "okaachan" ("Mom") is associated with the set "elementary school students".

[0040]

[Table 3]

[0041] The speaker prediction unit 15 holds information that associates time information with the set of people most likely to be the speakers of input speech to the device at that time, as in the example time information management table 3 (152) (Table 4). It monitors the time from the timer (151), selects from the time information management table 3 the set of people likely to be speakers at that time (called the predicted speaker), and passes it to the speech recognition unit 3. The predicted speaker is determined, for example, by dividing people into several groups by some criterion, tallying for each group how often its members input speech to the device during the time slot, and taking the group with the highest frequency.
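The tallying procedure for the predicted speaker can be sketched with a frequency count per time slot. The log format, the slot labels, and the group names other than "elementary school students" are hypothetical.

```python
from collections import Counter

def predicted_speaker(usage_log, slot):
    """Return the group that most often input speech to the device in the given slot."""
    counts = Counter(group for logged_slot, group in usage_log if logged_slot == slot)
    if not counts:
        return None  # no observations for this slot
    return counts.most_common(1)[0][0]

AFTER_SCHOOL = "Mon-Fri 15:30-17:00"  # hypothetical slot label

# Hypothetical usage log: (time slot, group of the person who spoke).
usage_log = [
    (AFTER_SCHOOL, "elementary school students"),
    (AFTER_SCHOOL, "elementary school students"),
    (AFTER_SCHOOL, "adults"),
    ("Mon-Fri 09:00-12:00", "adults"),
]
```

Running this once per time slot over a collected log would give the rows of a table like time information management table 3.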

[0042]

[Table 4]

[0043] The speech recognition unit 3 calculates the similarity of each vocabulary item to be recognized by the composite similarity method, as described in the first embodiment. Then, when recognizing a word for example, it identifies the vocabulary items (words) whose target speaker set in the vocabulary/speaker correspondence table 16 matches the current predicted speaker reported by the speaker prediction unit 15, weights the similarity of matching items to increase it, and decreases the similarity of non-matching items. The recognized word is then determined from the weighted similarities.
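The weighting step can be sketched as a multiplicative adjustment of each word's similarity. The patent does not give the weighting formula or values, so the `boost` and `damp` factors, the function names, and the word "coffee" with its "adults" speaker set are all hypothetical.

```python
def reweight(similarities, vocabulary_speakers, predicted, boost=1.2, damp=0.8):
    """Raise the similarity of words whose speaker set matches the predicted speaker,
    and lower the similarity of the others (factor values are assumptions)."""
    weighted = {}
    for word, score in similarities.items():
        matches = predicted in vocabulary_speakers.get(word, set())
        weighted[word] = score * (boost if matches else damp)
    return weighted

def recognize(similarities, vocabulary_speakers, predicted):
    """Pick the word with the highest weighted similarity."""
    weighted = reweight(similarities, vocabulary_speakers, predicted)
    return max(weighted, key=weighted.get)

# Hypothetical excerpt of a vocabulary/speaker correspondence table.
vocabulary_speakers = {
    "okaachan": {"elementary school students"},
    "coffee": {"adults"},
}
raw = {"okaachan": 0.80, "coffee": 0.85}
result = recognize(raw, vocabulary_speakers, "elementary school students")
```

In this example the unweighted scores would favor "coffee", but with elementary school students as the predicted speaker the weighted scores favor "okaachan".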

[0044] For example, if many elementary school students always come to a shop between 15:30 and 17:00 from Monday to Friday, the speaker prediction unit 15 uses the time information management table 3 to pass information representing the set of elementary school students to the speech recognition unit 3. In the speech recognition unit 3, vocabulary items in the recognition dictionary 11 such as "candy", "oniichan" ("big brother"), and "okaachan" ("Mom") have their similarities weighted upward, because according to the vocabulary/speaker correspondence table 16 their speakers match the predicted speaker (elementary school students), and so they are more readily judged to be the recognized word. Thus, for example, when an elementary school student calls out "oniichan" to a male shop assistant of about 20, the speech can be recognized and the assistant notified.

[0045] Thus, according to this embodiment, a high recognition rate can be obtained by weighting vocabulary similarities using information on the people most likely to be speakers. Furthermore, by using that information in association with time, recognition can be adapted to a constantly changing environment.

○ Example 7

【００４６】FIG. 8 is a block diagram of a speech recognition apparatus according to the seventh embodiment. This apparatus combines the first and sixth embodiments. It includes the recognition dictionary unit 5 and the recognition dictionary switch 4 as well as the vocabulary/speaker correspondence table 16, and has an environment adaptation unit 17 that combines the functions of the noise countermeasure unit 6 and the speaker prediction unit 15. The processing flow is the same as in the first and sixth embodiments.

○ Embodiment 8

【００４７】FIG. 9 is a block diagram of a speech recognition apparatus according to the eighth embodiment. This apparatus combines the second and sixth embodiments. It includes the noise removal unit 8, the noise data unit 9, and the noise data switch 10 as well as the vocabulary/speaker correspondence table 16, and has an environment adaptation unit 18 that combines the functions of the noise countermeasure unit 7 and the speaker prediction unit 15. The processing flow is the same as in the second and sixth embodiments.

○ Embodiment 9

【００４８】FIG. 10 is a block diagram of a speech recognition apparatus according to the ninth embodiment. The embodiments up to the eighth assume that the environment changes according to the schedule in the time information management table. This embodiment can also handle cases where the timing of an environmental change deviates from that schedule. The apparatus comprises a speech input unit 1, an acoustic analysis unit 2, a speech recognition unit 3, a recognition dictionary switch 4, a recognition dictionary unit 5, a noise countermeasure unit 19, and a temporary event selection unit 20.

【００４９】Environmental changes involve two kinds of noise. The first is noise caused by an observable "event", such as a train arriving in a station or a bell ringing as a train arrives or departs. The second is noise belonging to a "situation", such as the murmur of a crowd or birdsong in the early morning, where the causal link between the noise and its source is unclear and the concept of an event does not apply. The embodiments up to the eighth treat environmental change within a time framework and apply noise countermeasures to both kinds. That approach, however, cannot cope when an event does not occur at its predetermined time.

【００５０】The noise countermeasure unit 19 of this embodiment therefore holds, in the time information management table 4 (193), information associating each time with the event that occurs at that time. For example, it stores that event B (for example, a train arrives) is scheduled at time t₁ and event E (for example, a siren sounds) is scheduled at time t₄. The recognition dictionary unit 5 contains recognition dictionaries a, b, c, ... created for individual noises, plus a spare general-purpose recognition dictionary (an ordinary dictionary not specific to any noise). Each recognition dictionary is created from speech data in which a particular noise is superimposed on the learning speech: for example, a uses noise containing early-morning birdsong, b uses the noise when event B occurs, c uses noise containing the platform bell and rush-hour sounds that follow event B, d uses noise containing the footsteps of people coming and going in the daytime, and e uses the noise when event E occurs. In addition, the noise countermeasure unit 19 holds, in the event/dictionary table (194), information associating each event with the recognition dictionary to be used when that event occurs. The recognition dictionaries b and e, which are tied to events, are registered there. The recognition dictionaries a and d, which correspond to situations that cannot be framed as events, are not registered.
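The two tables described above can be pictured with a short Python sketch. The numeric times, the variable names, and the function `dictionary_for_event` are assumptions made for illustration. The source gives only the symbolic times t₁ and t₄ and does not specify a storage layout.

```python
# Assumed numeric stand-ins for the symbolic times t1 and t4 (minutes).
T1, T4 = 10, 40

# Time information management table 4 (193): time -> scheduled event.
EVENT_SCHEDULE = {T1: "B", T4: "E"}

# Event/dictionary table (194): event -> recognition dictionary.
# Only event-bound dictionaries are registered; a and d describe
# "situations" and therefore have no entry here.
EVENT_DICTIONARY = {"B": "b", "E": "e"}

# Recognition dictionary unit 5: the noise each dictionary was trained with.
DICTIONARIES = {
    "a": "early-morning birdsong",
    "b": "noise when event B occurs",
    "c": "platform bell and rush-hour sounds after event B",
    "d": "daytime footsteps",
    "e": "noise when event E occurs",
    "general": "no specific noise",
}


def dictionary_for_event(event):
    """Return the dictionary registered for an event, or None if unregistered."""
    return EVENT_DICTIONARY.get(event)
```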

【００５１】The temporary event selection unit 20 informs the noise countermeasure unit 19 of events that are occurring now or will occur in the future. For example, when a train is likely to arrive 10 minutes late, an operator manually notifies the noise countermeasure unit 19 that the train will arrive 10 minutes after the scheduled time. Alternatively, a surveillance camera observes the events currently taking place, and the result (whether the train due to arrive is visible yet) is passed to the noise countermeasure unit 19.

【００５２】The operation of the noise countermeasure unit 19 is described with reference to the flowchart of FIG. 11. The unit monitors the current time t from the timer (191) and looks up, in the time information management table 4 (193), the event scheduled to occur at the current time. If this agrees with the event occurrence information sent from the temporary event selection unit 20 (in the example above: event B occurs at t = t₁, event E occurs at t = t₄, and no event is observed at t ≠ t₁, t₄), then, as in the first embodiment, the unit controls the recognition dictionary switch 4 to select one of the recognition dictionaries using the time information management table 1 (192), which associates times with recognition dictionaries (S103, S107, S115).
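The on-schedule case can be sketched in Python as follows. This is an illustrative reading, not the flowchart of FIG. 11 itself: the numeric slot boundaries, the assignment of dictionaries a and d to the periods before t₁ and between t₃ and t₄, and all function names are assumptions.

```python
# Assumed numeric stand-ins for t1..t4; the final slot end (50) is arbitrary.
# Time information management table 1 (192): (start, end, dictionary).
TABLE_1 = [(0, 10, "a"), (10, 20, "b"), (20, 30, "c"), (30, 40, "d"), (40, 50, "e")]

# Time information management table 4 (193): time -> scheduled event.
TABLE_4 = {10: "B", 40: "E"}


def scheduled_event(t):
    """Event scheduled at time t, or None if nothing is scheduled."""
    return TABLE_4.get(t)


def lookup_dictionary(t):
    """Dictionary that table 1 assigns to time t (general-purpose if none)."""
    for start, end, name in TABLE_1:
        if start <= t < end:
            return name
    return "general"


def select_if_on_schedule(t, observed_event):
    """Return the table-1 dictionary when schedule and observation agree.

    Returns None on a mismatch, which the source handles separately by
    rewriting table 1.
    """
    if scheduled_event(t) == observed_event:
        return lookup_dictionary(t)
    return None
```

A call such as `select_if_on_schedule(10, "B")` follows the normal path, whereas a missing or unscheduled event yields `None` and signals that the schedule needs adjusting.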

【００５３】If the timing of an event shifts, for example because a train is delayed, the contents of the time information management table 1 are temporarily changed according to the information from the temporary event selection unit 20 that the train has not come. For example, if event B does not occur at t = t₁, the recognition dictionary entry in the time information management table 1 for the period t₁ to t₂, during which the recognition dictionary b corresponding to event B was to be used, is rewritten to a or to the general-purpose dictionary (S104). The reasoning is that, because the event did not occur, the immediately preceding situation is assumed to continue, so a recognition dictionary suited to that situation is selected. If the immediately preceding dictionary is not suited to the situation, the general-purpose recognition dictionary is selected. If event E does not occur at t = t₄, the recognition dictionary entry in the time information management table 1 is rewritten in the same way (S108). Because the recognition dictionary c, which was to be used during t₂ to t₃, reflects the situation that follows event B, the entry for t₂ to t₃ is also rewritten to a or to the general-purpose dictionary when event B does not occur. Furthermore, if event B occurs at t ≠ t₁, the event/dictionary table (194) is consulted to select the corresponding recognition dictionary b (S110), and the period t₁ to t₂ in the time information management table 1, during which the recognition dictionary b was to be used, is rewritten so that it is shifted by (t − t₁) (S111). Likewise, if event E occurs at t ≠ t₄, the recognition dictionary e is selected through the event/dictionary table (S113), and the time entry in the time information management table 1 during which the recognition dictionary e was to be used is rewritten (S114).
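The two kinds of table rewrite described above can be sketched in Python. The `Slot` type, the function names, and the numeric times are assumptions for illustration. The sketch also leaves neighbouring slots untouched when a slot is shifted and does not restore the table afterwards, because the source does not spell out either step.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One row of time information management table 1 (assumed layout)."""
    start: int
    end: int
    dictionary: str


def on_event_missing(table, dependent, fallback):
    """A scheduled event did not occur.

    Every slot whose dictionary depends on that event (for event B: b and
    the follow-on dictionary c) is rewritten to the fallback, i.e. the
    preceding situation's dictionary or the general-purpose one.
    """
    for slot in table:
        if slot.dictionary in dependent:
            slot.dictionary = fallback


def on_event_shifted(table, dictionary, t):
    """An event occurred at an unscheduled time t.

    The slot reserved for its dictionary is shifted by (t - start), so it
    keeps its length but begins at the observed time.
    """
    for slot in table:
        if slot.dictionary == dictionary:
            delta = t - slot.start
            slot.start += delta
            slot.end += delta


# Event B fails to occur at t1 = 10: fall back to dictionary a.
table = [Slot(0, 10, "a"), Slot(10, 20, "b"), Slot(20, 30, "c")]
on_event_missing(table, {"b", "c"}, "a")

# Event B occurs 3 minutes late: shift the slot for b by (13 - 10).
late = [Slot(0, 10, "a"), Slot(10, 20, "b"), Slot(20, 30, "c")]
on_event_shifted(late, "b", 13)
```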

【００５４】Thus, according to this embodiment, when noise countermeasures treat all environmental changes, both those caused by events and those regarded as situations, within a time framework, the apparatus can also cope with exceptional cases in which an event does not occur at its predetermined time.

【００５５】[0055]

[Effects of the Invention] As described above, according to the present invention, temporal information about non-stationary environmental noise is stored in advance and used, which makes it possible to selectively use a speech recognition dictionary suited to the environment or to perform appropriate noise removal. This yields the considerable practical benefit of providing a speech recognition apparatus with improved recognition performance.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a speech recognition apparatus according to the first embodiment.

FIG. 2 is a configuration diagram of a speech recognition apparatus according to the second embodiment.

FIG. 3 is a flowchart showing the operation of the noise countermeasure unit 6.

FIG. 4 is a configuration diagram of a speech recognition apparatus according to the third embodiment.

FIG. 5 is a configuration diagram of a speech recognition apparatus according to the fourth embodiment.

FIG. 6 is a configuration diagram of a speech recognition apparatus according to the fifth embodiment.

FIG. 7 is a configuration diagram of a speech recognition apparatus according to the sixth embodiment.

FIG. 8 is a configuration diagram of a speech recognition apparatus according to the seventh embodiment.

FIG. 9 is a configuration diagram of a speech recognition apparatus according to the eighth embodiment.

FIG. 10 is a configuration diagram of a speech recognition apparatus according to the ninth embodiment.

FIG. 11 is a flowchart showing the operation of the noise countermeasure unit 19.

[Explanation of symbols]

1: speech input unit
2: acoustic analysis unit
3: speech recognition unit
4: recognition dictionary switch
5: recognition dictionary unit
6, 7, 67, 19: noise countermeasure unit
61, 71, 151, 191: timer
62, 192: time information management table 1
72: time information management table 2
193: time information management table 4
194: event/dictionary table
8: noise removal unit
9: noise data unit
10: noise data switch
11: recognition dictionary
12: learning speech data file
13: learning speech data processing unit
14: recognition dictionary creation unit
15: speaker prediction unit
152: time information management table 3
16: vocabulary/speaker correspondence table
17, 18: environment adaptation unit
20: temporary event selection unit

Continuation of the front page
(72) Inventor: Hirofumi Kanazawa (金澤博史), c/o Toshiba Research Institute, Toshiba Corp., 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(56) References: JP-A-59-168496 (JP, A); JP-A-62-103699 (JP, A); JP-A-59-34595 (JP, A); JP-A-59-34596 (JP, A); JP-B-61-27758 (JP, B2); JP-B-63-67197 (JP, B2)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/00 - 17/00

Claims

(57) [Claims]

1. A speech recognition apparatus comprising: a plurality of voice recognition dictionaries; input means for inputting a voice; analysis means for analyzing the voice input by the input means; storage means for storing a time in association with the voice recognition dictionary to be used at that time; time detecting means for detecting the time at which the voice is input by the input means; selecting means for selecting, using the storage means, the voice recognition dictionary corresponding to the time detected by the time detecting means; recognizing means for recognizing the input voice by collating the analysis result of the analysis means with the voice recognition dictionary selected by the selecting means; event storage means for storing in advance a time and an event scheduled to occur at that time; preliminary storage means for storing in advance an event and the voice recognition dictionary to be used when that event occurs; event detecting means for detecting an event occurring at the time detected by the time detecting means; and preliminary selecting means for selecting, using the preliminary storage means, the voice recognition dictionary corresponding to the event detected by the event detecting means, wherein the event scheduled to occur at the time detected by the time detecting means is looked up in the event storage means and, if the scheduled event does not match the event detected by the event detecting means, the recognizing means is operated with the selecting means switched to the preliminary selecting means.

2. A speech recognition apparatus comprising: input means for inputting a voice; analysis means for analyzing the voice input by the input means; recognizing means for recognizing the input voice by collating the analysis result of the analysis means with a voice recognition dictionary; first storage means for storing in advance a vocabulary item in the voice recognition dictionary in association with the set of speakers who utter that item; second storage means for storing in advance a time in association with the set of speakers most likely to input a voice at that time; time detecting means for detecting the time at which the voice is input by the input means; and search means for retrieving, from the second storage means, the set of speakers corresponding to the time detected by the time detecting means, wherein, when collating the analysis result with the voice recognition dictionary, the recognizing means gives priority in recognition to vocabulary items whose set of speakers stored in the first storage means matches the set of speakers retrieved by the search means over vocabulary items whose set of speakers does not match.