JP2003241792A

JP2003241792A - Device and method for speech recognition

Info

Publication number: JP2003241792A
Application number: JP2002046318A
Authority: JP
Inventors: Takehiro Sekine; 剛宏関根; Akira Baba; 朗馬場; Masaru Nakamori; 勝中森; Haruhiro Kuboyama; 晴弘久保山; Hisataka Fujii; 寿隆藤井
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2002-02-22
Filing date: 2002-02-22
Publication date: 2003-08-29

Abstract

PROBLEM TO BE SOLVED: To prevent the speech recognition rate from decreasing even in a noisy environment whose frequency spectrum is close to that of speech.

SOLUTION: A speech taking-in part 1 takes in at least the speech spoken by a user and generates a speech signal. A noise superposition part 2 receives, from a superposed noise data output part 3, a noise signal whose signal level has a gradient with respect to frequency, and superposes it on the speech signal taken in by the speech taking-in part 1 to reduce the signal peaks of sounds other than the speech spoken by the user. A speech recognition part 4 performs speech recognition processing on the speech signal on which the noise superposition part 2 has superposed the noise signal.

COPYRIGHT: (C)2003, JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field] The present invention relates to a voice recognition device and method that perform voice recognition processing for word recognition, sentence recognition, emotion recognition, and the like using voice, and more particularly to a voice recognition device and method suited to environments that contain speech-like noise signals, such as the sound of a television or radio.

[0002]

[Prior Art] Voice recognition systems that apply voice recognition processing to a voice signal captured by a microphone or the like are conventionally known. To reduce misrecognition, such systems either perform noise removal such as spectral subtraction as preprocessing for the recognition processing, or superimpose noise data on both the training data of the voice model and the voice data to be recognized. One known example is the technique disclosed in Japanese Unexamined Patent Application Publication No. H9-198079.

[0003] This voice recognition device superimposes the same noise data on the training data of the voice model and on the voice data to be recognized. Making the frequency spectrum of the noise-superimposed voice data resemble that of the training data improves the recognition rate.

[0004]

[Problems to Be Solved by the Invention] In a conventional voice recognition system, however, the voice data detected by a microphone or the like usually contains environmental noise in addition to the voice to be recognized. The environmental noise affects the recognition processing and lowers the recognition rate. In particular, when the environmental noise has a frequency spectrum resembling the user's voice, as with television or radio sound, the recognition rate drops markedly and the interface becomes inconvenient for the user.

[0005] FIG. 16 shows a conventional voice recognition system. In this system, the voice and environmental noise captured by a voice capturing unit 101 are amplified to an appropriate level by a signal amplification unit 102, sampled and quantized by an A/D conversion unit 103, and then output unchanged to a voice recognition unit 104. When the environmental noise resembles human speech, for example news on television or radio, it reaches the voice recognition unit 104 with its frequency spectrum and the temporal transition of that spectrum intact. In the voice recognition unit 104, the voice data containing the environmental noise undergoes a Fourier transform in a Fourier transform unit 111 and feature extraction in a feature extraction unit 112, and a likelihood calculation unit 115 then performs likelihood calculation and voice recognition using the acoustic model in an acoustic model storage unit 113 and the language model in a language model storage unit 114. As a result, the environmental noise adversely affects recognition in the voice recognition unit 104 and causes misrecognition.

[0006] The present invention has been proposed in view of these circumstances. Its object is to provide a voice recognition device and method that can prevent the recognition rate from dropping even in a noisy environment whose frequency spectrum resembles speech.

[0007]

[Means for Solving the Problems] To solve the above problems, a voice recognition device according to the present invention comprises: voice capturing means for capturing at least the voice uttered by a user and generating a voice signal; noise superimposing means for superimposing, on the voice signal captured by the voice capturing means, a noise signal whose signal level has a slope with respect to changes in frequency, thereby reducing the signal peaks of sounds other than the voice uttered by the user; and voice recognition means for performing voice recognition processing on the voice signal on which the noise signal has been superimposed by the noise superimposing means.

[0008] In a voice recognition method according to the present invention, at least the voice uttered by a user is captured to generate a voice signal; a noise signal whose signal level has a slope with respect to changes in frequency, and which flattens the frequency spectrum of sounds other than the voice uttered by the user, is superimposed on the voice signal; and voice recognition processing is performed on the voice signal on which the noise signal has been superimposed.

[0009]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0010] The present invention is applied to a voice recognition device configured as described below.

[0011] [Basic Configuration Example of Voice Recognition Device] As shown in FIG. 1, this voice recognition device comprises a voice capturing unit 1 consisting of, for example, an externally exposed microphone, together with a noise superimposing unit 2, a superimposed noise data output unit 3, and a voice recognition unit 4.

[0012] The voice capturing unit 1 consists of, for example, a condenser microphone. It captures the voice uttered by a user located, for example, directly in front of the voice recognition device, and also captures the environmental noise around the user. This environmental noise is lower in level than the user voice to be recognized and is a component not needed for recognition; it includes, for example, sound from a television near the user and speech from people other than the user. The voice capturing unit 1 generates voice data containing the user voice and the environmental noise and outputs it to the noise superimposing unit 2.

[0013] The superimposed noise data output unit 3 outputs superimposed noise data, either prepared in advance or generated on the fly, to the noise superimposing unit 2. The superimposed noise data is, for example, data stored in a storage mechanism such as a ROM (Read Only Memory), or data obtained by arithmetic processing of such stored data. Its frequency spectrum is flatter, and changes less over time, than the spectral variations of the user voice and of the environmental noise. It has a stationary spectrum: for example, noise whose sound pressure level varies with the frequency band, such as pink-like noise whose sound pressure level decreases as the frequency band rises.

[0014] The noise superimposing unit 2 superimposes the superimposed noise data from the superimposed noise data output unit 3 on the voice data from the voice capturing unit 1, and outputs the resulting voice data to the voice recognition unit 4.

[0015] When combining voice data whose sound pressure level varies over time as shown in FIG. 2(a) with superimposed noise data whose sound pressure level varies over time as shown in FIG. 2(b), the noise superimposing unit 2 compares the two time series sample by sample and selects whichever has the larger absolute value as the superimposed voice data. As shown in FIG. 2(c), the noise superimposing unit 2 thus uses the voice data at times t2, t4, and t6 and the superimposed noise data at times t1, t3, and t5 to create the voice data output to the voice recognition unit 4. FIG. 2 shows an example in which the voice data and the superimposed noise data happen to be selected alternately, but the selection is not limited to this pattern.

[0016] The voice data to be recognized and the superimposed noise data are input to the noise superimposing unit 2 simultaneously, for example one voice frame at a time. The noise superimposing unit 2 compares the absolute value of the sound pressure level of the voice data with that of the superimposed noise data and outputs whichever is larger as the voice data. The output voice data never exceeds the level of the data that was input. Even when the input voice data is close to the maximum of the input range of the voice recognition unit 4, the superimposition does not increase the number of digits of the data, so the feature quantities of the voice data used for recognition are not lost.
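The sample-by-sample selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the function name `superimpose_max_abs`, the tie-breaking rule (the voice sample is kept when both absolute values are equal), and the sample values are all hypothetical. The values are chosen so that noise and voice alternate in the way FIG. 2(c) is described.

```python
def superimpose_max_abs(voice, noise):
    """Keep, for each sample, whichever input has the larger absolute value.

    Ties go to the voice sample (an assumption; the text does not say).
    """
    if len(voice) != len(noise):
        raise ValueError("voice and noise frames must have the same length")
    return [v if abs(v) >= abs(n) else n for v, n in zip(voice, noise)]


# Hypothetical samples at times t1..t6.
voice_frame = [0.10, -0.80, 0.05, 0.60, -0.02, 0.90]
noise_frame = [0.30, -0.20, -0.25, 0.20, 0.30, -0.10]

mixed = superimpose_max_abs(voice_frame, noise_frame)
print(mixed)  # [0.3, -0.8, -0.25, 0.6, 0.3, 0.9]
```

Because each output sample is copied from one of the two inputs rather than summed, the largest output magnitude cannot exceed the largest input magnitude, which is the property the text relies on to stay within the input range of the recognizer.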

[0017] Besides the superimposition described with reference to FIG. 2, the noise superimposing unit 2 may weight the voice data and the superimposed noise data and use their weighted average as the voice data output to the voice recognition unit 4. That is, with the voice data denoted x1 and the superimposed noise data denoted x2, the noise superimposing unit 2 determines the value y of the voice data output to the voice recognition unit 4 by computing

y = a·x1 + b·x2

where a is the weighting coefficient for the voice data and b is the weighting coefficient for the superimposed noise data.
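The weighted combination y = a·x1 + b·x2 can be sketched in a few lines of Python. The function name `superimpose_weighted` and the default weights are assumptions made for illustration; the text does not give concrete values for a and b. Choosing a + b = 1 makes the result a weighted average, which keeps the output within the range of the inputs.

```python
def superimpose_weighted(voice, noise, a=0.5, b=0.5):
    """Return y = a*x1 + b*x2 per sample (x1: voice data, x2: noise data)."""
    if len(voice) != len(noise):
        raise ValueError("voice and noise frames must have the same length")
    return [a * x1 + b * x2 for x1, x2 in zip(voice, noise)]


# Hypothetical sample values.
voice_frame = [1.0, -1.0, 0.5]
noise_frame = [0.0, 0.5, 0.5]

weighted = superimpose_weighted(voice_frame, noise_frame, a=0.5, b=0.5)
print(weighted)  # [0.5, -0.25, 0.5]
```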

[0018] The voice recognition unit 4 performs voice recognition processing on the voice data from the noise superimposing unit 2 and generates a recognition result. In doing so, it applies a short-time Fourier transform to each voice frame, extracts the peaks of the frequency spectrum and their change over time as feature quantities for recognition, and performs recognition and likelihood calculation using an acoustic model and a language model.
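The frame-wise transform and peak extraction can be illustrated with the sketch below. It is a simplified stand-in, not the feature extraction of the patent: the naive DFT, the absence of a window function, the relative threshold, and the names `split_frames`, `dft_magnitude`, and `spectral_peaks` are all assumptions. The test signal places two sinusoids exactly on DFT bins 4 and 9 so that the peaks are unambiguous.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Cut the signal into (possibly overlapping) frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def dft_magnitude(frame):
    """Magnitudes of DFT bins 0 .. N/2 - 1 (naive O(N^2) transform)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]


def spectral_peaks(mag, rel_threshold=0.1):
    """Bins that are local maxima and reach rel_threshold * max(mag)."""
    floor = rel_threshold * max(mag)
    return [k for k in range(1, len(mag) - 1)
            if mag[k] >= floor and mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]


N = 32
signal = [math.cos(2 * math.pi * 4 * t / N) + 0.5 * math.cos(2 * math.pi * 9 * t / N)
          for t in range(2 * N)]

frames = split_frames(signal, frame_len=N, hop=N // 2)
# One list of peak bins per frame; comparing the lists across frames gives
# the temporal change of the peaks.
peaks_per_frame = [spectral_peaks(dft_magnitude(f)) for f in frames]
print(peaks_per_frame)  # [[4, 9], [4, 9], [4, 9]]
```

A practical front end would normally apply a window such as Hann and use an FFT; both are omitted here to keep the sketch dependency-free.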

[0019] As shown in FIG. 3, when the voice capturing unit 1 captures sound, the voice recognition device configured in this way generates voice data and outputs it to the noise superimposing unit 2 (step S1). The noise superimposing unit 2 superimposes the superimposed noise data from the superimposed noise data output unit 3 on the voice data (step S2). The voice recognition unit 4 then performs voice recognition processing on the noise-superimposed voice data and outputs the recognition result.

[0020] Specifically, the voice recognition device receives sound containing a voice signal component A and an environmental noise component B, whose sound pressure level varies over time as shown in FIG. 4. As shown in FIG. 5, the frequency spectrum obtained by Fourier-transforming this signal contains, in a given period, peaks A1 and A2 due to the user voice component A and a peak B1 due to the environmental noise component B.

[0021] FIG. 6 shows how the sound pressure level varies over time once the noise superimposing unit 2 has superimposed the superimposed noise data on the voice data captured by the voice capturing unit 1. The voice data now contains the user voice component A, the environmental noise component B, and the superimposed noise component C. As shown in FIG. 7, in the frequency spectrum obtained by Fourier-transforming this voice data the peaks A1 and A2 due to the user voice component A remain, whereas the peak B1 due to the environmental noise component B has disappeared beneath the frequency spectrum C1 of the superimposed noise component C.

[0022] In other words, the noise superimposing unit 2 superimposes on the voice data superimposed noise data whose sound pressure level is lower than that of the user voice component A and whose frequency spectrum is flatter than that of the environmental noise component B, producing a frequency spectrum in which the environmental noise is buried in the superimposed noise.

[0023] Superimposing the noise data on the voice data in this way flattens the environmental noise component in frequency, and superimposing it continuously in time on the incoming voice data flattens the environmental noise component in time as well, before the data is output to the voice recognition unit 4. The superimposed noise data need only be capable of flattening the frequency spectrum of the environmental noise.

[0024] For example, as shown in FIG. 7, it may be a signal that has a slope with respect to changes in frequency, such that the sound pressure level gradually decreases toward higher frequency bands and is inversely proportional to frequency.
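One way to obtain a stationary noise signal whose level falls in inverse proportion to frequency is to sum sinusoids with amplitude 1/k at DFT bin k and random phases. The sketch below does this and checks the resulting slope with a naive DFT. It is an illustrative assumption, not the generator used in the patent: the names `sloped_noise` and `dft_magnitude`, the seed, and the bin counts are hypothetical. Note also that pink noise in the strict sense has power proportional to 1/f, so its amplitude would fall as 1/sqrt(f); the 1/k amplitude used here follows the inverse proportionality of level stated in the text.

```python
import cmath
import math
import random


def sloped_noise(n_samples, n_bins, seed=0):
    """Sum of cosines at DFT bins 1..n_bins, amplitude 1/k, random phase."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_bins)]
    return [sum(math.cos(2.0 * math.pi * k * t / n_samples + phases[k - 1]) / k
                for k in range(1, n_bins + 1))
            for t in range(n_samples)]


def dft_magnitude(x):
    """Magnitudes of DFT bins 0 .. N/2 - 1 (naive O(N^2) transform)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]


noise = sloped_noise(n_samples=64, n_bins=16)
mag = dft_magnitude(noise)

# Each bin k carries a cosine of amplitude 1/k, so its DFT magnitude is
# (n_samples / 2) / k: the level halves whenever the frequency doubles.
for k in (1, 2, 4, 8, 16):
    print(k, round(mag[k], 3))
```

Because the phases are fixed once and the amplitudes never change, the spectrum is the same in every frame, which corresponds to the small temporal variation required of the superimposed noise.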

[0025] As described above, the voice recognition device of the basic configuration example flattens the peaks of the frequency spectrum of the environmental noise component by superimposing noise on the voice data, so the voice recognition unit 4 can perform recognition without extracting the environmental noise component as a feature of the user voice. Consequently, even when a television or the like is present near the user and the voice capturing unit 1 picks up the sound it emits, misrecognition is reduced.

[0026] Furthermore, because the noise-superimposed voice data is fed in through the input channel of the voice recognition unit 4, the acoustic model of the voice recognition unit 4 and its training algorithm, for example, are unaffected. In addition to reducing misrecognition regardless of the recognition algorithm used by the voice recognition unit 4, the device therefore has the advantage of being applicable to a variety of voice recognition methods.

[0027] [Second Configuration Example of Voice Recognition Device] Next, a voice recognition device according to a second configuration example to which the present invention is applied is described with reference to FIG. 8. Parts identical to those of the basic configuration example described above are given the same reference numerals, and their detailed description is omitted.

[0028] The voice recognition device of the second configuration example differs from the basic configuration example in that it includes a superimposed noise storage unit 11 and a superimposed noise selection unit 12 in place of the superimposed noise data output unit 3.

[0029] The superimposed noise storage unit 11 stores multiple sets of superimposed noise data, as data patterns that combine multiple frequency spectra with multiple sound pressure levels.

[0030] The superimposed noise selection unit 12 selects the data pattern of the superimposed noise data that the noise superimposing unit 2 is to superimpose on the voice data, and outputs at least one set of superimposed noise data to the noise superimposing unit 2.

[0031] With the voice recognition device of the second configuration example, multiple types of superimposed noise data are created in advance and stored in the superimposed noise storage unit 11, and the superimposed noise selection unit 12 can select the most suitable one and output it to the noise superimposing unit 2. In addition to the effects of the basic configuration example, environmental noise is therefore reduced further, and misrecognition by the voice recognition unit 4 is reduced further.

[0032] [Third Configuration Example of Voice Recognition Device] Next, a voice recognition device according to a third configuration example to which the present invention is applied is described with reference to FIG. 9. Parts identical to those of the configuration examples described above are given the same reference numerals, and their detailed description is omitted.

[0033] The voice recognition device of the third configuration example differs from the second configuration example in that it includes an environmental noise capturing unit 21 that captures the environmental noise around the voice recognition device, and an environmental noise measuring unit 22 that measures the environmental noise level around the voice capturing unit 1.

[0034] The environmental noise capturing unit 21 captures the environmental noise around the voice recognition device during periods in which the device is not performing voice recognition processing, creates environmental noise data, and outputs it to the environmental noise measuring unit 22.

[0035] The environmental noise measuring unit 22 measures the average sound pressure level of the environmental noise captured by the environmental noise capturing unit 21 and outputs it to the superimposed noise selection unit 12 as average noise information.

[0036] When the voice recognition device recognizes voice data, the superimposed noise selection unit 12 selects superimposed noise data of an appropriate sound pressure level based on the average noise information and reads it from the superimposed noise storage unit 11. Specifically, based on the average sound pressure level of the environmental noise indicated by the average noise information, the superimposed noise selection unit 12 selects superimposed noise data that flattens the peaks of the frequency spectrum of the environmental noise, and outputs it to the noise superimposing unit 2.
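The measurement and selection steps can be sketched as follows. The text only says that noise data of an appropriate level is chosen from the measured average level, so the concrete choices here are assumptions: the level is computed as an RMS value in dB relative to a full scale of 1.0, the stored patterns are identified only by their level, and the pattern closest in level to the environmental noise is picked. The names `average_level_db` and `select_noise_level` are hypothetical.

```python
import math


def average_level_db(samples):
    """RMS level of the samples in dB relative to a full scale of 1.0."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # guard against log10(0)


def select_noise_level(env_level_db, stored_levels_db):
    """Pick the stored noise level closest to the measured level (assumed rule)."""
    return min(stored_levels_db, key=lambda level: abs(level - env_level_db))


# Hypothetical environmental noise captured while no recognition is running.
env_noise = [0.1, -0.1, 0.1, -0.1]
env_level = average_level_db(env_noise)  # about -20 dB

# Hypothetical levels of the patterns held in the storage unit.
chosen = select_noise_level(env_level, [-40.0, -30.0, -20.0, -10.0])
print(round(env_level, 3), chosen)
```

A closest-level rule reflects the trade-off the description states: noise that is too weak does not flatten the environmental peaks, while noise that is too strong reduces the peaks of the user voice.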

[0037] With this voice recognition device, superimposed noise data matched to the environmental noise can be selected and superimposed even when the average sound pressure level of the environmental noise captured by the voice capturing unit 1 is high. That is, the superimposed noise selection unit 12 selects the superimposed noise data to use based on the correlation between the average noise information obtained by the environmental noise measuring unit 22 and the superimposed noise data stored in the superimposed noise storage unit 11. The device can therefore select superimposed noise data whose sound pressure level is approximately equal to the peak of the frequency spectrum of the environmental noise component, reliably flatten that peak, and reliably reduce misrecognition by the voice recognition unit 4.

[0038] The degree to which superimposing noise data on the environmental noise reduces misrecognition depends on the sound pressure level of the environmental noise and the average sound pressure level of the superimposed noise data. If the average sound pressure level of the superimposed noise data is too low relative to the environmental noise level, the peaks of the frequency spectrum of the environmental noise component cannot be flattened; if it is too high, the peak height of the frequency spectrum of the user voice component becomes smaller. Selecting the superimposed noise data to match the environmental noise level measured by the environmental noise measuring unit 22 therefore reduces misrecognition further.

[0039] In addition, because the environmental noise capturing unit 21 is provided separately from the voice capturing unit 1, the average sound pressure level of the environmental noise can be measured whether or not voice recognition is in progress.

[0040] [Fourth Configuration Example of Voice Recognition Device] Next, a voice recognition device according to a fourth configuration example to which the present invention is applied is described with reference to FIG. 10. Parts identical to those of the configuration examples described above are given the same reference numerals, and their detailed description is omitted.

[0041] The voice recognition device of the fourth configuration example differs from the second configuration example in that the voice recognition unit 4 is connected to the superimposed noise selection unit 12.

[0042] As described above, the voice recognition unit 4 calculates the likelihood of the recognition result along with the recognition result for the user voice. In the fourth configuration example, the voice recognition unit 4 outputs this likelihood information to the superimposed noise selection unit 12.

[0043] The superimposed noise selection unit 12 receives the likelihood information from the voice recognition unit 4 and selects superimposed noise data stored in the superimposed noise storage unit 11 based on the likelihood of the recognition. For example, when it determines that the likelihood is high and recognition is being performed accurately, it does not change the superimposed noise data used by the noise superimposing unit 2. When it determines that the likelihood is low and recognition is not being performed accurately, it selects superimposed noise data different from that currently used by the noise superimposing unit 2 and outputs it to the noise superimposing unit 2.
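The feedback rule can be sketched as a small function. The text specifies only that the noise data is kept when the likelihood is high and replaced when it is low; the numeric threshold, the assumption that the likelihood is normalized to the range 0 to 1, and the round-robin choice of the next pattern are all hypothetical, as is the name `next_pattern_index`.

```python
def next_pattern_index(current, likelihood, n_patterns, threshold=0.5):
    """Keep the current noise pattern if recognition looks reliable,
    otherwise move on to a different stored pattern (round-robin, assumed)."""
    if likelihood >= threshold:
        return current
    return (current + 1) % n_patterns


# Hypothetical sequence of likelihoods reported by the recognizer.
index = 0
history = []
for likelihood in [0.9, 0.2, 0.3, 0.8]:
    index = next_pattern_index(index, likelihood, n_patterns=3)
    history.append(index)
print(history)  # [0, 1, 2, 2]
```

A real selector might instead try each stored pattern and keep the one that yields the highest likelihood; the text leaves the search strategy open.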

[0044] With this voice recognition device, the superimposed noise selection unit 12 selects superimposed noise data that raises the likelihood, so the likelihood of recognition by the voice recognition unit 4 is raised further and misrecognition by the voice recognition unit 4 is reduced further.

[0045] [Fifth Configuration Example of Voice Recognition Device] Next, a voice recognition device according to a fifth configuration example to which the present invention is applied is described with reference to FIG. 11. Parts identical to those of the configuration examples described above are given the same reference numerals, and their detailed description is omitted.

[0046] The voice recognition device of the fifth configuration example differs from that of the first configuration example in that it includes a signal amplification unit 31, an A/D conversion unit 32, and a noise removal unit 33 between the voice capturing unit 1 and the noise superimposing unit 2. In the other configuration examples, where the signal amplification unit 31 and the A/D conversion unit 32 are not shown, either digital voice data or an analog voice signal may be used.

[0047] In this voice recognition device, the voice signal generated by the voice capturing unit 1 is amplified to a predetermined sound pressure level by the signal amplification unit 31, converted into digital voice data by the A/D conversion unit 32, and output to the noise removal unit 33.

[0048] The noise removal unit 33 applies noise removal processing, such as spectral subtraction, to the voice data from the A/D conversion unit 32, removes the environmental noise component from the voice data, and outputs the result to the noise superimposing unit 2. The noise superimposing unit 2 then superimposes the superposed noise data on the voice data that has undergone noise removal in the noise removal unit 33.
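As a rough illustration of the spectral subtraction mentioned in [0048], the sketch below subtracts an estimated noise magnitude spectrum from the magnitude spectrum of one frame, bin by bin. The function name `spectral_subtract`, the list-of-magnitudes representation, and the flooring at zero are assumptions made for this example; the patent does not specify how the noise removal unit 33 is implemented, and the transform to and from the time domain is omitted here.

```python
from typing import List, Sequence


def spectral_subtract(magnitude: Sequence[float],
                      noise_estimate: Sequence[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract a noise magnitude estimate from a frame's magnitude spectrum.

    Bins where the estimate exceeds the observed magnitude are clamped to
    `floor` so that no magnitude becomes negative.
    """
    if len(magnitude) != len(noise_estimate):
        raise ValueError("spectra must have the same number of bins")
    return [max(m - n, floor) for m, n in zip(magnitude, noise_estimate)]


cleaned = spectral_subtract([5.0, 2.0, 1.0], [1.0, 1.0, 3.0])
print(cleaned)  # [4.0, 1.0, 0.0]
```

Whatever noise survives this step is what the noise superimposing unit 2 then has to mask, which is why the text that follows argues that a lower superposed noise level suffices.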

[0049] According to such a voice recognition device, the effect of the first configuration example and the noise removal effect of the noise removal unit 33 act together to reduce environmental noise effectively and to further reduce misrecognition by the voice recognition unit 4. That is, the noise removal unit 33 removes environmental noise before the noise superimposing unit 2 superimposes the superposed noise data, and the superimposition is applied to voice data that contains only the environmental noise the noise removal unit 33 could not remove. The noise superimposing unit 2 therefore operates on voice data whose environmental noise level is lower than that of voice data from which no environmental noise has been removed, so the sound pressure level of the superposed noise data can be made smaller.

[0050] Therefore, this voice recognition device can reduce the influence that superimposing the superposed noise data has on the user's voice component. As a result, the voice recognition unit 4 can reliably extract the peaks of the frequency spectrum of the user's voice, and misrecognition by the voice recognition unit 4 is further reduced.

[0051] [Sixth Configuration Example of Voice Recognition Device] Next, a voice recognition device according to a sixth configuration example to which the present invention is applied will be described with reference to FIG. 12. Parts identical to those in the configuration examples described above are given the same reference numerals, and their detailed description is omitted.

[0052] The voice recognition device according to the sixth configuration example differs from the configuration examples described above in that it includes, in place of the superposed noise selection unit 12, a superposed noise data generation unit 41 that generates superposed noise data.

[0053] In the sixth configuration example, the superposed noise storage unit 11 stores superposed noise data that the superposed noise data generation unit 41 uses as a template. The superposed noise storage unit 11 may store a plurality of sets of superposed noise data, as described above, or only a single set.

[0054] The superposed noise data generation unit 41 uses the superposed noise data stored in the superposed noise storage unit 11 as a template, modifies the superposed noise data read from the superposed noise storage unit 11, and outputs the result to the noise superimposing unit 2. Specifically, the superposed noise data generation unit 41 consists of an amplifier 42 and a filter 43, and a control device (not shown) sets the amplification factor of the amplifier 42 and the pass band of the filter 43.

[0055] With the amplification factor of the amplifier 42 and the pass band of the filter 43 set, the superposed noise data generation unit 41 modifies the superposed noise data in two steps: the amplifier 42 converts the sound pressure level of the superposed noise data from the superposed noise storage unit 11 as a whole, and the filter 43 converts the sound pressure level of the superposed noise data for each frequency band.
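The two-step modification in [0054] and [0055] can be sketched as below, under the assumption that the template is held as a list of per-band sound pressure levels in decibels. The name `transform_template`, the decibel representation, and the modelling of the filter 43 as one gain per band are hypothetical choices for this example; the patent describes the amplifier 42 and filter 43 only at block-diagram level.

```python
from typing import List, Sequence


def transform_template(template_db: Sequence[float],
                       amp_gain_db: float,
                       band_gains_db: Sequence[float]) -> List[float]:
    """Modify a template noise spectrum given as per-band levels in dB.

    amp_gain_db shifts every band by the same amount (the amplifier step);
    band_gains_db adjusts each band individually (the filter step).
    """
    if len(template_db) != len(band_gains_db):
        raise ValueError("one filter gain is required per band")
    return [level + amp_gain_db + gain
            for level, gain in zip(template_db, band_gains_db)]


# A flat 60 dB template raised by 6 dB overall and rolled off by band.
print(transform_template([60.0, 60.0, 60.0], 6.0, [0.0, -3.0, -6.0]))
# [66.0, 63.0, 60.0]
```

Working in decibels makes both steps additive, which is why a single stored template can yield many output spectra.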

[0056] According to such a voice recognition device, the overall sound pressure level and the frequency spectrum components can be varied using the superposed noise data stored in the superposed noise storage unit 11 as a template. Various forms of superposed noise data can thus be generated without storing other types of superposed noise data in the superposed noise storage unit 11, and the capacity of the superposed noise storage unit 11 can be reduced.

[0057] In addition, by controlling the amplification factor of the amplifier 42 and the pass band of the filter 43 according to the voice recognition result of the voice recognition unit 4 and the sound pressure level and frequency spectrum of the environmental noise, this voice recognition device can adjust how the sound pressure level varies across the whole frequency range and so control the slope of the sound pressure level with respect to frequency. Superposed noise data best suited to reducing the environmental noise can thus be generated, further reducing misrecognition by the voice recognition unit 4.
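One way to picture the slope control described in [0057] is to derive a per-band gain from a single slope parameter. The sketch below assumes equally spaced bands and a slope expressed in decibels per band; the name `tilt_gains` and the linear-in-band-index shape are illustrative assumptions, not something the patent specifies.

```python
from typing import List


def tilt_gains(n_bands: int, slope_db_per_band: float) -> List[float]:
    """Return per-band gains in dB that change linearly with band index.

    A negative slope lowers the level progressively toward the higher
    bands; a slope of zero leaves the spectrum flat.
    """
    if n_bands < 0:
        raise ValueError("n_bands must not be negative")
    return [slope_db_per_band * i for i in range(n_bands)]


gains = tilt_gains(4, -3.0)  # each band is 3 dB below the previous one
```

A controller could recompute the slope whenever the measured environmental noise spectrum or the recognition result changes and apply the resulting gains to the template noise.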

[0058] [Seventh Configuration Example of Voice Recognition Device] Next, a voice recognition device according to a seventh configuration example to which the present invention is applied will be described with reference to FIG. 13. Parts identical to those in the configuration examples described above are given the same reference numerals, and their detailed description is omitted.

[0059] In the voice recognition device according to the seventh configuration example, the voice signal captured by the voice capturing unit 1 is converted into voice data by the signal amplification unit 31 and the A/D conversion unit 32 and input to the noise superimposing unit 2. The environmental noise measurement unit 22 measures the environmental noise while voice recognition is not being performed and supplies average noise information to the superposed noise selection unit 12. The superposed noise selection unit 12 selects superposed noise data on the basis of the average noise information and reads it from the superposed noise storage unit 11; the superposed noise data generation unit 41 then modifies the data and inputs it to the noise superimposing unit 2. At this time, the amplifier 42 receives likelihood information from the voice recognition unit 4 and sets its amplification factor according to the likelihood. The noise superimposing unit 2 superimposes the superposed noise data from the superposed noise data generation unit 41 on the voice data from the A/D conversion unit 32 and supplies the result to the voice recognition unit 4.

[0060] To generate the voice recognition result and the likelihood information from the voice data supplied by the noise superimposing unit 2, the voice recognition unit 4 has a Fourier transform unit 51, a feature extraction unit 52, an acoustic model storage unit 53, a language model storage unit 54, and a likelihood calculation unit 55.

[0061] When voice data is input from the noise superimposing unit 2, the voice recognition unit 4 applies a Fourier transform in the Fourier transform unit 51 to produce the frequency spectrum of the voice data shown in FIG. 7, and the feature extraction unit 52 detects the peaks of the frequency spectrum and extracts the frequency bands corresponding to those peaks as features. On receiving the features from the feature extraction unit 52, the likelihood calculation unit 55 compares them with the acoustic models stored in the acoustic model storage unit 53 and selects the closest acoustic model. By performing this processing for each voice frame, the voice recognition unit 4 selects acoustic models in time series, and it compares the time series of acoustic models with the language model stored in the language model storage unit 54 to produce the voice recognition result. The likelihood calculation unit 55 also generates likelihood information indicating the likelihood of the voice recognition result, based on the comparison between the acoustic models and the features and the comparison between the time series of acoustic models and the language model, and outputs it to the amplifier 42.
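The per-frame processing in [0061] (Fourier transform, peak detection, comparison with stored acoustic models) can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the direct DFT, the use of peak bin indices as the feature, the squared-distance comparison, and the score `1 / (1 + distance)`, which merely stands in for a likelihood and is not a probability. The language-model comparison is omitted.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitude of the DFT of one frame, for bins 0 .. N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def peak_bins(spectrum: Sequence[float], count: int) -> List[int]:
    """Indices of the `count` largest local maxima, in ascending order."""
    peaks = [k for k in range(1, len(spectrum) - 1)
             if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]]
    peaks.sort(key=lambda k: spectrum[k], reverse=True)
    return sorted(peaks[:count])


def closest_model(features: Sequence[int],
                  models: Dict[str, Sequence[int]]) -> Tuple[str, float]:
    """Pick the stored model nearest to the features and return a score."""
    def distance(a: Sequence[int], b: Sequence[int]) -> float:
        return float(sum((x - y) ** 2 for x, y in zip(a, b)))

    name = min(models, key=lambda m: distance(features, models[m]))
    return name, 1.0 / (1.0 + distance(features, models[name]))


# A frame with components in bins 4 and 9; the model names are hypothetical.
frame = [math.sin(2 * math.pi * 4 * t / 32) + 0.5 * math.sin(2 * math.pi * 9 * t / 32)
         for t in range(32)]
features = peak_bins(magnitude_spectrum(frame), 2)
print(features, closest_model(features, {"a": [4, 9], "i": [2, 14]}))
```

Because the feature is the position of spectral peaks, any peak contributed by a competing voice such as a television would shift the features, which is the failure the superposed noise is meant to suppress.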

[0062] According to such a voice recognition device, the superposed noise data stored in the superposed noise storage unit 11 is selected on the basis of the average noise information measured by the environmental noise measurement unit 22, and the selected superposed noise data is then modified according to the likelihood. Environmental noise can therefore be reduced further, and misrecognition by the voice recognition unit 4 can be reduced.

[0063] [Eighth Configuration Example of Voice Recognition Device] Next, a voice recognition device according to an eighth configuration example to which the present invention is applied will be described with reference to FIG. 14. Parts identical to those in the configuration examples described above are given the same reference numerals, and their detailed description is omitted.

[0064] The voice recognition device according to the eighth configuration example differs from the configuration examples described above in that it includes a superposed noise signal generation circuit 61 that produces an analog superposed noise signal, and an adder 62 that adds the voice signal from the signal amplification unit 31 and the superposed noise signal.

[0065] The superposed noise signal generation circuit 61 includes a noise generation transistor 71 connected to a drive circuit, a D/A conversion circuit 72 connected to the voice recognition unit 4, and a signal amplification circuit 73 connected to the noise generation transistor 71 and the D/A conversion circuit 72.

[0066] The noise generation transistor 71 is, for example, a small-signal transistor, and generates a thermal noise signal when the drive circuit applies a reverse bias between its base and emitter. The thermal noise signal generated by the noise generation transistor 71 is input to the signal amplification circuit 73 through the circuit wiring.

[0067] The D/A conversion circuit 72 receives from the voice recognition unit 4 data whose level varies with the likelihood, performs D/A conversion, and outputs an analog signal corresponding to the likelihood to the signal amplification circuit 73.

[0068] The signal amplification circuit 73 amplifies the thermal noise signal from the noise generation transistor 71 according to the level of the analog signal from the D/A conversion circuit 72 and outputs the result to the adder 62 as the superposed noise signal. By varying its amplification factor according to the likelihood in this way, the signal amplification circuit 73 produces a superposed noise signal whose sound pressure level corresponds to the likelihood.
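The likelihood-dependent amplification in [0067] and [0068] amounts to a mapping from likelihood to gain. The patent does not state the shape or even the direction of that mapping, so the sketch below is purely an assumption: it takes a likelihood in the range 0 to 1 and interpolates linearly so that a lower likelihood gives a higher noise gain. The name `gain_from_likelihood` and the default gain limits are hypothetical.

```python
def gain_from_likelihood(likelihood: float,
                         min_gain: float = 0.5,
                         max_gain: float = 2.0) -> float:
    """Map a likelihood in [0, 1] to an amplification factor.

    Assumed direction: likelihood 1.0 gives min_gain and likelihood 0.0
    gives max_gain, with linear interpolation in between. Values outside
    [0, 1] are clamped.
    """
    clamped = min(max(likelihood, 0.0), 1.0)
    return max_gain - (max_gain - min_gain) * clamped


print(gain_from_likelihood(1.0), gain_from_likelihood(0.5), gain_from_likelihood(0.0))
# 0.5 1.25 2.0
```

In the analog circuit this mapping would be set by the D/A conversion circuit 72 output level and the gain characteristic of the signal amplification circuit 73 rather than by software.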

[0069] The adder 62 adds the voice signal from the signal amplification unit 31 and the superposed noise signal from the superposed noise signal generation circuit 61 to produce a voice signal such as that shown in FIG. 6, and outputs it to the A/D conversion unit 32. The adder 62 corresponds to the noise superimposing unit 2 described above and functions as the noise superimposing means. The voice signal produced by the adder 62 is converted into voice data by the A/D conversion unit 32 and becomes the target of voice recognition by the voice recognition unit 4.
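A digital stand-in for the adder 62 of [0069] is sample-wise addition of the voice signal and an amplified noise source. In the sketch below, Gaussian pseudo-random samples substitute for the transistor's noise; that substitution, the name `add_superposed_noise`, and the fixed seed are assumptions made so the example is repeatable, and none of them come from the patent.

```python
import random
from typing import List, Sequence


def add_superposed_noise(speech: Sequence[float], gain: float,
                         seed: int = 0) -> List[float]:
    """Add gain-scaled Gaussian noise to each speech sample.

    The Gaussian generator is only a software substitute for the analog
    noise source; the fixed seed keeps the sketch repeatable.
    """
    rng = random.Random(seed)
    return [s + gain * rng.gauss(0.0, 1.0) for s in speech]


speech = [0.1, -0.2, 0.3]
noisy = add_superposed_noise(speech, gain=0.05)
print(len(noisy) == len(speech))  # True
```

Setting `gain` to zero returns the speech unchanged, which corresponds to the amplification circuit passing no noise to the adder.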

[0070] A voice recognition device configured in this way can superimpose noise on the voice signal with an inexpensive configuration. It needs no signal processing section for noise superimposition, such as a ROM for storing superposed noise data prepared in advance or a DSP (Digital Signal Processor) for generating superposed noise data and superimposing digital data.

[0071] In the eighth configuration example, when a superposed noise signal whose sound pressure level differs by frequency band is to be produced, an analog filter 74 is provided between the signal amplification circuit 73 and the adder 62, as shown in FIG. 15, and the pass band of the analog filter 74 is controlled. This makes it possible to produce a superposed noise signal whose sound pressure level slopes as the frequency rises toward the high-frequency band.
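The effect of the analog filter 74 in [0071] can be imitated digitally with a first-order low-pass filter, which attenuates a signal progressively more as frequency rises. The sketch below is a software analogy only: the patent describes an analog filter whose pass band is controlled, and the one-pole recurrence, the name `low_pass`, and the smoothing coefficient `alpha` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def low_pass(samples: Sequence[float], alpha: float) -> List[float]:
    """One-pole low-pass filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1].

    A smaller alpha gives a lower cutoff and a steeper high-frequency
    roll-off; alpha = 1.0 passes the input through unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    previous = 0.0
    for x in samples:
        previous = alpha * x + (1.0 - alpha) * previous
        out.append(previous)
    return out


print(low_pass([1.0, 1.0, 1.0], 0.5))         # [0.5, 0.75, 0.875]
print(low_pass([1.0, -1.0, 1.0, -1.0], 0.5))  # [0.5, -0.25, 0.375, -0.3125]
```

The constant input settles toward its full level while the rapidly alternating input stays well below it, which is the sloped frequency response the paragraph describes.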

[0072] The embodiment described above is one example of the present invention. The present invention is therefore not limited to this embodiment, and various modifications may of course be made according to the design and other factors, provided they do not depart from the technical idea of the present invention.

[0073]

[Effects of the Invention] According to the voice recognition device and method of the present invention, at least a voice uttered by a user is captured to generate a voice signal; a noise signal whose signal level slopes with respect to change in frequency, and which flattens the frequency spectrum of sounds other than the voice uttered by the user, is superimposed on the voice signal; and voice recognition processing is performed on the voice signal on which the noise signal has been superimposed. A drop in the voice recognition rate can therefore be prevented even under environmental noise whose frequency spectrum resembles that of speech.

[Brief description of drawings]

FIG. 1 is a block diagram showing the basic configuration of a voice recognition device to which the present invention is applied.

FIG. 2 is a diagram for explaining the processing by which the noise superimposing unit superimposes voice data and superposed noise data: (a) shows the voice data over time, (b) shows the superposed noise data over time, and (c) shows the result over time after the voice data and the superposed noise data have been superimposed.

FIG. 3 is a flowchart showing the basic processing procedure of a voice recognition device to which the present invention is applied.

FIG. 4 is a diagram showing the relationship between sound pressure level and time in the voice data captured by the voice capturing unit.

FIG. 5 is a diagram showing the relationship between sound pressure level and frequency in the voice data captured by the voice capturing unit.

FIG. 6 is a diagram showing the relationship between sound pressure level and time after the noise superimposing unit has superimposed the superposed noise data on the voice data.

FIG. 7 is a diagram for explaining another example of the relationship between sound pressure level and frequency when the superposed noise data is superimposed on the voice data.

FIG. 8 is a block diagram showing a second configuration example of a voice recognition device to which the present invention is applied.

FIG. 9 is a block diagram showing a third configuration example of a voice recognition device to which the present invention is applied.

FIG. 10 is a block diagram showing a fourth configuration example of a voice recognition device to which the present invention is applied.

FIG. 11 is a block diagram showing a fifth configuration example of a voice recognition device to which the present invention is applied.

FIG. 12 is a block diagram showing a sixth configuration example of a voice recognition device to which the present invention is applied.

FIG. 13 is a block diagram showing a seventh configuration example of a voice recognition device to which the present invention is applied.

FIG. 14 is a block diagram showing an eighth configuration example of a voice recognition device to which the present invention is applied.

FIG. 15 is a block diagram showing another circuit of the eighth configuration example of a voice recognition device to which the present invention is applied.

FIG. 16 is a block diagram showing the configuration of a conventional voice recognition system.

[Explanation of symbols]

1 Voice capturing unit
2 Noise superimposing unit
3 Superposed noise data output unit
4 Voice recognition unit
11 Superposed noise storage unit
12 Superposed noise selection unit
21 Environmental noise capturing unit
22 Environmental noise measurement unit
31 Signal amplification unit
32 A/D conversion unit
33 Noise removal unit
41 Superposed noise data generation unit
42 Amplifier
43 Filter
51 Fourier transform unit
52 Feature extraction unit
53 Acoustic model storage unit
54 Language model storage unit
55 Likelihood calculation unit
61 Superposed noise signal generation circuit
62 Adder
71 Noise generation transistor
72 D/A conversion circuit
73 Signal amplification circuit

Continuation of front page

(72) Inventor: Masaru Nakamori, c/o Matsushita Electric Works, Ltd., 1048 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Haruhiro Kuboyama, c/o Matsushita Electric Works, Ltd., 1048 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Hisataka Fujii, c/o Matsushita Electric Works, Ltd., 1048 Oaza Kadoma, Kadoma-shi, Osaka
F-term (reference): 5D015 EE05

Claims

1. A voice recognition device comprising: voice capturing means for capturing at least a voice uttered by a user and generating a voice signal; noise superimposing means for superimposing, on the voice signal captured by the voice capturing means, a noise signal having a signal-level slope with respect to change in frequency, so as to reduce signal peaks of sounds other than the voice uttered by the user; and voice recognition means for performing voice recognition processing on the voice signal on which the noise signal has been superimposed by the noise superimposing means.

2. The voice recognition device according to claim 1, wherein the noise superimposing means superimposes, on the voice signal, a noise signal whose sound pressure level gradually decreases toward the high-frequency band.

3. The voice recognition device according to claim 1, further comprising noise storage means for storing a plurality of noise patterns, wherein the noise superimposing means selects a noise pattern stored in the noise storage means and superimposes the noise signal on the voice signal.

4. The voice recognition device according to claim 3, further comprising noise measuring means for measuring the environmental noise level around the user from the voice signal generated by the voice capturing means, wherein the noise superimposing means selects a noise pattern stored in the noise storage means in accordance with the environmental noise level measured by the noise measuring means.

5. The voice recognition device according to claim 3, wherein the noise superimposing means selects a noise pattern stored in the noise storing means based on a recognition result of the voice recognizing means.

6. The voice recognition device according to claim 3, further comprising noise creating means for modifying a template noise signal to create a plurality of types of noise signals, wherein the noise superimposing means superimposes the noise signal created by the noise creating means on the voice signal.

7. The voice recognition device according to claim 6, further comprising noise measuring means for measuring the environmental noise level around the user from the voice signal generated by the voice capturing means, wherein the noise creating means modifies the template noise signal in accordance with the environmental noise level measured by the noise measuring means.

8. The voice recognition device according to claim 6, wherein the noise creating means modifies the template noise signal based on the recognition result of the voice recognition means.

9. The voice recognition device according to claim 1, further comprising noise removing means for removing environmental noise from the voice signal generated by the voice capturing means, wherein the noise superimposing means superimposes the noise signal on the voice signal from which the environmental noise has been removed by the noise removing means.

10. The voice recognition device according to claim 1, wherein the noise superimposing means compares the level of the voice signal from the voice capturing means with the level of the noise signal to be superimposed, and selects whichever signal has the larger absolute signal level as the superimposed voice signal.

11. A voice recognition method comprising: capturing at least a voice uttered by a user and generating a voice signal; superimposing, on the voice signal, a noise signal that has a signal-level slope with respect to change in frequency and that flattens the frequency spectrum of sounds other than the voice uttered by the user; and performing voice recognition processing on the voice signal on which the noise signal has been superimposed.
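For illustration only, the selection recited in claim 10 (taking, sample by sample, whichever of the voice signal and the noise signal has the larger absolute level) might look like the following. The function name `superimpose_max_abs` and the rule that ties favour the voice sample are assumptions; the claim does not say how ties are resolved.

```python
from typing import List, Sequence


def superimpose_max_abs(speech: Sequence[float],
                        noise: Sequence[float]) -> List[float]:
    """Per sample, keep whichever signal has the larger absolute value.

    Ties keep the speech sample (an assumption made for this sketch).
    """
    if len(speech) != len(noise):
        raise ValueError("signals must have the same length")
    return [s if abs(s) >= abs(n) else n for s, n in zip(speech, noise)]


print(superimpose_max_abs([0.5, -0.1, 0.0], [0.2, 0.3, -0.4]))
# [0.5, 0.3, -0.4]
```

Unlike plain addition, this leaves samples where the voice dominates untouched and lets the noise fill in only where the input is weaker than the noise.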