JP2009031809A

JP2009031809A - Speech recognition apparatus

Info

Publication number: JP2009031809A
Application number: JP2008240988A
Authority: JP
Inventors: Hiroshi Ono; 宏大野
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2008-09-19
Filing date: 2008-09-19
Publication date: 2009-02-12

Abstract

PROBLEM TO BE SOLVED: To improve accuracy of speech recognition by appropriately removing a noise component included in a speech signal.

SOLUTION: A speech recognition apparatus includes an adaptive filter for removing the noise component from the speech signal which is input from a microphone. An LMS learning section repeatedly learns a filter coefficient based on an LMS method (S220), and the filter coefficient obtained as the result of the learning is set to the adaptive filter (S230). When a learning prohibition instruction is input from a control section concurrently with the start of the speech recognition, learning of the filter coefficient is stopped. Thereafter, when the speech signal required for the speech recognition is given to the speech recognition section, the filter coefficient is learned and updated again, according to a learning resumption instruction which is input from the control section (S260).

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a speech recognition apparatus that recognizes speech input to a microphone by a user, based on a speech signal obtained from the microphone.

Speech recognition apparatuses are known that collect a user's utterance with a microphone, compare it with speech patterns stored in advance as recognition words, and recognize the recognition word with the highest degree of matching as the vocabulary the user uttered. This type of speech recognition apparatus is incorporated in, for example, car navigation devices.

It is well known that the speech recognition rate (the rate of correct recognition) of such an apparatus depends on the amount of noise contained in the speech signal input from the microphone. In a vehicle such as an automobile in particular, there is the problem that, while an in-vehicle audio device is operating, the music or other content being played is picked up by the microphone as noise together with the user's voice.

To address this problem, the in-vehicle audio device and the speech recognition apparatus have conventionally been linked so that, during speech recognition processing, the volume of the music or other content played by the in-vehicle audio device is adjusted (for example, the audio device is muted). This keeps the played content from entering the microphone and secures a speech recognition rate above a certain level. Because this prior art is well known and in public use, no related documents are disclosed.

However, because the conventional speech recognition apparatus adjusts the volume of the music or other content played by the in-vehicle audio device, the content temporarily becomes inaudible to the user, which could cause user dissatisfaction.

The present inventors therefore decided to provide the speech recognition apparatus with a noise removal unit that learns, based on a reference signal obtained from the noise source (the in-vehicle audio device), the noise component contained in the speech signal obtained from the microphone, and removes the learned noise component from that speech signal.

However, well-known learning methods such as the least mean square (LMS) method repeat learning in the direction that makes the signal after noise removal smaller. If learning of the noise component is repeated while the user is speaking into the microphone, the noise removal unit mislearns under the influence of the user's utterance, and the noise component contained in the speech signal can no longer be removed properly. Consequently, even if such a noise removal unit is introduced into a speech recognition apparatus, there is a limit to the improvement in recognition accuracy.

The present invention has been made in view of these problems, and its object is to provide a speech recognition apparatus that can recognize speech with high accuracy by appropriately removing the noise component contained in the speech signal to be recognized.

According to the speech recognition apparatus of claim 1, made to achieve this object, noise removal signal generating means generates a noise removal signal by filtering a reference signal input from a noise source according to a preset filter coefficient, and noise removal means uses this noise removal signal to remove the noise component contained in the speech signal input from the microphone and outputs the speech signal after noise removal. The apparatus also includes coefficient updating means, which learns, based on the speech signal output from the noise removal means, the filter coefficient to be set in the noise removal signal generating means, and sets the resulting filter coefficient in the noise removal signal generating means.

Meanwhile, when an operation start command is input from outside (for example, from an operation switch such as a PTT switch), speech recognition means acquires the speech signal output from the noise removal means for a predetermined period and, based on that signal, recognizes the speech input to the microphone.

In this apparatus, learning speed switching means causes the coefficient updating means to learn the filter coefficient at a first learning speed while the speech recognition means is not operating, and at a second learning speed lower than the first while the speech recognition means is acquiring the speech signal output from the noise removal means.

Unlike stationary and quasi-stationary sounds, the user's voice input to the microphone is a non-stationary sound that occurs abruptly. Therefore, slowing the learning speed of the filter coefficient during the period in which the speech recognition means acquires the speech signal output from the noise removal means reduces the influence of the user's voice on the learning and suppresses mislearning of the filter coefficient by the coefficient updating means.

That is, according to the speech recognition apparatus of claim 1, the coefficient updating means can be made to learn the filter coefficient more appropriately than before, which improves the accuracy of noise removal. The present invention can therefore improve the accuracy of speech recognition in the speech recognition apparatus.

The learning speed switching means need only be configured to make the coefficient updating means learn the filter coefficient at the second learning speed at least from the time the speech recognition means starts acquiring the speech signal from the noise removal means until the time it finishes. For example, after acquisition of the speech signal is complete, learning at the second learning speed may continue until the speech recognition means finishes recognizing the speech and stops operating.

Likewise, the learning speed switching means need only be configured to make the coefficient updating means learn the filter coefficient at the first learning speed at least while the speech recognition means is not operating. It may additionally use the first learning speed during periods in which the speech recognition means is operating but is not acquiring the speech signal output from the noise removal means. That is, regardless of whether the speech recognition means is still recognizing speech, the learning speed switching means may return the coefficient updating means to the first learning speed immediately after the speech recognition means has acquired the speech signal.

In addition, the speech recognition means in the apparatus of the present invention is preferably configured to acquire the speech signal output from the noise removal means only during the utterance period in which the user actually speaks after the operation start command is input from outside. With this configuration, speech signals from noise sections that contain no user utterance are not used in recognition, which improves recognition accuracy.

To have the speech recognition means selectively acquire the speech signal of the user's utterance period in this way, the apparatus may be provided with acquisition control means that determines, based on the speech signal output from the noise removal means, the utterance period in which the user spoke, and has the speech recognition means selectively acquire only the portion of the output speech signal that falls within that period.

If the speech recognition means does not begin acquiring the speech signal from the noise removal means at the same moment as the operation start command, restricting learning at the second learning speed to exactly the period in which the speech recognition means is acquiring the signal may complicate the apparatus configuration.

Therefore, in the apparatus described above, the learning speed switching means is preferably configured to switch the coefficient updating means to the second learning speed at the moment the operation start command is input to the speech recognition means, and to keep it learning at the second learning speed until the speech recognition means finishes acquiring the speech signal.

According to the speech recognition apparatus of claim 2 configured in this way, merely monitoring whether an operation start command has been input to the speech recognition means from outside is enough to ensure that the coefficient updating means learns the filter coefficient at the second learning speed while the speech recognition means acquires the speech signal from the noise removal means. In other words, this apparatus can switch the learning speed of the coefficient updating means appropriately with a simple configuration (control).
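The switching rule described above can be sketched as a small state holder. This is only an illustrative sketch: the class name, method names, and the numerical step sizes used in the usage note are assumptions, since the text gives no concrete values or interfaces.

```python
class LearningSpeedSwitch:
    """Hypothetical learning speed switching means (names are illustrative)."""

    def __init__(self, mu_fast, mu_slow):
        # The second learning speed must be lower than the first.
        if not mu_slow < mu_fast:
            raise ValueError("second learning speed must be lower than the first")
        self.mu_fast = mu_fast      # first learning speed (recognizer idle)
        self.mu_slow = mu_slow      # second learning speed (speech being acquired)
        self.step_size = mu_fast    # start in the idle state

    def on_start_command(self):
        # Switch at the moment the operation start command arrives.
        self.step_size = self.mu_slow

    def on_acquisition_finished(self):
        # Return to the first learning speed once the signal has been acquired.
        self.step_size = self.mu_fast
```

For example, `LearningSpeedSwitch(mu_fast=0.5, mu_slow=0.05)` would report a step size of 0.5 until `on_start_command()` is called, 0.05 until `on_acquisition_finished()` is called, and 0.5 again afterwards. Only the start command and the end of acquisition need to be observed, which is the simplification the passage attributes to claim 2.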

In addition, as recited in claim 3, the invention described above is suitably applied to a speech recognition apparatus in which the coefficient updating means uses the LMS method to learn the filter coefficient to be set in the noise removal signal generating means.

With the LMS method, mislearning of the filter coefficient is likely when the sound input to the microphone includes sound from a source other than the noise source (that is, the user's voice). Therefore, applying the present invention (claim 1 or claim 2) to a speech recognition apparatus that learns with the LMS method, as recited in claim 3, effectively improves recognition accuracy.

The inventions of claims 1 to 3 are suitably applied, as recited in claim 4, to a speech recognition apparatus in which the noise source is an audio device.
The speech recognition apparatus of claim 4 is convenient because it can recognize speech with high accuracy without adjusting the volume of the music or other content played from the speaker by the audio device.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of the speech recognition apparatus 1.
The speech recognition apparatus 1 of this embodiment, shown in FIG. 1, is connected to a car navigation device 3. It recognizes the user's voice input to a microphone 5 and inputs an operation signal corresponding to that voice to the car navigation device 3, thereby operating the car navigation device 3 in accordance with the user's voice.

The speech recognition apparatus 1 mainly comprises an audio canceller unit 20, which is connected to the microphone 5 and to an in-vehicle audio device 7 via analog-to-digital converters (ADCs) 11 and 13, a speech extraction unit 31, a speech recognition unit 33, a PTT (Push to Talk) switch 35, a control unit 37, and a speech synthesis unit 39.

The audio canceller unit 20 mainly comprises an adaptive filter 21, a subtraction unit 23, and an LMS learning unit 25. It inputs the speech signal y(t), received from the microphone 5 via the ADC 11, to the subtraction unit 23. It also acquires from the ADC 13 the audio signal x(t) that the in-vehicle audio device 7 inputs to a speaker 9, and inputs that audio signal x(t) to the adaptive filter 21.

The adaptive filter 21 includes a register or the like (not shown) that stores the filter coefficient w.
w = (w[0], w[1], ..., w[J])^T ... Equation (1)
The superscript T denotes transposition, and J + 1 is the tap length.

The adaptive filter 21 filters the audio signal x(t) according to the filter coefficient w by substituting two quantities into the following equation: the filter coefficient w, set in the register in advance by the operation of the LMS learning unit 25 (described in detail later), and the audio signal x(t), obtained as a reference signal from the in-vehicle audio device 7 acting as the noise source. This generates a noise removal signal c(t) for removing the noise component from the speech signal y(t). The noise removal signal c(t) is then input to the subtraction unit 23.

c(t) = x^T · w ... Equation (2)
Here, x is the time-series vector of the audio signal x(t) given by the following equation, and t is a time parameter in units of the sampling period.

x = (x(t), x(t-1), x(t-2), ..., x(t-J))^T ... Equation (3)
The subtraction unit 23, meanwhile, subtracts the noise removal signal c(t) from the speech signal y(t) input from the microphone 5 via the ADC 11. This removes the noise component contained in y(t) (that is, the sound reproduced from the speaker 9 by the operation of the in-vehicle audio device 7) and yields the speech signal z(t) after noise removal.

z(t) = y(t) - c(t) ... Equation (4)
The subtraction unit 23 then inputs the resulting noise-removed speech signal z(t) to the speech extraction unit 31.
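Equations (2) to (4) can be sketched per sample as follows. The function names and the use of a fixed-length deque for the reference vector are assumptions made for illustration and are not taken from the source.

```python
from collections import deque


def make_reference_vector(taps):
    """Time-series vector x = (x(t), x(t-1), ..., x(t-J)), initially all zeros."""
    return deque([0.0] * taps, maxlen=taps)


def cancel_sample(w, x_vec, x_t, y_t):
    """Push the newest reference sample, then apply Equations (2) and (4)."""
    x_vec.appendleft(x_t)                            # x(t) becomes element 0; x(t-J-1) drops off
    c_t = sum(wi * xi for wi, xi in zip(w, x_vec))   # c(t) = x^T w
    z_t = y_t - c_t                                  # z(t) = y(t) - c(t)
    return c_t, z_t
```

With a two-tap coefficient `w = [0.5, 0.25]` and a microphone signal that contains only the filtered reference, the residual z(t) is zero: feeding `x_t = 1.0, y_t = 0.5` and then `x_t = 2.0, y_t = 1.25` yields `c_t` values of 0.5 and 1.25 and `z_t` values of 0.0 in both steps.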

The speech extraction unit 31 is configured to start operating on receiving an operation start command from the control unit 37. Once started, it determines whether the noise-removed speech signal z(t) input from the audio canceller unit 20 belongs to a speech section (that is, an utterance period in which the user is speaking) or to a noise section that contains no user voice. When it determines that the signal belongs to a speech section, it inputs the speech signal z(t) to the speech recognition unit 33. When the speech section ends, it stops operating.

A commonly used determination method is, for example, to extract the short-time power of the input signal at fixed intervals and to classify the signal as a speech section or a noise section according to whether short-time power at or above a predetermined threshold has persisted for at least a certain duration.
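A minimal sketch of this short-time power criterion is shown below. The frame length, threshold, and minimum run length are hypothetical parameters, and the function labels a complete recording after the fact, whereas the speech extraction unit described here decides while the signal is still arriving.

```python
def short_time_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_speech_frames(signal, frame_len, threshold, min_frames):
    """Return one bool per frame: True where the frame lies in a run of at
    least min_frames consecutive frames whose power is >= threshold."""
    n_frames = len(signal) // frame_len
    loud = [
        short_time_power(signal[i * frame_len:(i + 1) * frame_len]) >= threshold
        for i in range(n_frames)
    ]
    speech = [False] * n_frames
    run_start = None
    for i, is_loud in enumerate(loud + [False]):  # sentinel closes a trailing run
        if is_loud and run_start is None:
            run_start = i
        elif not is_loud and run_start is not None:
            if i - run_start >= min_frames:       # long enough to count as speech
                for k in range(run_start, i):
                    speech[k] = True
            run_start = None
    return speech
```

Requiring a minimum run length is what keeps a single loud frame, such as a click, from being treated as a speech section.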

The speech recognition unit 33, meanwhile, starts operating in accordance with the operation start command input from the control unit 37. By acquiring the speech signal z(t) output from the speech extraction unit 31, it selectively acquires the speech-section signal z(t) from the subtraction unit 23 via the speech extraction unit 31. After acquiring the speech signal z(t), the speech recognition unit 33 performs acoustic analysis on it and extracts feature quantities (for example, the cepstrum), thereby obtaining time-series data of the feature quantities.

The speech recognition unit 33 then uses a well-known technique to compare the time-series feature data with the speech patterns registered in its own speech dictionary (not shown), recognizes the vocabulary corresponding to the speech pattern with the highest degree of matching as the vocabulary the user uttered, inputs the recognition result to the control unit 37, and then stops operating.

The control unit 37 is configured to monitor when the PTT switch 35 is pressed and released. When it determines that the PTT switch 35 has been pressed and an operation start command signal has been input from the PTT switch 35 (Yes in S100), it inputs a learning prohibition command to the LMS learning unit 25 of the audio canceller unit 20 (S110). It then inputs an operation start command to the speech recognition unit 33 and the speech extraction unit 31, activating them and starting speech recognition (S120). FIG. 2 is a flowchart showing the processing performed by the control unit 37.

The control unit 37 then determines, based on the operating state of the speech extraction unit 31, whether the speech section has ended and the speech recognition unit 33 has finished acquiring the speech signal (S130). When it determines that the speech section has ended (Yes in S130), it inputs a learning resumption command to the LMS learning unit 25 (S140) and acquires the recognition result from the speech recognition unit 33 (S150). It then talks back the recognition result (S160) to ask the user whether the result is correct.

That is, the control unit 37 controls the speech synthesis unit 39 so that it generates a speech signal corresponding to the recognition result and inputs that signal to the speaker 9. The speech synthesis unit 39 uses speech waveforms stored in a waveform database (not shown) to synthesize a speech signal according to the speech output instruction from the control unit 37 and outputs it to the speaker 9. In S160, therefore, the user is notified of the recognition result by voice.

The control unit 37 then determines whether a recognition result confirmation signal, indicating that the recognition result is correct, has been input through the user's operation of an operation switch such as the PTT switch 35 (S170). If it determines that the confirmation signal has been input (Yes in S170), it executes post-confirmation processing (S180). If it determines that the confirmation signal has not been input (No in S170), it ends the processing without executing post-confirmation processing.

In the post-confirmation processing of S180, the control unit 37 inputs an operation signal corresponding to the recognition result to the car navigation device 3. This processing uses well-known techniques, so a detailed description is omitted.
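The sequence S110 to S180 of the control unit can be sketched as one function that is entered after the PTT press has been detected (S100). All seven callables are hypothetical stand-ins for the connected units; the source does not define any such programming interface.

```python
def run_recognition_cycle(send_to_lms, start_recognition, wait_for_speech_end,
                          get_result, talk_back, user_confirms,
                          send_to_navigation):
    """One pass through S110-S180; returns True if the result was confirmed."""
    send_to_lms("prohibit")          # S110: learning prohibition command
    start_recognition()              # S120: start extraction and recognition
    wait_for_speech_end()            # S130: block until the speech section ends
    send_to_lms("resume")            # S140: learning resumption command
    result = get_result()            # S150: fetch the recognition result
    talk_back(result)                # S160: read the result back to the user
    if user_confirms():              # S170: confirmation signal received?
        send_to_navigation(result)   # S180: post-confirmation processing
        return True
    return False
```

The point the ordering makes is that the prohibition command is issued before recognition starts, and the resumption command as soon as the speech section ends, without waiting for the talk-back or the user's confirmation.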

Next, the processing performed by the LMS learning unit 25 of the audio canceller unit 20 is described with reference to FIG. 3. FIG. 3 is a flowchart of the learning process that the LMS learning unit 25 starts executing as soon as the speech recognition apparatus 1 is powered on.

When the LMS learning unit 25 starts the learning process, it first initializes the adaptive filter 21 (S210); that is, it sets predetermined filter coefficients (initial values) in the adaptive filter 21.

The LMS learning unit 25 then uses the speech signal z(t) output from the subtraction unit 23 to calculate the coefficient w' according to the following equation based on the LMS method, thereby learning the filter coefficient w' to be set next in the adaptive filter 21 (S220).

[Equation not reproduced in this text]

Here, the coefficient w substituted into the equation is the value of the filter coefficient w already set in the adaptive filter 21. α is a forgetting coefficient that prevents the coefficient w' from diverging, and β is a positive constant that prevents the divisor from becoming zero. μ is the so-called step size parameter, which corresponds to the learning speed of the filter coefficient.
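The update equation itself does not appear in this text. One form that is consistent with the roles given for α, β and μ is a leaky, normalized LMS update, w' = α·w + μ·z(t)·x / (β + x^T·x). That exact form, and in particular where α enters, is an assumption and is not taken from the source. A sketch under that assumption:

```python
def leaky_nlms_update(w, x_vec, z_t, mu, alpha, beta):
    """Assumed update: w' = alpha * w + mu * z(t) * x / (beta + x^T x).

    w      current filter coefficients (length J + 1)
    x_vec  reference vector (x(t), ..., x(t-J))
    z_t    noise-removed output z(t), used as the error signal
    mu     step size (learning speed)
    alpha  forgetting coefficient, keeps w' from diverging
    beta   small positive constant, keeps the divisor away from zero
    """
    energy = sum(xi * xi for xi in x_vec)     # x^T x
    gain = mu * z_t / (beta + energy)
    return [alpha * wi + gain * xi for wi, xi in zip(w, x_vec)]
```

Under this assumed form, a silent reference (all zeros in `x_vec`) leaves the divisor equal to β rather than zero, and the coefficients simply decay by the factor α, which matches the stated purposes of the two constants.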

When the calculation of the filter coefficient w' in S220 is complete, the LMS learning unit 25 sets the filter coefficient w' calculated in S220 in the adaptive filter 21 as the new filter coefficient w (S230).

The LMS learning unit 25 then determines whether a learning prohibition command has been input from the control unit 37 (S240). If not (No in S240), it determines whether a command to end the learning process, issued because of power-off, an error, or the like, has been input from the control unit 37 (S250). If an end command has been input (Yes in S250), it ends the process. If not (No in S250), it returns to S220, learns the filter coefficient w', and then updates the filter coefficient (S230).

If it determines in S240 that a learning prohibition command has been input from the control unit 37 (Yes in S240), the LMS learning unit 25 proceeds to S260 and determines whether a learning resumption command has been input from the control unit 37. If no resumption command has been input (No in S260), it determines in S270 whether an end command has been input. If an end command has been input (Yes in S270), it ends the process; if not (No in S270), it returns to S260 and waits until a learning resumption command is input from the control unit 37.

When it determines that a learning resumption command has been input (Yes in S260), it returns to S220, learns the filter coefficient w', and sets the resulting w' in the adaptive filter 21 as the new filter coefficient w (S230).

By repeating these operations, the LMS learning unit 25 suspends learning of the filter coefficient from the time the PTT switch 35 is pressed (turned on) until the speech section ends, as shown in FIG. 4. When the speech section ends and a learning resumption command is input, it resumes learning and continues until the next learning prohibition command is input. FIG. 4 is a time chart showing when the LMS learning unit 25 switches operation.
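The learning loop of S210 to S270 can be sketched as a small state machine. The class, the command strings, and the `update` callable are illustrative assumptions. The sketch also simplifies the flowchart slightly: it checks for a command before each update, whereas the flowchart checks after S230.

```python
class LmsLearner:
    """Hypothetical model of the learning loop in FIG. 3."""

    def __init__(self, initial_w, update):
        self.w = list(initial_w)     # S210: initial filter coefficients
        self.update = update         # assumed callable mapping w to w'
        self.prohibited = False

    def step(self, command=None):
        """Run one loop iteration.

        command is None, "prohibit", "resume" or "end".
        Returns False when the process should terminate.
        """
        if command == "end":
            return False                     # S250 / S270: end command
        if command == "prohibit":
            self.prohibited = True           # S240: Yes
        elif command == "resume":
            self.prohibited = False          # S260: Yes
        if not self.prohibited:
            self.w = self.update(self.w)     # S220 then S230
        return True
```

While prohibited, the coefficients stay frozen at the last values learned before the PTT switch was pressed, so the adaptive filter keeps cancelling with those values for the duration of the utterance.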

In the speech recognition apparatus 1 of this embodiment described above, the control unit 37 prohibits the LMS learning unit 25 from learning the filter coefficient while the speech recognition unit 33 is acquiring the speech signal from the audio canceller unit 20 via the speech extraction unit 31. This prevents the filter coefficient w from being learned and updated while the speech the user utters for recognition is being input to the microphone 5.

Therefore, the speech recognition apparatus 1 prevents the filter coefficient from being inappropriately learned and updated under the influence of the user's voice input to the microphone 5 while the speech recognition unit 33 acquires the speech signal, so the noise component can be removed accurately from the speech signal to be recognized. As a result, this embodiment increases the accuracy of speech recognition in the speech recognition apparatus 1 and achieves a high speech recognition rate.

In addition, in this embodiment, the speech extraction unit 31 determines the utterance period during which the user spoke, based on the speech signal z(t) output from the audio canceller unit 20, and selectively inputs to the speech recognition unit 33 only the portion of that signal that falls within the utterance period. The speech signal of noise sections that contain no user utterance therefore never reaches the speech recognition unit 33, which can perform accurate speech recognition without being affected by noise. Furthermore, because the speech extraction unit 31 determines the utterance period automatically, the user does not need to enter information about the utterance period through an operation switch, which is convenient.

Also, in this embodiment, the control unit 37 is configured to prohibit the LMS learning unit 25 from learning the filter coefficient from the moment the operation start command is input from the PTT switch 35 until the speech extraction unit 31 finishes detecting the speech section and the speech recognition unit 33 finishes acquiring the speech signal. Learning of the filter coefficient can thus be stopped during the user's utterance period with simple control.

In the above embodiment, the performance of the speech recognition apparatus 1 is improved by prohibiting learning of the filter coefficient. It is also possible to achieve more accurate speech recognition than the prior art by slowing the learning speed of the filter coefficient during the user's utterance period instead.

A modified speech recognition apparatus configured in this way is described next. The modified apparatus differs from the speech recognition apparatus 1 described above only in part of the processing performed by the control unit 37 and the LMS learning unit 25; all other parts of the apparatus are identical. The description of the identical parts is therefore omitted, and only the operation of the control unit 37 and the LMS learning unit 25 is described below with reference to FIGS. 5 and 6.

FIG. 5 is a flowchart showing the processing performed by the control unit 37 in the modified speech recognition apparatus. As shown in FIG. 5, when the control unit 37 determines that an operation start command signal has been input from the PTT switch 35 (Yes in S300), it inputs a low-speed learning command, which slows the learning speed of the filter coefficient, to the LMS learning unit 25 of the audio canceller unit 20 (S310). It then activates the speech recognition unit 33 and the speech extraction unit 31 to start speech recognition (S320).

The control unit 37 then determines, based on the operating state of the speech extraction unit 31, whether the speech section has ended and the input of the speech signal from the speech extraction unit 31 to the speech recognition unit 33 is complete (S330). When it determines that the speech section has ended (Yes in S330), it inputs a normal learning command, which returns the learning speed of the filter coefficient to the normal learning speed, to the LMS learning unit 25 (S340). At the same time, it acquires the recognition result from the speech recognition unit 33 (S350). It then talks back the recognition result to ask the user whether the result is correct (S360).

The control unit 37 then determines whether a recognition result confirmation signal, indicating that the recognition result is correct, has been input through the user's operation of an operation switch such as the PTT switch 35 (S370). If it determines that the confirmation signal has been input (Yes in S370), it executes post-confirmation processing (S380). If it determines that the confirmation signal has not been input (No in S370), it ends the processing without executing the post-confirmation processing.
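
The ordering of steps S300 to S380 can be summarized in a short sketch. The function `control_sequence` and the action labels are hypothetical names chosen for illustration; the wait for the end of the speech section (S330) is collapsed into a comment, and returning with no actions when the PTT switch has not been pressed is an assumption, since the text does not describe that branch.

```python
def control_sequence(ptt_pressed, confirmed):
    """Return the ordered actions of the control unit for one PTT event (sketch)."""
    actions = []
    if not ptt_pressed:                        # S300: No (assumed: nothing to do)
        return actions
    actions.append("low_speed_learning_command")    # S310
    actions.append("start_speech_recognition")      # S320
    # S330: wait here until the speech section has ended (omitted in this sketch)
    actions.append("normal_learning_command")       # S340
    actions.append("acquire_recognition_result")    # S350
    actions.append("talk_back_result")              # S360
    if confirmed:                                   # S370: confirmation signal input?
        actions.append("post_confirmation_processing")  # S380
    return actions

sequence = control_sequence(ptt_pressed=True, confirmed=True)
```

The point of the ordering is that the low-speed learning command is issued before recognition starts, and the normal learning command is issued as soon as the speech section ends, before the talk-back.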

Next, the processing performed by the LMS learning unit 25 in the modified speech recognition apparatus is described with reference to FIG. 6. FIG. 6 is a flowchart showing the learning process that the LMS learning unit 25 of the modification starts executing as soon as the speech recognition apparatus is powered on.

When the learning process starts, the LMS learning unit 25 performs initialization in S410: it sets a predetermined filter coefficient (initial value) in the adaptive filter 21 and sets the parameter μ of equation (5), which is used to calculate the filter coefficient w', to the initial value μ_H (μ = μ_H).

The LMS learning unit 25 then uses the speech signal z(t) output from the subtraction unit 23 to calculate the coefficient w' according to equation (5), which is based on the LMS method (S420). Through this operation, the LMS learning unit 25 learns the filter coefficient w' to be set next in the adaptive filter 21, and in S430 it sets w' in the adaptive filter 21 as the new filter coefficient w.

Next, the LMS learning unit 25 determines whether a low-speed learning command is being input from the control unit 37 (S440). If so (Yes in S440), in S450 it sets the parameter μ, which represents the learning speed, to a predetermined value μ_L (μ = μ_L). The values μ_L and μ_H satisfy the inequality μ_L < μ_H.

As equation (5) shows, reducing the value of the parameter μ reduces the amount by which the filter coefficient w' changes per update. In other words, a smaller μ lengthens the time it takes for the filter coefficient w' to converge and so lowers the learning speed. The LMS learning unit 25 lowers the learning speed of the filter coefficient by setting the parameter μ to the value μ_L, which is smaller than normal.
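
The effect of μ on the size of one update can be illustrated with a small sketch. Equation (5) is not reproduced in this section, so the code assumes the common normalized LMS form w' = w + μ·z(t)·x / (‖x‖² + ε); the function name `nlms_update`, the regularizer `eps`, and all numeric values (including those chosen for μ_H and μ_L) are hypothetical.

```python
import numpy as np

def nlms_update(w, x, z, mu, eps=1e-8):
    """One assumed NLMS step: w' = w + mu * z * x / (||x||^2 + eps)."""
    return w + mu * z * x / (np.dot(x, x) + eps)

w = np.zeros(4)                          # current filter coefficient w
x = np.array([1.0, -0.5, 0.25, 0.0])     # recent reference-signal samples
z = 0.8                                  # current noise-removed output z(t)
MU_H, MU_L = 0.5, 0.05                   # illustrative values with MU_L < MU_H

step_h = np.linalg.norm(nlms_update(w, x, z, MU_H) - w)   # change at normal speed
step_l = np.linalg.norm(nlms_update(w, x, z, MU_L) - w)   # change at low speed
```

Under this assumed form the change in the coefficient is directly proportional to μ, so the ratio of the two step sizes equals μ_L / μ_H.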

The LMS learning unit 25 then returns to S420, calculates the filter coefficient w' according to equation (5) with μ = μ_L, and updates the filter coefficient w (S430).

If, on the other hand, the LMS learning unit 25 determines in S440 that the low-speed learning command is not being input (No in S440), it determines in S460 whether a normal learning command is being input from the control unit 37.

If it determines that the normal learning command is being input (Yes in S460), the LMS learning unit 25 changes the parameter μ, which represents the learning speed, to μ_H in S470 (μ = μ_H). It then returns to S420, calculates the filter coefficient w' according to equation (5) with μ = μ_H, and updates the filter coefficient w (S430).

If the determination is No in both S440 and S460, the LMS learning unit 25 determines in S480 whether a command to end the learning process is being input from the control unit 37. If no end command is being input (No in S480), it returns to S420, learns the filter coefficient w', and updates the filter coefficient (S430). If an end command is being input (Yes in S480), it ends the learning process.
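
The branch structure of S410 to S480 can be sketched as a small loop. This is an illustration under stated assumptions: commands are modeled as one optional string per iteration (`"low_speed"`, `"normal"`, `"end"`, or `None`), the actual coefficient update of S420/S430 is replaced by recording which μ would be used, and the names `next_mu`, `run_learning`, and the values of `MU_H` and `MU_L` are hypothetical.

```python
MU_H = 0.5    # normal learning speed (illustrative value)
MU_L = 0.05   # low learning speed, MU_L < MU_H (illustrative value)

def next_mu(mu, command):
    """S440 to S470: choose the step size for the next update."""
    if command == "low_speed":    # S440: Yes -> S450
        return MU_L
    if command == "normal":       # S460: Yes -> S470
        return MU_H
    return mu                     # neither command: keep the current value

def run_learning(commands):
    """Return the step size used by each S420/S430 update (sketch)."""
    mu = MU_H                     # S410: initial setting
    used = []
    for command in commands:
        used.append(mu)           # S420/S430 would update w with this mu
        if command == "end":      # S480: Yes -> finish
            break
        mu = next_mu(mu, command)
    return used

trace = run_learning([None, "low_speed", None, "normal", None, "end", None])
```

Each command takes effect from the update that follows it, which mirrors the flowchart: the step size is changed in S450 or S470 and only then does the process return to S420.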

By executing this process, the LMS learning unit 25 lowers the learning speed of the filter coefficient from the time the PTT switch 35 is pressed (turned on) until the speech section ends, as shown in FIG. 7. When the speech section ends and the normal learning command is input, it again learns the filter coefficient at the normal learning speed until the next low-speed learning command is input. FIG. 7 is a time chart showing the switching timing of the learning speed.

The modification has been described above. In the modified speech recognition apparatus, the low-speed learning command is input to the LMS learning unit 25 when the speech recognition unit 33 and the speech extraction unit 31 are activated, so that the LMS learning unit 25 learns the filter coefficient at a lower learning speed than normal while the speech recognition unit 33 is acquiring the speech signal from the audio canceller unit 20 via the speech extraction unit 31. This limits the influence of the user's speech on the learning of the filter coefficient during that period and suppresses erroneous learning of the filter coefficient in the LMS learning unit 25.

As a result, the modified speech recognition apparatus allows the LMS learning unit 25 to learn the filter coefficient appropriately and improves the accuracy of noise removal in the audio canceller unit 20. The modification therefore provides a speech recognition apparatus capable of highly accurate speech recognition.

In the modification, the control unit 37 is also configured so that the LMS learning unit 25 learns the filter coefficient at the normal learning speed not only while the speech recognition unit 33 is not operating but also from immediately after the speech recognition unit 33 has acquired the speech signal (that is, immediately after the speech section ends). Therefore, even when operation start command signals are input from the PTT switch 35 in succession and the speech recognition unit 33 operates repeatedly, the audio canceller unit 20 can remove noise appropriately.

In the modification as well, the speech extraction unit 31 selectively inputs to the speech recognition unit 33 only the speech signal that falls within the utterance period during which the user spoke. The speech signal of noise sections that contain no user utterance therefore never reaches the speech recognition unit 33, which can perform accurate speech recognition without being affected by noise.

Furthermore, the modified speech recognition apparatus has the LMS learning unit 25 learn the filter coefficient at the low learning speed from the moment the operation start command signal is input from the PTT switch 35, so erroneous learning of the filter coefficient can be reliably suppressed with simple control.

In addition, the speech recognition apparatus of the above embodiments can perform highly accurate speech recognition without adjusting the volume of music or other audio reproduced from the speaker 9 by the in-vehicle audio device 7. This resolves the conventional problem of users being dissatisfied with volume adjustments.

Embodiments of the present invention have been described above. The noise removal signal generating means of the present invention corresponds to the adaptive filter 21 of this embodiment, and the noise removing means of the present invention corresponds to the subtraction unit 23. The coefficient updating means corresponds to the LMS learning unit 25, and the speech recognition means corresponds to the speech recognition unit 33, which acquires the speech signal z(t) in the speech section and performs speech recognition.

The learning speed switching means is realized by the operation in which the control unit 37 inputs the low-speed learning command and the normal learning command to the LMS learning unit 25 at the timing given by the processing shown in FIG. 5. The operation in which the learning speed switching means causes the coefficient updating means to learn the filter coefficient at the second learning speed is realized in this embodiment by having the LMS learning unit 25 calculate the filter coefficient w' with the parameter μ = μ_L, which corresponds to the second learning speed. Likewise, the operation in which the learning speed switching means causes the coefficient updating means to learn the filter coefficient at the first learning speed is realized by having the LMS learning unit 25 calculate the filter coefficient w' with the parameter μ = μ_H, which corresponds to the first learning speed.

The speech recognition apparatus of the present invention is not limited to the above embodiments and can take various forms.

For example, the control unit 37 need only be configured to prohibit the operation of the LMS learning unit 25, or to lower the learning speed of the filter coefficient in the LMS learning unit 25, at least during the period in which the speech recognition unit 33 acquires the speech signal. After the speech section ends, it may continue to prohibit learning of the filter coefficient by the LMS learning unit 25, or continue to operate the LMS learning unit 25 at the low learning speed, until speech recognition in the speech recognition unit 33 is complete and the recognition result has been obtained.

The above embodiments show examples in which the present invention is applied to a speech recognition apparatus that uses the LMS method, specifically the Normalized LMS (NLMS) algorithm, as the method of learning the filter coefficient. The present invention may also be applied to a speech recognition apparatus that learns the filter coefficient by another method. Besides the adaptive algorithm described above, learning methods to which the present invention is applicable include, for example, the complex LMS algorithm, the Fast LMS (FLMS) algorithm, the projection algorithm, the RLS (Recursive Least Squares) algorithm, the SHARF (Simple Hyperstable Adaptive Recursive Filter) algorithm, adaptive filters using the DCT (Discrete Cosine Transform), the SAN (Single Frequency Adaptive Notch) filter, neural networks, and genetic algorithms.

FIG. 1 is a block diagram showing the schematic configuration of the speech recognition apparatus 1 of this embodiment.
FIG. 2 is a flowchart showing the processing performed by the control unit 37.
FIG. 3 is a flowchart showing the learning process executed by the LMS learning unit 25.
FIG. 4 is a time chart showing the operation switching timing of the LMS learning unit 25.
FIG. 5 is a flowchart showing the processing performed by the control unit 37 of the modification.
FIG. 6 is a flowchart showing the learning process executed by the LMS learning unit 25 of the modification.
FIG. 7 is a time chart showing the switching timing of the learning speed.

Explanation of symbols

1: speech recognition apparatus, 3: car navigation apparatus, 5: microphone, 7: in-vehicle audio device, 9: speaker, 11, 13: ADC, 20: audio canceller unit, 21: adaptive filter, 23: subtraction unit, 25: LMS learning unit, 31: speech extraction unit, 33: speech recognition unit, 35: PTT switch, 37: control unit, 39: speech synthesis unit

Claims

A speech recognition apparatus comprising:
noise removal signal generating means for generating a noise removal signal for removing noise by filtering a reference signal input from a noise source according to a preset filter coefficient;
noise removing means for removing, using the noise removal signal generated by the noise removal signal generating means, a noise component included in a speech signal input from a microphone, and outputting the speech signal after the noise removal;
coefficient updating means for learning, based on the speech signal output from the noise removing means, the filter coefficient to be set in the noise removal signal generating means, and setting the filter coefficient obtained as a result of the learning in the noise removal signal generating means;
speech recognition means for acquiring, when an operation start command is input from outside, the speech signal output from the noise removing means for a predetermined period, and recognizing speech input to the microphone based on that speech signal; and
learning speed switching means for causing the coefficient updating means to learn the filter coefficient at a first learning speed when the speech recognition means is not operating, and causing the coefficient updating means to learn the filter coefficient at a second learning speed lower than the first learning speed while the speech recognition means is acquiring the speech signal output from the noise removing means.

The speech recognition apparatus according to claim 1, wherein the learning speed switching means causes the coefficient updating means to learn the filter coefficient at the second learning speed during the period from when the operation start command is input to the speech recognition means until the speech recognition means finishes acquiring the speech signal.

The speech recognition apparatus according to claim 2, wherein the coefficient updating means learns the filter coefficient to be set in the noise removal signal generating means by using an LMS method, based on the speech signal output from the noise removing means.

The speech recognition apparatus according to claim 1, wherein the noise source is an audio device.