JP4459729B2

JP4459729B2 - In-vehicle speech recognition system

Info

Publication number: JP4459729B2
Application number: JP2004176870A
Authority: JP
Inventors: 陽一北野
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2004-06-15
Filing date: 2004-06-15
Publication date: 2010-04-28
Anticipated expiration: 2024-06-15
Also published as: JP2006003400A

Description

The present invention relates to an in-vehicle speech recognition system for recognizing a user's speech inside a vehicle.

Conventionally, there is a speech recognition system that can maintain a good recognition rate by tracking changes in the noise environment, even in a field where the noise environment varies widely. Specifically, this system includes different speech models corresponding to the different noise environments expected in the field of use; a speech recognition unit that performs speech recognition processing independently for each speech model and, in addition to the symbol recognized in that processing, calculates and outputs a numerical value indicating its certainty; and a probability comparison unit that compares the certainties of the recognized symbols, takes the largest one, and outputs the corresponding recognized symbol as the recognition result of the system. Speech recognition processing is thus performed using the selected speech model. Because a plurality of speech models are prepared for different noise environments, and the optimum speech model is selected according to the noise environment at the time of recognition, a good recognition rate can be maintained even when the noise environment in the field varies widely. In addition, the recognition rate can be raised while greatly reducing the load of the speech recognition processing, so that real-time speech recognition becomes possible (see, for example, Patent Document 1).
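The selection scheme attributed to Patent Document 1 can be illustrated with a short Python sketch. The function name, the model names, and the certainty values are hypothetical and chosen only for illustration; the sketch assumes each speech model has already produced one recognized symbol together with its certainty.

```python
from typing import Dict, Tuple


def select_by_max_certainty(
    results: Dict[str, Tuple[str, float]]
) -> Tuple[str, str, float]:
    """Return (model name, symbol, certainty) for the most certain result.

    `results` maps a speech model name to the (symbol, certainty) pair
    that the model produced. All names and values are illustrative.
    """
    model, (symbol, certainty) = max(
        results.items(), key=lambda item: item[1][1]
    )
    return model, symbol, certainty


# Hypothetical per-model outputs for one utterance.
example = {
    "quiet": ("radio", 0.40),
    "highway": ("route", 0.90),
}
best = select_by_max_certainty(example)
```

Only the single most certain result survives in this scheme, which is the behavior that the present invention later replaces with a weighted combination across models.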

On the other hand, among devices that likewise recognize the speech of a user (speaker), some provide the speech recognition dictionary with dictionaries corresponding to various mental states, estimate the speaker's mental state at the time of speech recognition, and use the dictionary corresponding to that mental state to improve the recognition rate. Specifically, such a device includes a sensor device that detects the speaker's state and the running state of the automobile, and a mental state estimation device that estimates the speaker's mental state from the sensor output. It also includes a speech recognition dictionary group made up of dictionaries created from pronunciation under various levels of tension in the vehicle cabin. At the time of speech input, a dictionary selection device selects the optimal dictionary from the speech recognition dictionary group according to a mental state estimate such as a margin estimate, and the speech recognition means executes speech recognition using this optimal dictionary. Because speech recognition is performed with the dictionary best suited to the speaker's mental state, the recognition rate can be improved (see, for example, Patent Document 2).
Patent Document 1: JP 2000-75889 A
Patent Document 2: JP 2002-149191 A

Incidentally, a speech recognition device for in-vehicle equipment mounted on a vehicle requires high noise resistance. As a conventional countermeasure, speech recognition is performed using acoustic models that include noise characteristics, as in the system described in Patent Document 1. However, the noise included in the prepared acoustic models does not necessarily match the noise that is actually input. When the two do not match, recognition performance may degrade, and it may also degrade in low-noise environments or in acoustically different vehicles.

Meanwhile, the device described in Patent Document 2 determines the recognition dictionary to be used for speech recognition based on output signals from various sensors, but it selects only one dictionary. If the selected recognition dictionary is not appropriate, the recognition rate may decrease.

The present invention has been made in view of the above problems, and an object of the present invention is to provide an in-vehicle speech recognition system that executes speech recognition while responding accurately to noise that varies with the environment.

In order to solve the above problems, the in-vehicle speech recognition system according to the invention of claim 1 comprises: a voice input unit (for example, the microphone 1 of the embodiment described later) for inputting speech uttered by a user; a vehicle sensor unit (for example, the vehicle speed sensor 5, the air conditioner ECU 6, the audio ECU 7, and the like of the embodiment described later) that detects, as a specific environmental condition of the vehicle, at least one of vehicle speed, air conditioner air volume, wiper speed, window opening, and audio volume; a speech recognition dictionary unit (for example, the speech recognition dictionary unit 24 of the embodiment described later) corresponding to a plurality of acoustic models (for example, acoustic model a 231, acoustic model b 232, ..., acoustic model n 233, and the like of the embodiment described later) stored in advance for each of the specific environmental conditions; a speech recognition execution unit (for example, the acoustic pattern recognition unit 25 of the embodiment described later) that performs speech recognition on the speech signal from the voice input unit using the speech recognition dictionary unit and scores the result of the speech recognition to obtain a recognition score; a real environment determination unit (for example, the processing of step S3 executed by the environment evaluation unit 4 of the embodiment described later) that determines the actual vehicle environmental condition from the output of the vehicle sensor unit; a model reliability calculation unit (for example, the processing of step S8 executed by the environment evaluation unit 4 of the embodiment described later) that calculates a model reliability of each acoustic model, the model reliability becoming lower as the difference increases between the actual vehicle environmental condition determined by the real environment determination unit and the environmental conditions of the plurality of acoustic models stored in advance for each of the specific vehicle environmental conditions; and a final result calculation unit (for example, the final result calculation unit 9 of the embodiment described later) that calculates a real environment score by multiplying the model reliability calculated by the model reliability calculation unit by the recognition score of the corresponding acoustic model, and calculates an overall speech recognition result based on the real environment score.

The in-vehicle speech recognition system having the above configuration includes a voice input unit for inputting speech uttered by a user, a vehicle sensor unit that detects a specific environmental condition of the vehicle, and a speech recognition dictionary unit corresponding to a plurality of acoustic models stored in advance for each specific environmental condition. When a speech signal is input from the voice input unit, the speech recognition execution unit performs speech recognition on the signal using the speech recognition dictionary unit and scores the result. At the same time, the model reliability calculation unit compares the determination result of the real environment determination unit, which determines the actual vehicle environmental condition from the output of the vehicle sensor unit, with the environmental conditions of the plurality of acoustic models stored in advance, and calculates the model reliability of each acoustic model. The final result calculation unit then corrects the score of the speech recognition result calculated by the speech recognition execution unit using the model reliability calculated by the model reliability calculation unit, and calculates an overall speech recognition result. In this way, the recognition result can be calculated by comprehensively judging the speech recognition results from the plurality of acoustic models together with the vehicle state.

According to the in-vehicle speech recognition system of claim 1, the score of the speech recognition result calculated by the speech recognition execution unit is corrected by the model reliability of the acoustic model, as judged from the actual vehicle environmental conditions, and an overall speech recognition result is calculated. The recognition result can thus be calculated by comprehensively judging the speech recognition results from the plurality of acoustic models together with the vehicle state.
Therefore, by responding accurately to noise that varies with the environment, speech recognition that is less susceptible to environmental changes can be executed.

Embodiments of the present invention will be described below with reference to the drawings.

(Device configuration)
FIG. 1 is a block diagram showing the overall configuration of an in-vehicle speech recognition system according to an embodiment of the present invention.
In FIG. 1, the in-vehicle speech recognition system of this embodiment includes a microphone 1 for inputting a user's speech. The user's speech input from the microphone 1 is input to the speech recognition unit 2, where it is recognized sequentially or in parallel using some (or all) of the prepared acoustic models. The speech recognition unit 2 is a processing unit that executes speech recognition to convert the input speech into “recognition result text” and outputs it. The “recognition result text” recognized by the speech recognition unit 2 is output to the speech recognition result storage unit 3, which includes a plurality of speech recognition result tables such as speech recognition result table a 31, speech recognition result table b 32, ..., and speech recognition result table n 33.

The speech recognition unit 2 will be described in detail first. The speech recognition unit 2 includes an A/D conversion unit 21 that samples and quantizes the analog speech input from the microphone 1 and converts it into a digital speech signal; a frequency analysis unit 22 that performs spectrum analysis on the digital speech signal output from the A/D conversion unit 21 and captures the frequency characteristics of the speech; an acoustic model storage unit 23 that holds a plurality of acoustic models such as acoustic model a 231, acoustic model b 232, ..., and acoustic model n 233; and an acoustic pattern recognition unit 25 that recognizes the digital speech signal output by the frequency analysis unit 22 while comparing the “acoustic models” held in the acoustic model storage unit 23 with the contents of the speech recognition dictionary unit 24.

In order to cope with noise that varies with the environment, each of the acoustic models held in the acoustic model storage unit 23 is suited to a specific environment, for example “vehicle A traveling at 60 [km/h] with the air conditioner air volume at 30 [%]”. In this embodiment, the conditions under which each acoustic model was created are called its “model environment values”. The speech recognition dictionary unit 24 stores speech recognition dictionaries corresponding to the plurality of acoustic models a 231, b 232, ..., n 233 stored in advance for each specific environment, that is, for each set of “model environment values”. Furthermore, the acoustic pattern recognition unit 25 outputs to the speech recognition result storage unit 3, as recognition results, one or more pairs consisting of the “recognition result text” for each acoustic model used for speech recognition and the “recognition score” obtained when recognition is executed in the actual environment using the recognition dictionary for the corresponding “model environment values”.

Meanwhile, the environment evaluation unit 4 is a processing unit that monitors the output signals of the vehicle speed sensor 5, which detects the traveling speed of the vehicle, the air conditioner ECU 6, which controls the air conditioning device mounted on the vehicle, the audio ECU 7, which controls the audio device mounted on the vehicle, and the like. It temporarily stores the sensor values and the states of the control ECUs in the temporary storage unit 8, and calculates the “model reliability” of each of acoustic model a 231, acoustic model b 232, ..., and acoustic model n 233 held in the acoustic model storage unit 23.

Specifically, the sensor values and control ECU states stored in the temporary storage unit 8 are called “real environment values” in this embodiment. The environment evaluation unit 4 calculates the model reliability by the following equation (1), using the “real environment values”, the “model environment values”, and the “vehicle coefficient”, which is a value specific to the vehicle. The “vehicle coefficient” is a coefficient that corrects for individual differences between the vehicle used when each acoustic model was set up and the vehicle actually in use. In equation (1), “ABS()” denotes the absolute value of the expression in parentheses.

The meanings of the symbols in equation (1) above are as shown in Table 1 below.
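Equation (1) and Table 1 are not reproduced in this text, so the following Python sketch uses a stand-in formula rather than the patented one. It satisfies only the properties the description states: the reliability uses absolute differences between real and model environment values, involves the vehicle coefficient, and falls as the difference grows. The symbols V, T, W, O, A, and C follow the worked example later in the description; their mapping to speed, air volume, wiper, window, and audio follows the order in which that example lists them. The functional form, the `scale` parameter, and the way the vehicle coefficient enters are all assumptions.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Environment:
    V: float  # vehicle speed [km/h]
    T: float  # air conditioner air volume [%]
    W: float  # wiper speed (0 = not in use)
    O: float  # window opening (0 = fully closed)
    A: float  # audio volume [dB]


def model_reliability(real: Environment, model: Environment,
                      vehicle_coefficient: float,
                      scale: float = 100.0) -> float:
    """Hypothetical stand-in for equation (1).

    Sums ABS(real - model) over all environment values, weights the sum
    by the vehicle coefficient C, and maps it into (0, 1] so that a
    larger mismatch yields a lower reliability. Not the patented formula.
    """
    distance = sum(abs(r - m)
                   for r, m in zip(astuple(real), astuple(model)))
    return 1.0 / (1.0 + vehicle_coefficient * distance / scale)


# Real environment values and vehicle coefficient from the description.
real = Environment(V=55, T=20, W=0, O=0, A=20)
C = 1.2

# Hypothetical model environment values for two acoustic models.
near = Environment(V=60, T=30, W=0, O=0, A=20)
far = Environment(V=100, T=80, W=2, O=50, A=40)

near_reliability = model_reliability(real, near, C)
far_reliability = model_reliability(real, far, C)
```

With these illustrative inputs the acoustic model whose conditions are closer to the real environment receives the higher reliability, which is the only behavior this sketch is meant to show.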

The environment evaluation unit 4 also uses the “model reliability” and the “recognition score” to calculate a “real environment score” for every “recognition result text” corresponding to each acoustic model.
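The scoring step can be sketched as follows, assuming the real environment score is the product of the model reliability and the recognition score, which is how claim 1 is read here. The function name and the candidate values are hypothetical.

```python
from typing import List, Tuple


def real_environment_scores(
    model_reliability: float,
    candidates: List[Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Weight every (text, recognition score) pair of one acoustic model.

    Assumption: real environment score = model reliability x recognition
    score. The same reliability applies to all candidates of the model.
    """
    return [(text, model_reliability * score) for text, score in candidates]


# Hypothetical candidates produced with one acoustic model.
scored = real_environment_scores(0.5, [("a", 80.0), ("b", 60.0)])
```

Because one reliability value scales all candidates of a model, the ranking inside a single model is unchanged; the weighting matters only when results from different models are combined.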

The final result calculation unit 9 is a processing unit that compares all the “recognition result texts” stored in the speech recognition result tables of the speech recognition result storage unit 3 and outputs the result judged to be optimal as the final recognition result. Specifically, when the same “recognition result text” appears in more than one speech recognition result table of the speech recognition result storage unit 3, the final result calculation unit 9 refers to the “real environment values” stored in the temporary storage unit 8, adds up all the “real environment scores” obtained for that “recognition result text” from the individual acoustic models, and stores the sum in the final result table 101 of the final result storage unit 10 as the “final evaluation value” of that “recognition result text”. It then outputs the “recognition result texts” in descending order of “final evaluation value”. The processing of the final result calculation unit 9 is described in detail later.
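The aggregation performed by the final result calculation unit 9 can be sketched in Python as below. The table layout (a list of `(text, real environment score)` pairs per acoustic model), the function name, and the numeric scores are assumptions made for illustration; the misrecognized competitor text is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ResultTable = List[Tuple[str, float]]  # (recognition result text, real environment score)


def final_ranking(tables: List[ResultTable]) -> List[Tuple[str, float]]:
    """Sum real environment scores per text and rank the totals.

    Texts that appear in several per-model tables accumulate the scores
    from each of them; the summed value plays the role of the final
    evaluation value, and the list is returned best candidate first.
    """
    totals: Dict[str, float] = defaultdict(float)
    for table in tables:
        for text, real_env_score in table:
            totals[text] += real_env_score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative tables for two acoustic models (values are made up).
tables = [
    [("Ozawa to Utsunomiya", 40.0), ("Ozawa to Tsunomiya", 30.0)],
    [("Ozawa to Utsunomiya", 25.0)],
]
ranking = final_ranking(tables)
```

A text recognized consistently across several acoustic models therefore rises in the ranking even if no single model gave it the highest score, in contrast to keeping only the single best result.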

(Speech recognition processing)
Next, the operation of the in-vehicle speech recognition system of this embodiment will be described in detail with reference to the drawings. FIG. 2 is a flowchart showing the speech recognition processing operation of the in-vehicle speech recognition system of this embodiment.
In FIG. 2, when speech is first input from the microphone 1 (step S1), the speech recognition unit 2 notifies the environment evaluation unit 4 of the start and end of the speech input (step S2).
Meanwhile, the environment evaluation unit 4 monitors the output signals of the vehicle speed sensor 5, the air conditioner ECU 6, the audio ECU 7, and the like, and stores the “real environment values” during the speech input, such as the states of the control ECUs and the sensor values, in the temporary storage unit 8 (step S3).

Meanwhile, the speech input from the microphone 1 is converted into a digital speech signal by the A/D conversion unit 21, and the frequency analysis unit 22 executes frequency analysis to analyze its spectrum (step S4).
The frequency analysis unit 22 then extracts the frequency characteristics of the digital speech signal from the frequency analysis result (step S5).
Once the frequency characteristics of the digital speech signal have been extracted, the acoustic pattern recognition unit 25 executes speech recognition for each acoustic model, using each of acoustic model a 231, acoustic model b 232, ..., and acoustic model n 233 held in the acoustic model storage unit 23 together with the speech recognition dictionary unit 24 (step S6).

The acoustic pattern recognition unit 25 then stores the “recognition result text” and the “recognition score” for each acoustic model in speech recognition result table a 31, speech recognition result table b 32, ..., speech recognition result table n 33, and the like of the speech recognition result storage unit 3 (step S7).
Meanwhile, the environment evaluation unit 4 calculates the “model reliability” of each acoustic model from the “real environment values”, the “vehicle coefficient”, and the “model environment values” using equation (1) above (step S8).
Furthermore, the environment evaluation unit 4 calculates the “real environment score” from the “model reliability” and the “recognition score” for every recognition result, and stores it in the corresponding speech recognition result table (step S9).

Tables 2 to 6 below illustrate the “model environment values”, “model reliability”, “recognition result text”, “recognition score”, “real environment score”, and “candidate rank” stored for each acoustic model in speech recognition result table a 31, speech recognition result table b 32, ..., speech recognition result table n 33, and the like of the speech recognition result storage unit 3. The example in Table 2 shows the contents of speech recognition result table a 31, and the example in Table 3 shows the contents of speech recognition result table b 32. Similarly, the examples in Tables 4, 5, and 6 show the contents of speech recognition result tables c, d, and e (not shown), respectively.

The examples in Tables 2 to 6 are recognition results for the utterance “Ozawa to Utsunomiya” (小沢から宇都宮), spoken in a real environment in which the vehicle is traveling at 55 [km/h], the air conditioner air volume is 20 [%], the wipers are not in use, the vehicle windows are fully closed, and the audio volume is 20 [dB]. In this case, the real environment variables are V = 55, T = 20, W = 0, O = 0, and A = 20. In the examples in Tables 2 to 6, the vehicle coefficient is C = 1.2. This vehicle coefficient is a value for correcting the difference in acoustic characteristics between the vehicle used when the acoustic model environment was set up and the vehicle in the real environment.

Next, the final result calculation unit 9 compares all the “recognition result texts” stored in the speech recognition result tables of the speech recognition result storage unit 3. When the same “recognition result text” appears in more than one table, it refers to the “real environment values” stored in the temporary storage unit 8, adds up all the “real environment scores” obtained for that “recognition result text” from the individual acoustic models, and stores the sum in the final result table 101 of the final result storage unit 10 as the “final evaluation value” of that “recognition result text” (step S10).

Table 7 below shows an example of each “recognition result text” stored in the final result table 101 of the final result storage unit 10, together with its “final evaluation value” and “candidate rank”. The “recognition result texts” and “real environment scores” obtained for the individual acoustic models correspond to the examples shown in Tables 2 to 6 above.

As shown in Table 7 above, the in-vehicle speech recognition system of this embodiment ranks the “recognition result texts” in descending order of “final evaluation value” and outputs them (step S11).

In this embodiment, performing recognition processing for a plurality of acoustic models places a high (heavy) computational load on the speech recognition unit 2. Distributed processing may therefore be used to spread the processing across an external server, other ECUs mounted on the vehicle, and the like, reducing the load on the speech recognition unit 2 and speeding up the speech recognition processing.
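As a rough illustration of running the per-model recognition in parallel, the following Python sketch dispatches one task per acoustic model to a thread pool. The pool here merely stands in for an external server or other ECUs; the function name, the `recognizers` mapping, and the stub recognizers are hypothetical, and a real deployment would substitute whatever remote call mechanism is available.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Candidates = List[Tuple[str, float]]  # (recognition result text, recognition score)
Recognizer = Callable[[Sequence[float]], Candidates]


def recognize_with_all_models(
    features: Sequence[float],
    recognizers: Dict[str, Recognizer],
    max_workers: int = 4,
) -> Dict[str, Candidates]:
    """Run one recognizer per acoustic model concurrently.

    Each recognizer receives the same extracted features and returns its
    own candidate list; results are collected per model name.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(recognize, features)
                   for name, recognize in recognizers.items()}
        return {name: future.result() for name, future in futures.items()}


# Stub recognizers standing in for acoustic models a and b.
stub_recognizers: Dict[str, Recognizer] = {
    "a": lambda feats: [("x", float(len(feats)))],
    "b": lambda feats: [("y", 2.0 * len(feats))],
}
per_model = recognize_with_all_models([0.1, 0.2, 0.3], stub_recognizers)
```

Since each acoustic model is evaluated independently of the others, the tasks share no mutable state, which is what makes this step straightforward to distribute.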

In this embodiment, the environment evaluation unit 4 includes the real environment determination unit and the model reliability calculation unit. Specifically, the processing of step S3 executed by the environment evaluation unit 4 corresponds to the real environment determination unit, and the processing of step S8 executed by the environment evaluation unit 4 corresponds to the model reliability calculation unit.

As described above, the in-vehicle speech recognition system of this embodiment includes the microphone 1 for inputting speech uttered by a user; a vehicle sensor unit, such as the vehicle speed sensor 5, the air conditioner ECU 6, and the audio ECU 7, that detects specific environmental conditions of the vehicle; and the speech recognition dictionary unit 24 corresponding to a plurality of acoustic models stored in advance for each specific environmental condition. When a speech signal is input from the microphone 1, the acoustic pattern recognition unit 25 performs speech recognition on the signal using the speech recognition dictionary unit 24 and scores the result. At the same time, the environment evaluation unit 4 determines the actual vehicle environmental conditions from the output of the vehicle sensor unit, compares the determination result with the environmental conditions of the plurality of acoustic models stored in advance, and calculates the model reliability of each acoustic model. The final result calculation unit 9 then corrects the score of the speech recognition result calculated by the acoustic pattern recognition unit 25 using the calculated model reliability, and calculates an overall speech recognition result.

As a result, the score of the speech recognition result calculated by the acoustic pattern recognition unit 25 can be corrected by the model reliability of the acoustic model, as judged from the actual vehicle environmental conditions, and an overall speech recognition result can be calculated.
Therefore, by responding accurately to noise that varies with the environment, speech recognition that is less susceptible to environmental changes can be executed.

FIG. 1 is a block diagram showing the overall configuration of an in-vehicle speech recognition system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the speech recognition processing operation of the in-vehicle speech recognition system of the same embodiment.

Explanation of symbols

1 Microphone (voice input unit)
4 Environment evaluation unit (real environment determination unit, model reliability calculation unit)
5 Vehicle speed sensor (vehicle sensor unit)
6 Air conditioner ECU (vehicle sensor unit)
7 Audio ECU (vehicle sensor unit)
9 Final result calculation unit (final result calculation unit)
24 Speech recognition dictionary unit (speech recognition dictionary unit)
25 Acoustic pattern recognition unit (speech recognition execution unit)
S3 Real environment determination unit
S8 Model reliability calculation unit
231 Acoustic model a (acoustic model)
232 Acoustic model b (acoustic model)
233 Acoustic model n (acoustic model)

Claims

An in-vehicle speech recognition system comprising:
a voice input unit for inputting speech uttered by a user;
a vehicle sensor unit that detects, as a specific environmental condition of the vehicle, at least one of vehicle speed, air conditioner air volume, wiper speed, window opening, and audio volume;
a speech recognition dictionary unit corresponding to a plurality of acoustic models stored in advance for each of the specific environmental conditions;
a speech recognition execution unit that performs speech recognition on a speech signal from the voice input unit using the speech recognition dictionary unit, and scores the result of the speech recognition to obtain a recognition score;
a real environment determination unit that determines an actual vehicle environmental condition from the output of the vehicle sensor unit;
a model reliability calculation unit that calculates a model reliability of each acoustic model, the model reliability becoming lower as the difference increases between the actual vehicle environmental condition determined by the real environment determination unit and the environmental conditions of the plurality of acoustic models stored in advance for each of the specific vehicle environmental conditions; and
a final result calculation unit that calculates a real environment score by multiplying the model reliability calculated by the model reliability calculation unit by the recognition score of the corresponding acoustic model, and calculates an overall speech recognition result based on the real environment score.