JP2004325688A

JP2004325688A - Speech recognition system

Info

Publication number: JP2004325688A
Application number: JP2003119018A
Authority: JP
Inventors: Kazuaki Minami; 見並　　一明; Izuru Yagakinai; 出野垣内
Original assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Current assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Priority date: 2003-04-23
Filing date: 2003-04-23
Publication date: 2004-11-18

Abstract

PROBLEM TO BE SOLVED: To provide speech recognition technology that improves speech recognition results.

SOLUTION: The system comprises a center server 100 equipped with a plurality of speech recognizing means 102, and a user terminal 200 that acquires a speech recognizing means 102 from the center server 100 and performs speech recognition processing with the acquired speech recognizing means 102.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、音声認識技術に関する。
【０００２】
【従来の技術】
一般に、自動車に搭載されるカーナビゲーションシステムなどの、移動体用の情報端末において、端末操作のインタフェースとして音声認識手段が用いられている。移動体用の情報端末において音声認識手段による操作を行うことで、ユーザは、地名検索或いは目的地の設定等の、キーボード等の複雑な入力操作を情報端末に対して行うことなく、当該情報端末を利用していた。
【０００３】
ところで、上記移動体用の情報端末の音声認識に関して、音声認識結果を向上させるために、様々な技術が鑑みられてきた。このような、音声認識に関する技術として、情報センタとデータをやりとりできる車載用情報端末に音声認識用テーブルを送信する技術（例えば、特許文献１参照）が開示されている。
【０００４】
【特許文献１】
特開２０００−１０５６８１号公報
【０００５】
【発明が解決しようとする課題】
しかしながら、特許文献１の技術では、音声認識に用いられている音声認識用テーブルを情報センタから車載用情報端末に提供して、この音声認識用テーブルの情報を更新していた。そして、車載用情報端末に備えられた音声認識手段が、この更新された音声認識用テーブルを用いることで、音声認識結果を向上させようとしていた。すなわち、特許文献１の技術において、車載用情報端末において実際にユーザが利用できる音声認識手段は、予め車載用情報端末に備えられた音声認識手段のみであった。
【０００６】
情報端末において、音声認識手段とは、ユーザから発せられた音声情報から、特徴的なパターン情報などを抽出するものである。従って、音声認識用テーブルによって音声認識に用いる情報を更新したとしても、抽出する手段が単一であるため、音声認識結果を向上させることは容易ではなかった。
【０００７】
本発明は上記事項に鑑みて為されたものであり、音声認識手段における音声認識結果を向上することを、解決すべき課題とする。
【０００８】
【課題を解決するための手段】
本発明は前記課題を解決するために、以下の手段を採用した。
【０００９】
すなわち、本発明は、複数の音声認識手段を備えるセンタサーバと、センタサーバから前記音声認識手段を取得し、取得した音声認識手段により音声認識処理を行うユーザ端末とを有する。
【００１０】
本発明では、ユーザ端末が、センタサーバにある複数の音声認識手段から音声認識手段を取得する。従って、本発明によれば、ユーザの操作指示に沿った正確な音声認識結果を生成する音声認識手段を取得することができる。
【００１１】
また、本発明は、前記ユーザ端末が、端末音声認識手段を備え、センタサーバから音声認識手段を取得した場合には、前記端末音声認識手段に換えて前記音声認識手段により音声認識処理を行ってもよい。
【００１２】
本発明では、センタサーバから取得した音声認識手段を、ユーザ端末に予め備えられた音声認識手段に換えて音声認識処理を行う。従って、本発明によれば、ユーザ端末において、ユーザの操作指示に沿った正確な音声認識結果を生成する音声認識手段によって音声認識処理を行うことができる。
【００１３】
また、本発明は、前記センタサーバが、前記ユーザ端末からの音声情報を取得する音声情報取得手段をさらに備え、前記複数の音声認識手段が、取得した音声情報に対して個々の音声認識手段毎に音声認識処理を行ってもよい。
【００１４】
本発明では、センタサーバがユーザ端末から音声情報を取得して、複数の音声認識手段によって音声認識処理を行う。従って、本発明によれば、ユーザ端末に備えられた音声認識手段だけでなく、複数の音声認識手段によって生成した音声認識結果からユーザの操作指示に沿った正確な音声認識結果を選択することができる。
【００１５】
また、本発明は、前記センタサーバでは、個々の音声認識手段が、それぞれ音声認識処理の認識結果を生成し、前記認識結果に基づいて、前記ユーザ端末に提供する音声情報取得手段を決定し、前記ユーザ端末では、前記センタサーバが提供した音声認識手段を取得してもよい。
【００１６】
本発明では、認識結果によってユーザ端末が取得する音声認識手段が決定する。従って、本発明によれば、ユーザの操作指示に沿った正確な音声認識結果を得ることができる。
【００１７】
また、本発明は、前記センタサーバでは、個々の音声認識手段が、それぞれ音声認識処理の認識結果を生成し、前記認識結果を、前記ユーザ端末に提供してもよい。
【００１８】
本発明では、センタサーバがユーザ端末から音声情報を取得して、複数の音声認識手段によって音声認識処理を行う。従って、本発明によれば、ユーザ端末に備えられた音声認識手段だけでなく、複数の音声認識手段によって生成した音声認識結果からユーザの操作指示に沿った正確な音声認識結果を選択することができる。
【００１９】
前記センタサーバが、ユーザ端末が取得した、音声認識手段ないし認識結果に関する履歴情報を格納する、履歴データベースをさらに備え、当該履歴データベースに基づいて、前記ユーザ端末に提供する音声認識手段ないし認識結果を決定してもよい。
【００２０】
従って、本発明によれば、過去の音声認識結果に応じてユーザに最適な音声認識結果を当該ユーザ端末に提供することができる。
【００２１】
前記センタサーバが、個々のユーザ端末に関連付けられたユーザ端末情報を格納する、端末データベースをさらに備え、当該端末データベースに基づいて、前記ユーザ端末に提供する音声情報取得手段を決定してもよい。
【００２２】
従って、本発明によれば、ユーザ端末に最適な音声認識結果を、当該ユーザ端末に提供することができる。
【００２３】
また、本発明は、以上の何れかの機能を実現させるプログラムであってもよい。また、本発明は、そのようなプログラムをコンピュータが読み取り可能な記憶媒体に記録してもよい。
【００２４】
さらに、本発明は、以上の何れかの機能を実現する装置であってもよい。
【００２５】
【発明の実施の形態】
以下、図面を参照して、本発明の実施の形態に係る音声認識システムを図１から図５の図面に基づいて説明する。
【００２６】
〈システム構成〉
図１は、本実施の形態に係る音声認識システムの概略構成図の一例である。図１に示すように、本実施の形態に係る音声認識システムは、センタサーバ１００と、本発明のユーザ端末の一例である車載用情報端末２００とを備える。そして、本音声認識システムでは、センタサーバ１００と車載用情報端末２００とが、既存の通信網を利用したネットワーク３００を介して接続している。なお、このネットワーク３００としては、公衆の携帯電話網、無線ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）用通信網、及びＥＴＣ（ＥｌｅｃｔｒｏｎｉｃＴｏｌｌＣｏｌｌｅｃｔｉｏｎＳｙｓｔｅｍ）用無線通信網等の、既存の様々な通信網を用いることができる。また、図１において、センタサーバ１００に対して、一つの車載用情報端末２００が接続されているが、本実施の形態に係る音声認識システムでは、図示しない複数の車載用情報端末２００がネットワーク３００を介して接続している。
【００２７】
センタサーバ１００は、サーバ装置などの、既存の情報処理装置によって構成される。本実施の形態において、これら既存の情報処理装置に、本発明の音声認識システムを実現するプログラムを導入（インストール）する。このプログラムをインストールすることによって、既存の情報処理装置を本実施の形態に係るセンタサーバ１００として用いることができる。
【００２８】
図１に示すように、本実施の形態に係るセンタサーバ１００は、以下の構成要素を備える。すなわち、本センタサーバ１００は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）１０１、音声認識エンジン１０３及び音声認識データベース１０４とを有する音声認識手段１０２、履歴データベース１０５、端末データベース１０６、及び通信手段１０７とを備える。
【００２９】
車載用情報端末２００は、プログラムに基づいて各種情報処理を行うＣＰＵ２０１、音声情報に基づいて音声認識結果を生成する音声認識エンジン２０３及び音声認識結果を生成する際に音声認識エンジン１０３が参照する音声認識データベース２０４を備える音声認識手段２０２、ユーザの操作指示を音声指示で受け付けるマイク等の音声入力手段２０５、ユーザへの音声応答を生成する音声合成手段２０６、上記音声応答を出力するスピーカ等の音声出力手段２０７、液晶ディスプレイ等の表示手段２０８、及びセンタサーバ１００との通信に用いる通信手段２０９とを備える。
【００３０】
〈センタサーバの構成〉
次に、本センタサーバ１００の構成要素について説明する。ＣＰＵ１０１は、ハードディスク装置等の記憶装置（不図示）にインストールされたプログラムを実行して、本音声認識システムに係るセンタサーバの各種機能を実現する。
【００３１】
音声認識手段１０２は、音声情報に基づいて音声パターン情報を抽出し、この抽出した音声パターン情報及び音声認識データベース１０４から音声認識結果を生成する、音声認識エンジン１０３を備える。また、音声認識手段１０２は、音声認識結果を生成するための参照用音声パターン情報を格納する、音声認識データベース１０４を備える。なお、本音声認識システムにおいて、音声認識手段１０２の音声認識エンジン１０３とは、ＣＰＵ１０１によって実行されるプログラムによって実現される手段である。
【００３２】
音声認識エンジン１０３は、車載用情報端末２００の音声入力手段２０５に入力された音声情報から、音声パターン情報を抽出する。そして、音声認識エンジン１０３は、音声認識データベース１０４に格納されている音声パターン情報を参照して、抽出した音声パターン情報から音声認識結果を生成する。なお、本音声認識システムにおいて、抽出する音声パターン情報としては、音声情報のうちユーザが発した言葉を特定することのできる特徴となる情報が挙げられる。上記音声パターン情報を抽出する処理の一例として、取得した音声情報からユーザが発した言語の音声情報のみを抽出し、言語の分類、音階、アクセント位置、及び抑揚の付け方などを解析するアルゴリズムが挙げられる。
【００３３】
そして、本音声認識システムにおいて、センタサーバ１００には、複数の音声認識手段１０２が任意数のセット（例えば、ｎ個の音声認識手段１０２のセット）が備えられている。本音声認識システムにおいて、センタサーバ１００に複数の音声認識手段を備えるのは、以下の理由による。すなわち、センタサーバ１００は、複数の音声認識手段１０２によって、複数の音声認識結果を生成する。ユーザは、この複数の音声認識結果から、ユーザ自身の発した音声情報を正しく反映した音声認識結果を選択する。センタサーバ１００では、ユーザが選択した音声認識結果、及びその音声認識結果を生成した音声認識手段１０２を特定する情報を、履歴データベース１０５に格納する。そして、センタサーバ１００では、この履歴データベース１０５から、ユーザの車載用情報端末２００に提供する音声認識手段１０２を決定する。
【００３４】
このため、それぞれの音声認識手段１０２では、音声認識エンジン１０３における音声パターン情報を抽出する処理である、上述のアルゴリズムが異なる。また、それぞれの音声認識手段１０２では、音声認識データベース１０４に格納されている音声パターン情報が異なる。従って、本センタサーバ１００では、それぞれの音声認識手段１０２毎に異なる音声認識結果を生成することができる。
【００３５】
履歴データベース１０５は、複数の音声認識手段１０２による音声認識結果のうち、ユーザによって選択された音声認識結果を生成した音声認識手段１０２の情報を個々の車載用情報端末２００毎に対応付けて、履歴情報として格納している。本実施の形態において、履歴データベース１０５とは、ユーザの車載用情報端末２００に提供する最適な音声認識手段１０２、及び最適な音声認識結果を決定するために、センタサーバ１００が用いる。そして、ユーザの車載用情報端末２００に提供する最適な音声認識手段１０２、及び最適な音声認識結果を決定するために、履歴データベース１０５には、過去に提供した音声認識手段１０２、及び音声認識結果の統計情報（例えば、過去に提供した音声認識手段１０２及び音声認識結果を特定する番号、或いは個々の音声認識手段１０２毎の車載用情報端末２００へ提供された提供回数など）が格納される。
【００３６】
図２は、上述の履歴データベース１０５のデータテーブルの一例である。履歴データベース１０５には、番号１０５ａ、認識結果生成日時１０５ｂ、本音声認識システムを利用したユーザを特定するユーザＩＤ１０５ｃ、ユーザＩＤ１０５ｃのユーザが選択した音声認識手段１０２を特定する統計情報の一例である選択結果ナンバー１０５ｄ、及び音声認識を行った音声情報の内容１０５ｅが格納されている。そして、この履歴データベース１０５では、最適な音声認識手段１０２及び音声認識結果を特定する統計情報を生成するために、個々の車載用情報端末２００を利用するユーザに付与されるユーザＩＤ１０５ｃと、選択結果ナンバー１０５ｄとが対応付けられている。
【００３７】
端末データベース１０６は、本音声認識システムを利用する車載用情報端末２００に関する情報を格納している。
【００３８】
図３は、上述の端末データベース１０６のデータテーブルの一例である。端末データベース１０６には、番号１０６ａ、ユーザＩＤ１０６ｂ、車載用情報端末２００の機種を特定する端末ナンバー１０６ｃ、ＣＰＵの型番１０６ｄ、メインメモリ容量１０６ｅ、ハードディスク装置の有無１０６ｆ、ハードディスク装置の容量１０６ｇなどの、車載用情報端末２００の処理能力、記憶容量などの性能を示す情報を、個々のユーザの車載用情報端末２００毎に関連付けて格納している。
【００３９】
センタサーバ１００は、この端末データベース１０６の情報に基づいて、個々の車載用情報端末２００に提供する最適な音声認識手段１０２に含まれるデータ量を調整する。
【００４０】
通信手段１０７は、ネットワーク３００を介して、個々のユーザの車載用情報端末２００と通信するために備えられている。この通信手段１０７は、車載用情報端末２００から送信される音声情報を取得する。また、通信手段１０７は、複数のうち、当該車載用情報端末２００に最適な音声認識手段１０２を当該ユーザの車載用情報端末２００に、ダウンロード可能な状態で提供する。
【００４１】
〈処理フローチャート〉
次に、本実施の形態に係る、音声認識システムによる音声認識処理について、フローチャートを用いて説明する。
【００４２】
図４は、本音声認識システムにおける、センタサーバ１００との通信による車載用情報端末２００の音声認識処理を説明するフローチャートである。
【００４３】
まず、車載用情報端末２００は、音声入力手段２０５が、ユーザの操作指示等のユーザが発した音声を音声情報として取得する（図４におけるステップ１０１、以下Ｓ１０１のように省略する）。
【００４４】
音声入力手段２０５が取得した音声情報は、以前センタサーバ１００から取得した音声認識手段２０２、或いは予め車載用情報端末２００に備えられた音声認識手段２０２に入力される（Ｓ１０２）。上記音声情報を受け付けた音声認識手段２０２は、音声認識エンジン２０３と音声認識データベース２０４とによって、音声認識処理を行い、音声認識結果を生成する（Ｓ１０３）。そして、車載用情報端末２００は、この音声認識結果とユーザが発した音声とが一致するか否かを確認するために、スピーカ等の音声出力手段２０７からこの音声認識結果を出力する。
【００４５】
そして、車載用情報端末２００は、ユーザに対して当該音声認識結果が、ユーザの操作指示に対応しているか否か、すなわち、当該音声認識結果にユーザが満足しているか否かを確認する（Ｓ１０４）。ここで、当該音声認識結果に満足しているか否かとは、当該音声認識結果が、当該音声情報を発したユーザの意図した操作指示を反映しているか否か、と言うことを示す。また、このとき、当該音声認識結果にユーザが満足しているか否かを確認するために、車載用情報端末２００は、音声入力手段２０５、或いはキーボード（不図示）等の入力手段からユーザの入力を受け付ける。
【００４６】
なお、ステップ１０４において、当該音声認識結果に満足している旨の入力が当該ユーザから為された場合には、車載用情報端末２００は、ステップ１０２に戻り、次の音声情報の受け付けに対して待機する。
【００４７】
当該音声認識結果にユーザが満足していない場合には、車載用情報端末２００は、通信手段２０９によって、当該音声情報を、ネットワーク３００を介してセンタサーバ１００に送信する（Ｓ１０５）。
【００４８】
センタサーバ１００の通信手段１０７は、車載用情報端末２００から送信された当該音声情報を、ネットワーク３００を介して受信する（Ｓ１０６）。
【００４９】
センタサーバ１００は、複数の音声認識手段１０２によって、音声認識処理を行う。まず、個々の音声認識手段１０２では、取得した当該音声情報から、音声パターン情報を抽出する。そして、音声認識エンジン１０３は、抽出した音声パターン情報と音声認識データベース１０４に格納された参照用音声パターン情報とから、音声認識結果を生成する（Ｓ１０７）。
【００５０】
例えば、本実施の形態において、センタサーバ１００では、任意数の音声認識手段１０２として、１０個の音声認識手段１０２を備えているとする。そして、取得した音声情報に基づいて１０個の音声認識手段１０２のうち、７個の音声認識手段１０２が音声認識結果「Ａ」を生成し、３個の音声認識手段１０２が音声認識結果「Ｂ」を生成したとする。
【００５１】
そして、センタサーバ１００は、生成した音声認識結果を車載用情報端末２００に送信する（Ｓ１０８）。
【００５２】
なお、車載用情報端末２００に送信する音声認識結果の提供方法として、複数の音声認識結果のうち、最も多くの音声認識手段によって生成された音声認識結果を送信してもよい。例えば、上記のステップ１０７における音声認識結果の生成例であれば、音声認識結果「Ａ」を、当該車載用情報端末２００に提供する。
【００５３】
また、当該音声認識結果の提供方法として、履歴データベース１０５に基づいて、過去に同一の車載用情報端末２００に提供した履歴がある音声認識手段１０２によって生成された音声認識結果を送信してもよい。例えば、上記のステップ１０７における音声認識結果の生成例であれば、過去の履歴に基づいて選択された音声認識結果が、音声認識結果「Ｂ」であれば、この音声認識結果「Ｂ」を、当該車載用情報端末２００に提供する。
【００５４】
また、当該音声認識結果の提供方法として、例えば、複数の音声認識手段によって生成された複数の音声認識結果を、全て車載用情報端末２００に提供してもよい。さらに、当該音声認識結果の提供方法として、複数の音声認識手段によって生成された複数の音声認識結果を表示した、音声認識結果リストを提供してもよい。例えば、取得した音声情報に基づいて１０個の音声認識手段１０２のうち、７個の音声認識手段１０２が音声認識結果「Ａ」を生成し、３個の音声認識手段１０２が音声認識結果「Ｂ」を生成した場合、この音声認識結果とその結果を生成した音声認識手段１０２の数を、当該音声認識リストに表示する。
【００５５】
そして、車載用情報端末２００は、上記の各種の音声認識結果の提供方法によってセンタサーバ１００が提供した音声認識結果を取得する。なお、車載用情報端末２００が取得した音声認識結果を、ユーザに選択された音声認識結果とする（Ｓ１０９）。
【００５６】
センタサーバ１００は、このユーザに選択された音声認識結果に関する情報を履歴データベース１０５に格納して、履歴データベース１０５の情報を更新する。このとき、個々の車載用情報端末２００に対する音声認識処理の回数が所定の回数に到達した場合には、センタサーバ１００は、当該車載用情報端末２００に対して音声認識手段１０２の提供処理を行う（Ｓ１１０）。
【００５７】
そして、車載用情報端末２００は、センタサーバ１００が提供する音声認識手段１０２のダウンロードを行う（Ｓ１１１）。なお、この音声認識手段１０２の提供処理については、図５に示すフローチャートを用いて、後に説明する。
【００５８】
次に、本音声認識システムにおいて、センタサーバ１００の音声認識手段１０２を、車載用情報端末２００の音声認識手段２０２として提供する、音声認識手段１０２の提供処理について、図５のフローチャートを用いて説明する。
【００５９】
センタサーバ１００は、センタサーバ１００での個々の車載用情報端末２００に対する音声認識処理の延べ回数が、所定の回数（例えば、定数Ｍの倍数）に達したか否かを判断する（図５におけるステップ２０１、以下Ｓ２０１と省略する）。このとき、個々の車載用情報端末２００に対する音声認識処理の延べ回数が所定の回数でない場合には、センタサーバ１００は、本処理を終了する。
【００６０】
個々の車載用情報端末２００に対する音声認識処理の延べ回数が、上記所定の回数である場合、センタサーバ１００は、履歴データベース１０５を参照する。センタサーバ１００は、上記統計情報を格納した履歴データベース１０５を参照して、複数の音声認識手段１０２のうち、当該車載用情報端末２００からの音声情報に対して、生成した音声認識結果が最も多く選択された音声認識手段１０２を、当該車載用情報端末２００に対して最適な音声認識手段１０２と判断する（Ｓ２０２）。
【００６１】
例えば、本実施の形態において、センタサーバ１００に備えられた１０個の音声認識手段１０２が音声認識結果を生成するとする。このとき、上記１０個の音声認識手段１０２のうち、「音声認識手段１０２・１」が当該車載用情報端末２００対して最も多く音声認識結果を提供したことが、センタサーバ１００によって履歴データベース１０５の情報から判断される。そして、センタサーバ１００は、この「音声認識手段１０２・１」を当該車載用情報端末２００に対して最適な音声認識手段１０２と判断する。
【００６２】
最適な音声認識手段１０２を選択したセンタサーバ１００は、当該車載用情報端末２００から、端末２００の処理能力、記憶容量、予め備えられた音声認識手段２０２に関する情報等を取得する。このとき、センタサーバ１００は、当該車載用情報端末２００の処理能力及び記憶容量に応じて、最適な音声認識手段１０２の音声認識エンジン１０３及び音声認識データベース１０４の容量を調整する。
【００６３】
そして、センタサーバ１００は、音声認識手段２０２に関する情報から、当該車載用情報端末２００の音声認識エンジン２０３が最適な音声認識手段１０２の音声認識エンジン１０３と同一のものであるか否かを判断する（Ｓ２０４）。
【００６４】
当該車載用情報端末２００の音声認識エンジン２０３が最適な音声認識手段１０２の音声認識エンジン１０３と同一でない場合には、最適な音声認識エンジン１０３を当該車載用情報端末２００にダウンロードできるように提供する（Ｓ２０５）。
【００６５】
センタサーバ１００は、最適な音声認識エンジン１０３が当該車載用情報端末２００の音声認識エンジン２０３と同一の場合、及び最適な音声認識エンジン１０３のダウンロードを終了した場合に、当該車載用情報端末２００の記憶容量の情報を取得する（Ｓ２０６）。
【００６６】
そして、センタサーバ１００は、当該車載用情報端末２００の記憶容量の情報に応じて、最適な容量の音声認識データベース１０４をダウンロードできるような形態にて提供する（Ｓ２０７）。
【００６７】
〈実施の形態の効果〉
本実施の形態に係る音声認識システムを実現することにより、以下のような効果が得られる。
【００６８】
本実施の形態に係る音声認識システムによれば、個々の車載用情報端末における音声認識結果がユーザの操作指示に沿ったものでない場合であっても、センタサーバの複数の音声認識手段による複数の音声認識結果を参照することで、ユーザの操作指示に沿った正確な音声認識結果を得ることができる。
【００６９】
また、本実施の形態に係る音声認識システムによれば、センタサーバの複数の音声認識手段のうち最適な音声認識手段を、車載用情報端末の音声認識手段として提供することができる。
【００７０】
また、本実施の形態に係る音声認識システムによれば、ユーザの音声情報の特徴に応じた複数の音声認識手段によって音声認識処理を行うため、正確な音声認識結果を得ることができる。
【００７１】
〈変形例〉
本実施の形態において、本発明の音声認識システムは、主に車載用情報端末に対する音声認識処理の一例について説明したが、本発明ではこれに限らず、その他の音声認識システムに対して広く実施することができる。
【００７２】
例えば、本実施の形態に係る音声認識システムを、他の様々な情報端末における音声認識処理に対して用いることができる。そのとき、センタサーバは、本実施の形態における車載用情報端末に対する音声認識処理を、ユーザ端末の一例である上記他の様々な情報端末に対して実行する。
【００７３】
また、本音声認識システムにおいて、音声認識手段１０２の提供処理は、個々の車載用情報端末２００に対してセンタサーバ１００による音声認識処理の回数が所定の回数に達した場合に行っていたが、本実施の形態の音声認識手段１０２の提供処理において、これに限定されない。例えば、ユーザの操作指示に応じて、車載用情報端末２００に対して音声認識手段１０２の提供処理を行ってもよい。
【００７４】
また、本音声認識システムにおいて、車載用情報端末２００の情報は、当該車載用情報端末２００から取得していたが、本実施の形態ではこれに限定されない。例えば、最適な音声認識手段１０２を選択したセンタサーバ１００は、端末データベース１０６に格納された、当該車載用情報端末２００の処理能力、記憶容量、備えられた音声認識手段２０２に関する情報等の、当該車載用情報端末２００の情報を取得してもよい。また、センタサーバ１００は、当該車載用情報端末２００の処理能力及び記憶容量に応じて、最適な音声認識手段１０２の音声認識エンジン１０３及び音声認識データベース１０４の容量を調整する。
【００７５】
また、本音声認識システムにおいて、最適な音声認識手段１０２を、センタサーバ１００が車載用情報端末２００に提供する際に、提供する最適な音声認識手段１０２のリストを、当該車載用情報端末２００に提供してもよい。そして、ユーザは、この最適な音声認識手段１０２の情報を参照して、提供を受ける音声認識手段１０２を決定してもよい。
【００７６】
また、本音声認識システムにおいて、車載用情報端末２００に提供する音声認識手段１０２は、ダウンロード可能な状態で提供されたが、車載用情報端末２００からセンタサーバ１００側に取得しにいってもよい。また本音声認識システムにおいて、音声認識結果は、車載用情報端末２００からセンタサーバ１００側に取得しにいってもよい
【００７７】
【発明の効果】
本発明の音声認識システムによれば、音声認識手段における音声認識結果が向上するという優れた効果を得ることができる。
【図面の簡単な説明】
【図１】本発明の実施の形態に係る、音声認識システムの概略構成図である。
【図２】本実施の形態に係る、履歴データベースのデータテーブルの一例である。
【図３】本実施の形態に係る、端末データベースのデータテーブルの一例である。
【図４】本実施の形態に係る、センタサーバによる音声認識処理のフローチャートの一例である。
【図５】本実施の形態に係る、センタサーバから車載用情報端末への音声認識手段の提供処理のフローチャートの一例である。
【符号の説明】
１００センタサーバ
１０１ＣＰＵ
１０２音声認識手段
１０３音声認識エンジン
１０４音声認識データベース
１０５履歴データベース
１０５ａ番号
１０５ｂ認識結果生成日時
１０５ｃユーザＩＤ
１０５ｄ選択結果ナンバー
１０５ｅ内容
１０６端末データベース
１０６ａ番号
１０６ｂユーザＩＤ
１０６ｃ端末ナンバー
１０６ｄＣＰＵ型番
１０６ｆハードディスク装置有無
１０６ｇハードディスク装置容量
１０７通信手段
２００車載用情報端末
２０１ＣＰＵ
２０２音声認識手段
２０３音声認識エンジン
２０４音声認識データベース
２０５音声入力手段
２０６音声合成手段
２０７音声出力手段
２０８表示手段
２０９通信手段
３００ネットワーク

[0001]
[Technical field of the invention]
The present invention relates to a speech recognition technology.
[0002]
[Prior art]
Generally, in information terminals for mobile bodies, such as car navigation systems mounted on automobiles, voice recognition means is used as an interface for operating the terminal. By operating such an information terminal through the voice recognition means, the user can use the terminal for tasks such as place name search or destination setting without performing complicated input operations, such as keyboard entry, on the terminal.
[0003]
Regarding voice recognition in such information terminals for mobile bodies, various techniques have been devised to improve the voice recognition result. As one such technique, a technique of transmitting a speech recognition table to an in-vehicle information terminal capable of exchanging data with an information center has been disclosed (for example, see Patent Document 1).
[0004]
[Patent Document 1]
JP 2000-105681 A
[0005]
[Problems to be solved by the invention]
However, in the technique of Patent Document 1, the speech recognition table used for speech recognition is provided from the information center to the in-vehicle information terminal, and the information in the table is updated. The voice recognition means provided in the in-vehicle information terminal then attempts to improve the voice recognition result by using the updated table. That is, in the technique of Patent Document 1, the only voice recognition means actually available to the user in the in-vehicle information terminal is the one provided in the terminal in advance.
[0006]
In an information terminal, the voice recognition means extracts characteristic pattern information and the like from the voice information uttered by a user. Therefore, even if the information used for voice recognition is updated through the speech recognition table, it is not easy to improve the voice recognition result, because the extraction is still performed by a single means.
[0007]
The present invention has been made in view of the above, and an object of the present invention is to improve a speech recognition result in a speech recognition unit.
[0008]
[Means for Solving the Problems]
The present invention employs the following means in order to solve the above problems.
[0009]
That is, the present invention includes a center server including a plurality of voice recognition units, and a user terminal that acquires the voice recognition unit from the center server and performs a voice recognition process using the obtained voice recognition unit.
[0010]
In the present invention, the user terminal acquires a voice recognition unit from a plurality of voice recognition units in the center server. Therefore, according to the present invention, it is possible to obtain a voice recognition unit that generates an accurate voice recognition result according to a user's operation instruction.
[0011]
Further, in the present invention, the user terminal may include a terminal voice recognition unit and, when it has acquired a voice recognition unit from the center server, may perform voice recognition processing using the acquired voice recognition unit in place of the terminal voice recognition unit.
[0012]
In the present invention, voice recognition processing is performed using the voice recognition means acquired from the center server in place of the voice recognition means provided in the user terminal in advance. Therefore, according to the present invention, the user terminal can perform voice recognition processing with a voice recognition unit that generates an accurate voice recognition result in accordance with the user's operation instruction.
[0013]
Further, in the present invention, the center server may further include voice information acquiring means for acquiring voice information from the user terminal, and each of the plurality of voice recognizing means may individually perform voice recognition processing on the acquired voice information.
[0014]
In the present invention, the center server acquires voice information from the user terminal and performs voice recognition processing with a plurality of voice recognition units. Therefore, according to the present invention, an accurate voice recognition result in accordance with the user's operation instruction can be selected from voice recognition results generated not only by the voice recognition means provided in the user terminal but also by a plurality of voice recognition means.
[0015]
Further, according to the present invention, in the center server, each voice recognition unit generates a recognition result of the voice recognition process, and determines a voice information acquisition unit to be provided to the user terminal based on the recognition result. The user terminal may acquire the voice recognition means provided by the center server.
[0016]
In the present invention, the speech recognition means acquired by the user terminal is determined based on the recognition result. Therefore, according to the present invention, it is possible to obtain an accurate speech recognition result according to a user's operation instruction.
[0017]
According to the present invention, in the center server, each of the voice recognition units may generate a recognition result of the voice recognition process, and provide the recognition result to the user terminal.
[0018]
In the present invention, the center server acquires voice information from the user terminal and performs voice recognition processing with a plurality of voice recognition units. Therefore, according to the present invention, an accurate voice recognition result in accordance with the user's operation instruction can be selected from voice recognition results generated not only by the voice recognition means provided in the user terminal but also by a plurality of voice recognition means.
[0019]
The center server may further include a history database that stores history information on the voice recognition units or recognition results acquired by the user terminal, and may determine, based on the history database, the voice recognition unit or recognition result to be provided to the user terminal.
[0020]
Therefore, according to the present invention, it is possible to provide a speech recognition result optimal for a user to the user terminal according to a past speech recognition result.
[0021]
The center server may further include a terminal database storing user terminal information associated with each user terminal, and may determine a voice information acquisition unit to be provided to the user terminal based on the terminal database.
[0022]
Therefore, according to the present invention, an optimal speech recognition result for a user terminal can be provided to the user terminal.
[0023]
Further, the present invention may be a program for realizing any of the above functions. In the present invention, such a program may be recorded on a computer-readable storage medium.
[0024]
Furthermore, the present invention may be an apparatus that realizes any of the above functions.
[0025]
[Embodiments of the invention]
Hereinafter, a speech recognition system according to an embodiment of the present invention will be described with reference to FIGS. 1 to 5.
[0026]
<System configuration>
FIG. 1 is an example of a schematic configuration diagram of a speech recognition system according to the present embodiment. As shown in FIG. 1, the voice recognition system according to the present embodiment includes a center server 100 and an in-vehicle information terminal 200, which is an example of the user terminal of the present invention. In the voice recognition system, the center server 100 and the in-vehicle information terminal 200 are connected via a network 300 that uses an existing communication network. As the network 300, various existing communication networks can be used, such as a public cellular phone network, a wireless LAN (Local Area Network) communication network, and an ETC (Electronic Toll Collection System) wireless communication network. In FIG. 1, one in-vehicle information terminal 200 is connected to the center server 100; in the voice recognition system according to the present embodiment, however, a plurality of in-vehicle information terminals 200 (not shown) are connected via the network 300.
[0027]
The center server 100 is configured by an existing information processing device such as a server device. In the present embodiment, a program that implements the speech recognition system of the present invention is installed in such an existing information processing apparatus. Installing this program allows the existing information processing apparatus to be used as the center server 100 according to the present embodiment.
[0028]
As shown in FIG. 1, the center server 100 according to the present embodiment includes the following components. That is, the center server 100 includes a CPU (Central Processing Unit) 101, a voice recognition unit 102 having a voice recognition engine 103 and a voice recognition database 104, a history database 105, a terminal database 106, and a communication unit 107.
[0029]
The in-vehicle information terminal 200 includes: a CPU 201 that performs various types of information processing based on a program; a voice recognition unit 202 including a voice recognition engine 203 that generates a voice recognition result based on voice information and a voice recognition database 204 that the voice recognition engine 203 refers to when generating the voice recognition result; a voice input unit 205, such as a microphone, that receives the user's operation instructions as voice instructions; a voice synthesis unit 206 that generates a voice response to the user; a voice output unit 207, such as a speaker, that outputs the voice response; a display unit 208 such as a liquid crystal display; and a communication unit 209 used for communication with the center server 100.
[0030]
<Center server configuration>
Next, components of the center server 100 will be described. The CPU 101 executes a program installed in a storage device (not shown) such as a hard disk device to implement various functions of the center server according to the voice recognition system.
[0031]
The voice recognition unit 102 includes a voice recognition engine 103 that extracts voice pattern information based on the voice information and generates a voice recognition result from the extracted voice pattern information and the voice recognition database 104. In addition, the voice recognition unit 102 includes a voice recognition database 104 that stores reference voice pattern information for generating a voice recognition result. In the present voice recognition system, the voice recognition engine 103 of the voice recognition means 102 is a means realized by a program executed by the CPU 101.
[0032]
The voice recognition engine 103 extracts voice pattern information from the voice information input to the voice input means 205 of the in-vehicle information terminal 200. Then, the voice recognition engine 103 refers to the voice pattern information stored in the voice recognition database 104 and generates a voice recognition result from the extracted voice pattern information. In the present voice recognition system, the voice pattern information to be extracted is, for example, the characteristic part of the voice information that makes it possible to identify the words spoken by the user. One example of the process of extracting the voice pattern information is an algorithm that extracts only the spoken-language portion of the acquired voice information and analyzes the language classification, pitch, accent position, intonation, and the like.
[0033]
In the present speech recognition system, the center server 100 is provided with a set of an arbitrary number of voice recognition units 102 (for example, a set of n voice recognition units 102). The center server 100 is provided with a plurality of voice recognition units for the following reason. The center server 100 generates a plurality of voice recognition results with the plurality of voice recognition units 102. From these results, the user selects the voice recognition result that correctly reflects the voice information the user uttered. The center server 100 stores in the history database 105 the voice recognition result selected by the user and information identifying the voice recognition unit 102 that generated it. Then, the center server 100 determines, from the history database 105, the voice recognition unit 102 to be provided to the user's in-vehicle information terminal 200.
[0034]
For this reason, the above-described algorithms, which are processes for extracting voice pattern information in the voice recognition engine 103, are different in each voice recognition unit 102. Further, the voice pattern information stored in the voice recognition database 104 differs between the voice recognition units 102. Therefore, the center server 100 can generate a different voice recognition result for each voice recognition unit 102.
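
As a rough illustration of paragraphs [0033] and [0034], the following minimal sketch models each voice recognition unit 102 as a pair of an extraction function (standing in for engine 103) and a reference table (standing in for database 104). The class name `Recognizer`, the function `recognize_all`, and the use of plain strings in place of audio are all assumptions made for illustration; the patent does not specify any concrete algorithm or data format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Recognizer:
    """Hypothetical stand-in for one voice recognition unit 102."""
    name: str
    extract: Callable[[str], str]  # plays the role of engine 103
    reference: Dict[str, str]      # plays the role of database 104

    def recognize(self, audio: str) -> str:
        # Extract a pattern, then look it up in this unit's own reference data.
        pattern = self.extract(audio)
        return self.reference.get(pattern, "")


def recognize_all(recognizers: List[Recognizer], audio: str) -> Dict[str, str]:
    """Run every unit on the same input and collect one result per unit."""
    return {r.name: r.recognize(audio) for r in recognizers}


# Two toy units that differ in both extraction algorithm and reference data.
recognizers = [
    Recognizer("102-1", str.lower, {"go home": "GO HOME"}),
    Recognizer(
        "102-2",
        lambda a: a.lower().replace(" ", ""),
        {"gohome": "GO HOME", "gohom": "GO HOME"},
    ),
]
results = recognize_all(recognizers, "Go Hom")
```

Because the two toy units use different extraction steps and different reference tables, the same input can yield different results per unit, which is the property the paragraph above relies on.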
[0035]
The history database 105 stores, as history information, information on the voice recognition unit 102 that generated the voice recognition result selected by the user from among the voice recognition results of the plurality of voice recognition units 102, in association with each in-vehicle information terminal 200. In the present embodiment, the history database 105 is used by the center server 100 to determine the optimal voice recognition unit 102 and the optimal voice recognition result to be provided to the user's in-vehicle information terminal 200. For this purpose, the history database 105 stores statistical information on the voice recognition units 102 and voice recognition results provided in the past (for example, numbers identifying the voice recognition units 102 and voice recognition results provided in the past, or the number of times each voice recognition unit 102 has been provided to the in-vehicle information terminal 200).
[0036]
FIG. 2 is an example of a data table of the history database 105 described above. The history database 105 stores a number 105a, a recognition result generation date and time 105b, a user ID 105c identifying the user who used the voice recognition system, a selection result number 105d, which is an example of statistical information identifying the voice recognition unit 102 selected by the user of the user ID 105c, and the content 105e of the voice information on which voice recognition was performed. In the history database 105, the user ID 105c assigned to the user of each in-vehicle information terminal 200 is associated with the selection result number 105d in order to generate statistical information for identifying the optimal voice recognition unit 102 and voice recognition result.
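
The table of FIG. 2 could be modeled as in the sketch below, which also shows one way the per-user statistics might be derived from it. The field names, the types, and the helper `most_selected` are hypothetical; only the column meanings (105a to 105e) come from the description.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class HistoryRecord:
    number: int              # 105a
    generated_at: datetime   # 105b: recognition result generation date and time
    user_id: str             # 105c
    selected_result_no: int  # 105d: identifies the selected recognition unit
    content: str             # 105e: content of the recognized voice information


def most_selected(history: List[HistoryRecord], user_id: str) -> Optional[int]:
    """Return the selection result number chosen most often by one user."""
    counts = Counter(
        r.selected_result_no for r in history if r.user_id == user_id
    )
    if not counts:
        return None  # no history for this user yet
    return counts.most_common(1)[0][0]


history = [
    HistoryRecord(1, datetime(2003, 4, 1, 9, 0), "U1", 1, "destination"),
    HistoryRecord(2, datetime(2003, 4, 1, 9, 5), "U1", 3, "place name"),
    HistoryRecord(3, datetime(2003, 4, 2, 8, 0), "U1", 1, "destination"),
    HistoryRecord(4, datetime(2003, 4, 2, 8, 30), "U2", 2, "place name"),
]
```

Keying the count on the user ID mirrors the association between 105c and 105d described above; how ties are broken is not stated in the source and is left to `Counter` here.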
[0037]
The terminal database 106 stores information on the in-vehicle information terminal 200 that uses the voice recognition system.
[0038]
FIG. 3 is an example of a data table of the terminal database 106 described above. The terminal database 106 includes a number 106a, a user ID 106b, a terminal number 106c for identifying the model of the in-vehicle information terminal 200, a model number 106d of the CPU, a main memory capacity 106e, a hard disk device presence / absence 106f, a hard disk device capacity 106g, and the like. Information indicating performance such as processing capacity and storage capacity of the in-vehicle information terminal 200 is stored in association with each in-vehicle information terminal 200 of each user.
[0039]
The center server 100 adjusts the data amount included in the optimal voice recognition means 102 provided to each in-vehicle information terminal 200 based on the information in the terminal database 106.
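
The description does not say how the data amount is adjusted. The sketch below is one possible reading, assuming that storage is measured in megabytes, that a fixed fraction of the terminal's storage may be used, and that reference entries are ordered by importance. `TerminalRecord`, `storage_budget_mb`, and `trim_database` are hypothetical names; the fields follow the columns of FIG. 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TerminalRecord:
    number: int           # 106a
    user_id: str          # 106b
    terminal_no: str      # 106c: identifies the terminal model
    cpu_model: str        # 106d
    main_memory_mb: int   # 106e (unit is an assumption)
    has_hdd: bool         # 106f
    hdd_capacity_mb: int  # 106g (unit is an assumption)


def storage_budget_mb(t: TerminalRecord, fraction: float = 0.5) -> float:
    """Assumed policy: use a fixed fraction of the HDD, or of RAM if no HDD."""
    capacity = t.hdd_capacity_mb if t.has_hdd else t.main_memory_mb
    return capacity * fraction


def trim_database(entries: List[Tuple[str, float]], budget_mb: float) -> List[str]:
    """Keep entries, most important first, while they still fit the budget."""
    kept: List[str] = []
    used = 0.0
    for pattern, size_mb in entries:
        if used + size_mb <= budget_mb:
            kept.append(pattern)
            used += size_mb
    return kept


terminal = TerminalRecord(1, "U1", "T-01", "cpu-x", 64, False, 0)
entries = [("a", 20.0), ("b", 15.0), ("c", 10.0)]
kept = trim_database(entries, storage_budget_mb(terminal))
```

With these toy numbers the budget is 32 MB, so entry "b" is skipped because it would exceed the budget, while the smaller entry "c" still fits.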
[0040]
The communication means 107 is provided for communicating with the in-vehicle information terminal 200 of each user via the network 300. The communication means 107 acquires the voice information transmitted from the in-vehicle information terminal 200. In addition, the communication means 107 provides, in a downloadable state, the voice recognition means 102 that is optimal for the in-vehicle information terminal 200, among the plurality of voice recognition means 102, to the in-vehicle information terminal 200 of the user.
[0041]
<Processing flowchart>
Next, a speech recognition process by the speech recognition system according to the present embodiment will be described using a flowchart.
[0042]
FIG. 4 is a flowchart illustrating a voice recognition process of the in-vehicle information terminal 200 by communication with the center server 100 in the voice recognition system.
[0043]
First, in the in-vehicle information terminal 200, the voice input unit 205 obtains voice generated by the user such as a user's operation instruction as voice information (Step 101 in FIG. 4, hereinafter abbreviated as S101).
[0044]
The voice information obtained by the voice input means 205 is input to the voice recognition means 202 previously obtained from the center server 100 or the voice recognition means 202 provided in the vehicle-mounted information terminal 200 in advance (S102). The voice recognition unit 202 that has received the voice information performs voice recognition processing using the voice recognition engine 203 and the voice recognition database 204, and generates a voice recognition result (S103). Then, the in-vehicle information terminal 200 outputs the voice recognition result from the voice output unit 207 such as a speaker in order to confirm whether or not the voice recognition result matches the voice uttered by the user.
[0045]
Then, the in-vehicle information terminal 200 checks with the user whether the voice recognition result corresponds to the user's operation instruction, that is, whether the user is satisfied with the voice recognition result (S104). Here, whether the user is satisfied with the voice recognition result means whether the voice recognition result reflects the operation instruction intended by the user who uttered the voice information. To confirm whether the user is satisfied with the voice recognition result, the in-vehicle information terminal 200 accepts the user's input through the voice input unit 205 or through input means such as a keyboard (not shown).
[0046]
In step 104, if the user inputs that the user is satisfied with the result of the voice recognition, the in-vehicle information terminal 200 returns to step 102 and returns to step 102 to receive the next voice information. stand by.
[0047]
If the user is not satisfied with the result of the voice recognition, the in-vehicle information terminal 200 transmits the voice information to the center server 100 via the network 300 by the communication means 209 (S105).
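The terminal-side flow of S102 to S105 can be illustrated by the following minimal Python sketch. It is not part of the disclosed embodiment: the function name `recognize_on_terminal` and the three callables standing in for the voice recognition means 202, the user confirmation, and the communication means 209 are assumptions introduced only for illustration.

```python
from typing import Callable, List, Tuple


def recognize_on_terminal(
    voice_info: bytes,
    local_recognizer: Callable[[bytes], str],
    user_is_satisfied: Callable[[str], bool],
    send_to_center: Callable[[bytes], None],
) -> Tuple[str, bool]:
    """Run S102 to S105 once; return (local result, accepted by user?)."""
    result = local_recognizer(voice_info)  # S102-S103: recognize on the terminal
    if user_is_satisfied(result):          # S104: confirm with the user
        return result, True                # accepted: wait for the next utterance
    send_to_center(voice_info)             # S105: forward the voice information
    return result, False


# Hypothetical stand-ins for the recognizer, the user prompt and the network.
sent: List[bytes] = []
result, accepted = recognize_on_terminal(
    b"raw-audio",
    local_recognizer=lambda _: "Nagoya Station",
    user_is_satisfied=lambda text: False,
    send_to_center=sent.append,
)
```

In this sketch the voice information itself, not the rejected recognition result, is what is forwarded to the center server, in line with S105.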
[0048]
The communication means 107 of the center server 100 receives the voice information transmitted from the in-vehicle information terminal 200 via the network 300 (S106).
[0049]
The center server 100 performs voice recognition processing by the plurality of voice recognition units 102. First, each of the voice recognition units 102 extracts voice pattern information from the obtained voice information. Then, the voice recognition engine 103 generates a voice recognition result from the extracted voice pattern information and the reference voice pattern information stored in the voice recognition database 104 (S107).
[0050]
For example, in the present embodiment, assume that the center server 100 includes ten voice recognition units 102 as an arbitrary number of voice recognition units 102. Based on the acquired voice information, seven of the ten voice recognition units 102 generate the voice recognition result “A”, and the remaining three voice recognition units 102 generate the voice recognition result “B”.
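As an illustrative sketch of S107 only, the following Python fragment applies several recognizers to the same voice information and collects one result per recognizer. The function `recognize_with_all` and the stand-in recognizers are hypothetical. The labels "102.1" to "102.10" are also an assumption that extends the "102.1" notation used in this description.

```python
from typing import Callable, Dict


def recognize_with_all(
    voice_info: bytes,
    recognizers: Dict[str, Callable[[bytes], str]],
) -> Dict[str, str]:
    """Apply every recognizer to the same voice information (S107)."""
    return {name: recognize(voice_info) for name, recognize in recognizers.items()}


# Ten stand-in recognizers: seven answer "A", three answer "B".
recognizers = {
    f"102.{i}": (lambda _v, answer=("A" if i <= 7 else "B"): answer)
    for i in range(1, 11)
}
results = recognize_with_all(b"raw-audio", recognizers)
```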
[0051]
Then, the center server 100 transmits the generated voice recognition result to the in-vehicle information terminal 200 (S108).
[0052]
As a method of providing the voice recognition result to be transmitted to the in-vehicle information terminal 200, a voice recognition result generated by the largest number of voice recognition units among a plurality of voice recognition results may be transmitted. For example, in the case of the generation example of the voice recognition result in step 107 described above, the voice recognition result “A” is provided to the on-vehicle information terminal 200.
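The majority-based selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: the function name `majority_result` is hypothetical, and ties are resolved in favor of the result encountered first, which is a choice of this sketch that the embodiment does not specify.

```python
from collections import Counter
from typing import List


def majority_result(results: List[str]) -> str:
    """Return the recognition result produced by the most recognizers."""
    if not results:
        raise ValueError("no recognition results")
    # most_common(1) yields [(result, count)] for the most frequent result.
    return Counter(results).most_common(1)[0][0]


print(majority_result(["A"] * 7 + ["B"] * 3))  # prints A
```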
[0053]
As another method of providing the voice recognition result, the voice recognition result generated by the voice recognition unit 102 that has a history of having been provided in the past to the same in-vehicle information terminal 200 may be transmitted based on the history database 105. For example, in the generation example of step 107 above, if the voice recognition result selected based on the past history is the voice recognition result “B”, the voice recognition result “B” is provided to the in-vehicle information terminal 200.
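The history-based selection can be sketched as follows. The sketch assumes, purely for illustration, that the history database 105 can be reduced to a per-terminal count of how often each recognizer's result was selected in the past. The names `result_by_history` and `past_selections` and the sample counts are hypothetical.

```python
from typing import Dict


def result_by_history(
    results: Dict[str, str],
    past_selections: Dict[str, int],
) -> str:
    """Return the result of the recognizer most often selected in the past."""
    if not results:
        raise ValueError("no recognition results")
    preferred = max(results, key=lambda name: past_selections.get(name, 0))
    return results[preferred]


# Seven recognizers produce "A" and three produce "B", as in the example above.
results = {f"102.{i}": ("A" if i <= 7 else "B") for i in range(1, 11)}
past_selections = {"102.9": 12, "102.1": 4}  # hypothetical history for one terminal
print(result_by_history(results, past_selections))  # prints B
```

Here the minority result “B” is returned because the recognizer with the strongest history for this terminal produced it, which differs from the majority-based method.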
[0054]
Further, as a method of providing the voice recognition result, for example, all of the plurality of voice recognition results generated by the plurality of voice recognition units may be provided to the in-vehicle information terminal 200. Further, as a method for providing the voice recognition result, a voice recognition result list displaying a plurality of voice recognition results generated by a plurality of voice recognition units may be provided. For example, when seven of the ten voice recognition units 102 generate the voice recognition result “A” and three voice recognition units 102 generate the voice recognition result “B” based on the acquired voice information, the voice recognition result list displays each voice recognition result together with the number of voice recognition units 102 that generated it.
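A voice recognition result list of this kind might be assembled as in the following sketch. The function name `recognition_result_list` and the display format are assumptions for illustration. The embodiment only states that each result is shown with the number of voice recognition units that generated it.

```python
from collections import Counter
from typing import List, Tuple


def recognition_result_list(results: List[str]) -> List[Tuple[str, int]]:
    """Pair each distinct result with the number of recognizers that produced it."""
    return Counter(results).most_common()


for text, votes in recognition_result_list(["A"] * 7 + ["B"] * 3):
    print(f"{text}: {votes} of 10 recognizers")
```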
[0055]
Then, the in-vehicle information terminal 200 acquires the voice recognition result provided by the center server 100 by the above-described various voice recognition result providing methods. The voice recognition result acquired by the in-vehicle information terminal 200 is set as the voice recognition result selected by the user (S109).
[0056]
The center server 100 stores information on the speech recognition result selected by the user in the history database 105 and updates the information in the history database 105. At this time, when the number of voice recognition processes for the individual in-vehicle information terminal 200 reaches a predetermined number, the center server 100 performs the process of providing the voice recognition unit 102 to the in-vehicle information terminal 200 (S110).
[0057]
Then, the in-vehicle information terminal 200 downloads the voice recognition unit 102 provided by the center server 100 (S111). The providing process of the voice recognition unit 102 will be described later with reference to the flowchart shown in FIG. 5.
[0058]
Next, the process in the present voice recognition system of providing the voice recognition unit 102 of the center server 100 as the voice recognition unit 202 of the in-vehicle information terminal 200 will be described with reference to the flowchart of FIG. 5.
[0059]
The center server 100 determines whether or not the total number of voice recognition processes performed in the center server 100 for the individual in-vehicle information terminal 200 has reached a predetermined number (for example, a multiple of a constant M) (Step 201 in FIG. 5, hereinafter abbreviated as S201). If the total number of voice recognition processes for the individual in-vehicle information terminal 200 has not reached the predetermined number, the center server 100 ends this process.
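The determination in S201 can be sketched as a simple check, taking the "multiple of a constant M" example literally. The function name `provision_due` is hypothetical, and treating a count of zero as "not yet due" is an assumption of this sketch.

```python
def provision_due(total_recognitions: int, m: int) -> bool:
    """True when the per-terminal count has reached a positive multiple of m (S201)."""
    if m <= 0:
        raise ValueError("m must be positive")
    return total_recognitions > 0 and total_recognitions % m == 0
```

With M = 10, for instance, the providing process would be started at the 10th, 20th, and 30th recognition for a given terminal and skipped otherwise.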
[0060]
When the total number of voice recognition processes for the individual in-vehicle information terminal 200 has reached the above-mentioned predetermined number, the center server 100 refers to the history database 105, which stores the statistical information. Among the plurality of voice recognition means 102, the voice recognition means 102 that generated the largest number of selected voice recognition results for the voice information from the in-vehicle information terminal 200 is determined to be the optimum voice recognition means 102 for that in-vehicle information terminal 200 (S202).
[0061]
For example, in the present embodiment, assume that ten speech recognition units 102 provided in the center server 100 generate speech recognition results. The center server 100 judges from the information in the history database 105 that, of the ten speech recognition means 102, “speech recognition means 102.1” provided the largest number of speech recognition results to the in-vehicle information terminal 200. The center server 100 then determines that “voice recognition means 102.1” is the most suitable voice recognition means 102 for the in-vehicle information terminal 200.
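The determination of S202 can be illustrated with the sketch below. It assumes, for illustration only, that each row of the history database 105 can be read as a pair of a terminal identifier and the recognizer whose result was selected. The actual table layout is the one shown in FIG. 2, and the names `optimal_recognizer` and `terminal-1` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def optimal_recognizer(
    history: Iterable[Tuple[str, str]],
    terminal_id: str,
) -> Optional[str]:
    """Pick the recognizer whose results were selected most often for a terminal.

    Each history row is (terminal_id, recognizer whose result was selected).
    """
    counts = Counter(rec for term, rec in history if term == terminal_id)
    if not counts:
        return None  # no history yet for this terminal
    return counts.most_common(1)[0][0]


history = [
    ("terminal-1", "102.1"),
    ("terminal-1", "102.3"),
    ("terminal-1", "102.1"),
    ("terminal-2", "102.3"),
]
print(optimal_recognizer(history, "terminal-1"))  # prints 102.1
```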
[0062]
The center server 100 that has selected the optimal voice recognition means 102 acquires, from the in-vehicle information terminal 200, the processing capability and storage capacity of the terminal, information about the voice recognition means 202 provided in advance, and the like. At this time, the center server 100 adjusts the speech recognition engine 103 and the speech recognition database 104 of the speech recognition means 102 to an optimal capacity according to the processing capability and storage capacity of the in-vehicle information terminal 200.
[0063]
Then, from the information on the voice recognition means 202, the center server 100 determines whether or not the voice recognition engine 203 of the in-vehicle information terminal 200 is the same as the voice recognition engine 103 of the optimum voice recognition means 102 (S204).
[0064]
If the voice recognition engine 203 of the in-vehicle information terminal 200 is not the same as the voice recognition engine 103 of the optimum voice recognition means 102, the optimum voice recognition engine 103 is provided in a form that can be downloaded to the in-vehicle information terminal 200 (S205).
[0065]
When the optimal speech recognition engine 103 is the same as the speech recognition engine 203 of the in-vehicle information terminal 200, or when the download of the optimal speech recognition engine 103 is completed, the center server 100 obtains the storage capacity information of the in-vehicle information terminal 200 (S206).
[0066]
Then, the center server 100 provides the voice recognition database 104 having the optimal capacity in a form that can be downloaded according to the information on the storage capacity of the on-vehicle information terminal 200 (S207).
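The decisions of S204 to S207 can be summarized in the following sketch. It rests on several assumptions that the embodiment does not state: that engines can be compared by an identifier, that the voice recognition database 104 is available in several sizes, and that the "optimal capacity" is the largest size that fits the terminal's storage. All names (`TerminalInfo`, `plan_provision`, `database_sizes_mb`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TerminalInfo:
    engine_id: str          # engine 203 currently installed on the terminal
    free_storage_mb: int    # storage available for the recognition database


def plan_provision(
    terminal: TerminalInfo,
    optimal_engine_id: str,
    database_sizes_mb: Sequence[int],
) -> List[Tuple[str, object]]:
    """List what the server would offer for download (S204 to S207)."""
    offers: List[Tuple[str, object]] = []
    if terminal.engine_id != optimal_engine_id:        # S204: same engine?
        offers.append(("engine", optimal_engine_id))   # S205: offer the engine
    # S206-S207: offer the largest database variant that fits the terminal.
    fitting = [size for size in database_sizes_mb if size <= terminal.free_storage_mb]
    if fitting:
        offers.append(("database_mb", max(fitting)))
    return offers


offers = plan_provision(TerminalInfo("engine-x", 300), "engine-y", [100, 250, 500])
```

In this example the terminal would be offered the engine and the 250 MB database variant. If the installed engine already matched, only the database would be offered.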
[0067]
<Effects of Embodiment>
By realizing the speech recognition system according to the present embodiment, the following effects can be obtained.
[0068]
According to the voice recognition system of the present embodiment, even when the voice recognition result in an individual in-vehicle information terminal does not follow the user's operation instruction, an accurate voice recognition result in accordance with the user's operation instruction can be obtained by referring to the voice recognition results of the plurality of voice recognition units of the center server.
[0069]
Further, according to the speech recognition system according to the present embodiment, it is possible to provide the most appropriate speech recognition means among the plurality of speech recognition means of the center server as the speech recognition means of the in-vehicle information terminal.
[0070]
Further, according to the speech recognition system according to the present embodiment, since the speech recognition processing is performed by the plurality of speech recognition units corresponding to the characteristics of the speech information of the user, an accurate speech recognition result can be obtained.
[0071]
<Modified example>
In the present embodiment, the speech recognition system of the present invention has been described mainly using speech recognition processing for an in-vehicle information terminal as an example. However, the present invention is not limited to this and can be widely implemented in other speech recognition systems.
[0072]
For example, the speech recognition system according to the present embodiment can be used for speech recognition processing in various other information terminals. At this time, the center server executes the voice recognition processing for the on-vehicle information terminal according to the present embodiment for the above various other information terminals, which are examples of the user terminal.
[0073]
Further, in the present voice recognition system, the process of providing the voice recognition means 102 is performed when the number of voice recognition processes performed by the center server 100 for each of the in-vehicle information terminals 200 reaches a predetermined number. The process of providing the speech recognition unit 102 according to the present embodiment is not limited to this. For example, the process of providing the voice recognition unit 102 to the on-vehicle information terminal 200 may be performed according to a user's operation instruction.
[0074]
Further, in the present voice recognition system, the information of the in-vehicle information terminal 200 is obtained from the in-vehicle information terminal 200; however, the present embodiment is not limited to this. For example, the center server 100 that has selected the optimal voice recognition unit 102 may acquire the information of the in-vehicle information terminal 200 stored in the terminal database 106, such as the processing capability and storage capacity of the in-vehicle information terminal 200 and information on the voice recognition unit 202 provided in it. In addition, the center server 100 adjusts the capacity of the speech recognition engine 103 and the speech recognition database 104 of the speech recognition unit 102 in accordance with the processing capability and the storage capacity of the in-vehicle information terminal 200.
[0075]
Further, in the present voice recognition system, when the center server 100 provides the optimum voice recognition means 102 to the in-vehicle information terminal 200, a list of the optimum voice recognition means 102 to be provided may be provided to the in-vehicle information terminal 200. The user may then determine the voice recognition means 102 to be provided with reference to this list.
[0076]
Further, in the present voice recognition system, the voice recognition means 102 is provided to the in-vehicle information terminal 200 in a downloadable state, but it may instead be acquired from the center server 100 side by the in-vehicle information terminal 200. Likewise, the voice recognition result may be acquired from the center server 100 side by the in-vehicle information terminal 200.
[0077]
[Effect of the Invention]
According to the speech recognition system of the present invention, the excellent effect of improving the speech recognition result of the speech recognition means can be obtained.
[Brief description of the drawings]
FIG. 1 is a schematic configuration diagram of a speech recognition system according to an embodiment of the present invention.
FIG. 2 is an example of a data table of a history database according to the embodiment.
FIG. 3 is an example of a data table of a terminal database according to the present embodiment.
FIG. 4 is an example of a flowchart of voice recognition processing by a center server according to the present embodiment.
FIG. 5 is an example of a flowchart of a process of providing voice recognition means from a center server to an in-vehicle information terminal according to the present embodiment.
[Explanation of symbols]
100 Center server
101 CPU
102 Voice recognition means
103 Speech recognition engine
104 Speech recognition database
105 History database
105a Number
105b Date and time of generation of recognition result
105c User ID
105d Selection result number
105e Contents
106 Terminal database
106a Number
106b User ID
106c Terminal number
106d CPU model number
106f Hard disk drive
106g Hard disk drive capacity
107 Communication means
200 In-vehicle information terminal
201 CPU
202 Voice recognition means
203 Speech recognition engine
204 Speech recognition database
205 Voice input means
206 Voice synthesis means
207 Audio output means
208 Display means
209 Communication means
300 Network

Claims

A voice recognition system comprising:
a center server having a plurality of voice recognition means; and
a user terminal that acquires an optimal voice recognition means from the center server and performs voice recognition processing using the acquired voice recognition means.

The user terminal,
Equipped with terminal voice recognition means,
The voice recognition system according to claim 1, wherein when the voice recognition means is obtained from the center server, the voice recognition means performs voice recognition processing in place of the terminal voice recognition means.

The center server is:
Further comprising audio information acquisition means for acquiring audio information from the user terminal,
The voice recognition system according to claim 1, wherein the plurality of voice recognition units perform voice recognition processing on the obtained voice information for each voice recognition unit.

In the center server,
Each voice recognition means generates a recognition result of the voice recognition process, and determines a voice recognition means to be provided to the user terminal based on the recognition result,
In the user terminal,
The voice recognition system according to claim 1, wherein a voice recognition unit provided by the center server is obtained.

In the center server,
Each of the voice recognition means generates a recognition result of the voice recognition processing,
The speech recognition system according to claim 1, wherein the recognition result is provided to the user terminal.

The center server is:
Further comprising a history database storing the history information on the voice recognition means or the recognition result obtained by the user terminal,
The voice recognition system according to claim 1, wherein a voice recognition unit or a recognition result to be provided to the user terminal is determined based on the history database.

The center server is:
Further comprising a terminal database storing user terminal information associated with each user terminal,
The voice recognition system according to claim 1, wherein a voice recognition unit to be provided to the user terminal is determined based on the terminal database.

The user terminal,
The speech recognition system according to claim 1, wherein the speech recognition system is an in-vehicle information terminal provided in a vehicle.

A server device comprising:
voice information obtaining means for obtaining voice information from a user terminal; and
a plurality of voice recognition units, each of which performs voice recognition processing on the obtained voice information.

In the server device,
The server device according to claim 9, wherein each of the voice recognition units generates a recognition result of the voice recognition process, and determines an optimal voice recognition unit to be provided to the user terminal based on the recognition result.

In the server device,
Each of the voice recognition means generates a recognition result of the voice recognition processing,
The server device according to claim 9, wherein the recognition result is provided to the user terminal.

A user terminal that acquires an optimal voice recognition unit from a server device and performs a voice recognition process using the obtained voice recognition unit.

The user terminal,
Equipped with terminal voice recognition means,
The user terminal according to claim 12, wherein when the voice recognition unit is obtained from the server device, voice recognition processing is performed by the obtained voice recognition unit instead of the terminal voice recognition unit.

A speech recognition program that causes a computer to execute:
a step of obtaining voice information from a user terminal; and
a plurality of speech recognition steps performed on the obtained voice information.

A speech recognition program for a user terminal that causes a computer to execute:
a step of obtaining voice recognition means from a server device; and
a step of performing voice recognition processing by the obtained voice recognition means.