JP2012088370A

JP2012088370A - Voice recognition system, voice recognition terminal and center

Info

Publication number: JP2012088370A
Application number: JP2010232516A
Authority: JP
Inventors: Takuya Mori (森 卓也); Tetsuya Hara (原 哲也)
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2010-10-15
Filing date: 2010-10-15
Publication date: 2012-05-10

Abstract

PROBLEM TO BE SOLVED: To suppress an unnecessary increase in the size of the recognition dictionary of a voice recognition terminal while expanding that dictionary by using the recognition dictionary of a center.
SOLUTION: Only in cases where the voice recognition terminal 1 has failed to recognize the user's uttered voice by itself, the voice recognition terminal 1 receives from the center 2 the word obtained as the voice recognition result of the uttered voice, together with the comparison voice feature data of that word in the center-side recognition dictionary. It then adds the received comparison voice feature data of the word to the terminal-side recognition dictionary.

Description

The present invention relates to a voice recognition system, a voice recognition terminal, and a center.

Voice recognition systems that perform voice recognition at both a voice recognition terminal and a center are known. For example, Patent Document 1 describes the following technique. When the voice recognition terminal cannot recognize a user's uttered voice, voice data of the uttered voice is transmitted to the center. The center performs voice recognition on that voice data and transmits the recognition result back to the voice recognition terminal.

Patent Document 2 describes a technique in which a voice recognition terminal keeps its own recognition dictionary up to date by receiving a recognition dictionary from the center.

Patent Document 1: JP 2004-184858 A
Patent Document 2: JP 2000-105681 A

In the technique of Patent Document 1, the dictionary of the voice recognition terminal never changes. The terminal therefore relies on the center just as often no matter how long it is used. As a result, it remains highly likely that no recognition result can be obtained whenever communication with the center is unavailable.

The recognition dictionary of a server, on the other hand, is often large. If the voice recognition terminal receives and uses the server's entire recognition dictionary, as in Patent Document 2, the terminal's storage area is strained. In addition, a terminal holding a large number of recognition words will include many words that are never used, which may reduce recognition accuracy.

In view of the above, an object of the present invention is to expand the recognition dictionary of a voice recognition terminal by using the recognition dictionary of a center, while suppressing a wasteful increase in the size of the terminal's dictionary.

To achieve the above object, the invention according to claim 1 is a voice recognition system including a voice recognition terminal (1) and a center (2).

The voice recognition terminal (1) includes an in-vehicle-side recognition dictionary storage unit (14), in which an in-vehicle-side recognition dictionary is recorded, and a terminal-side control circuit unit (15). The center (2) includes a center-side recognition dictionary storage unit (22), in which a center-side recognition dictionary is recorded, and a center-side control circuit unit (23). The center-side recognition dictionary contains comparison voice feature data for words that the terminal-side recognition dictionary does not contain.

The terminal-side control circuit unit (15) includes the following means:
- terminal-side voice recognition means (110, 115), which acquires voice feature data based on the user's uttered voice (110) and extracts a word corresponding to the uttered voice based on a comparison between the acquired voice feature data and the comparison voice feature data of each word in the in-vehicle-side recognition dictionary; and
- inquiry transmission means (130), which, when the terminal-side voice recognition means (15a, 110, 115) fails to extract a word, transmits voice data based on the uttered voice to the center (2) as inquiry voice data.

The center-side control circuit unit (23) includes the following means:
- center-side voice recognition means (23b), which extracts a word corresponding to the uttered voice based on a comparison between voice feature data based on the inquiry voice data transmitted from the voice recognition terminal (1) and the comparison voice feature data of each word in the center-side recognition dictionary; and
- response means (23a), which transmits to the voice recognition terminal (1) a recognition result including the word extracted by the center-side voice recognition means (23b) and the comparison voice feature data of that word in the center-side recognition dictionary.

The terminal-side control circuit unit (15) further includes dictionary update means (150). This means takes the comparison voice feature data included in the recognition result received from the center (2) and additionally registers it in the terminal-side recognition dictionary as the comparison voice feature data of the word included in that recognition result.

In this way, the voice recognition terminal (1) contacts the center (2) only when it has failed to recognize the user's uttered voice by itself. In those cases it receives from the center (2) the word obtained as the recognition result of the uttered voice, together with the comparison voice feature data of that word in the center-side recognition dictionary. It then adds the received comparison voice feature data of that word to the terminal-side recognition dictionary.

As a result, the terminal selectively registers comparison voice feature data only for words that it has needed to recognize at least once. Such a word is more likely to need recognition again than a word that has never needed to be recognized. The comparison voice feature data of such a word is therefore relatively unlikely to be wasted. Accordingly, the recognition dictionary of the voice recognition terminal can be expanded by using the center's recognition dictionary while the increase in the size of the terminal's dictionary is kept small.

The invention according to claim 2 is a voice recognition terminal that communicates with a center (2). It includes an in-vehicle-side recognition dictionary storage unit (14), in which an in-vehicle-side recognition dictionary is recorded, and a terminal-side control circuit unit (15).

The terminal-side control circuit unit (15) includes the following means:
- terminal-side voice recognition means (110, 115), which acquires voice feature data based on the user's uttered voice (110) and extracts a word corresponding to the uttered voice based on a comparison between the acquired voice feature data and the comparison voice feature data of each word in the in-vehicle-side recognition dictionary;
- inquiry transmission means (130), which, when the terminal-side voice recognition means (15a, 110, 115) fails to extract a word, transmits voice data based on the uttered voice to the center (2) as inquiry voice data; and
- dictionary update means (150).

The dictionary update means (150) operates after the center (2) has done two things. First, the center extracts a word corresponding to the uttered voice by comparing voice feature data based on the inquiry voice data from the voice recognition terminal (1) with the comparison voice feature data of each word in a center-side recognition dictionary. Second, it transmits to the terminal a recognition result including the extracted word and that word's comparison voice feature data from the center-side recognition dictionary. The dictionary update means (150) then takes the comparison voice feature data included in the received recognition result and additionally registers it in the terminal-side recognition dictionary as the comparison voice feature data of the word included in that result.

In this way, the features of the voice recognition system invention can also be understood as features of a voice recognition terminal invention.

The invention according to claim 3 is a center that communicates with a voice recognition terminal (1). It includes a center-side recognition dictionary storage unit (22), in which a center-side recognition dictionary is recorded, and a center-side control circuit unit (23).

The center-side control circuit unit (23) includes the following means:
- center-side voice recognition means (23b), which operates when the voice recognition terminal (1) transmits voice data based on a user's uttered voice to the center (2) as inquiry voice data. It extracts a word corresponding to the uttered voice based on a comparison between voice feature data based on the inquiry voice data and the comparison voice feature data of each word in the center-side recognition dictionary; and
- response means (23a), which transmits to the voice recognition terminal (1) a recognition result including the word extracted by the center-side voice recognition means (23b) and that word's comparison voice feature data from the center-side recognition dictionary. This causes the voice recognition terminal (1) to additionally register the comparison voice feature data included in the recognition result in the terminal-side recognition dictionary, as the comparison voice feature data of the word included in that result.

In this way, the features of the voice recognition system invention can also be understood as features of a center invention.

The reference signs in parentheses above and in the claims show which terms recited in the claims correspond to which specific elements of the embodiment described below.

FIG. 1 is a schematic diagram of a voice recognition system according to an embodiment of the present invention.
FIG. 2 is a block diagram of the in-vehicle device 1 and the center 2.
FIG. 3 is a flowchart of processing executed by the terminal-side control circuit unit 15.

An embodiment of the present invention is described below. FIG. 1 schematically shows the voice recognition system according to this embodiment. The system includes an in-vehicle device 1 mounted on a vehicle (an example of a voice recognition terminal) and a center 2 installed at a remote location outside the vehicle (for example, in a building). Any communication path may be used between the in-vehicle device 1 and the center 2. For example, they may communicate via a wireless base station or a wide area network (such as the Internet), or directly by radio.

In this embodiment, the in-vehicle device 1 first attempts to recognize the uttered voice of a user in the vehicle. If recognition fails, it transmits voice data based on that uttered voice to the center 2 as inquiry data. The center 2 performs voice recognition on voice data based on the inquiry data. It then transmits to the in-vehicle device 1 the recognized word together with that word's vocabulary data in the center's voice recognition dictionary. The vocabulary data includes the character string of the word and the comparison voice feature data of the word (described later). The in-vehicle device 1 adds the received vocabulary data to its own voice recognition dictionary as the vocabulary data of the received word.
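The exchange just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation. Voice feature data are plain fixed-length lists of floats. `similarity` (negative squared distance) stands in for the likelihood computation. The class names, the reference value, and the direct method call in place of the wireless link are all hypothetical. The sketch shows the central property: the terminal queries the center only after a local failure, and then stores the returned comparison voice feature data under the returned word.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumption: voice feature data and comparison voice feature data are fixed-length float lists.
Features = List[float]


def similarity(a: Features, b: Features) -> float:
    """Stand-in likelihood: negative squared distance (the patent suggests e.g. an HMM)."""
    if len(a) != len(b):
        return float("-inf")
    return -sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Dictionary:
    # Word (character string data) -> comparison voice feature data.
    entries: Dict[str, Features] = field(default_factory=dict)

    def recognize(self, feats: Features, reference: float) -> Optional[str]:
        """Best-scoring word, or None if no score exceeds the reference value."""
        best_word, best_score = None, float("-inf")
        for word, ref in self.entries.items():
            score = similarity(feats, ref)
            if score > best_score:
                best_word, best_score = word, score
        return best_word if best_score > reference else None


class Center:
    def __init__(self, dictionary: Dictionary, reference: float):
        self.dictionary, self.reference = dictionary, reference

    def handle_inquiry(self, feats: Features) -> Optional[Tuple[str, Features]]:
        word = self.dictionary.recognize(feats, self.reference)
        if word is None:
            return None
        # The response carries the word AND its comparison voice feature data.
        return word, list(self.dictionary.entries[word])


class Terminal:
    def __init__(self, dictionary: Dictionary, center: Center, reference: float):
        self.dictionary, self.center, self.reference = dictionary, center, reference

    def recognize(self, feats: Features) -> Optional[str]:
        word = self.dictionary.recognize(feats, self.reference)
        if word is not None:
            return word                              # local success: no inquiry is sent
        result = self.center.handle_inquiry(feats)   # inquiry only after a local failure
        if result is None:
            return None
        word, ref = result
        self.dictionary.entries[word] = ref          # add received data to the local dictionary
        return word
```

For example, suppose a terminal that only knows "gas station" is asked to recognize features close to the center's "convenience store" entry. It fails locally, gets the word and its feature data from the center, and recognizes that word locally from then on. The terminal's dictionary grows only by words the user has actually said.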

The configuration and operation of this voice recognition system are described below. FIG. 2 is a block diagram of the in-vehicle device 1 and the center 2. As shown in the figure, the in-vehicle device 1 includes an input device 11, an output device 12, a terminal-side communication unit 13, a terminal-side recognition dictionary storage unit 14, and a terminal-side control circuit unit 15.

The input device 11 is a microphone. It receives uttered voice from a user in the vehicle and outputs the voice signal of the received uttered voice to the terminal-side control circuit unit 15. The output device 12 provides information to the user; examples are a display that outputs images and a speaker that outputs sound.

The terminal-side communication unit 13 is a known wireless communication device for communicating with the center 2 and other communication devices. Through this unit, the terminal-side control circuit unit 15 can communicate with the center 2 and with other communication devices, such as a Web server connected to a wide area network.

The terminal-side recognition dictionary storage unit 14 is a nonvolatile, writable storage medium (for example, a magnetic storage medium or flash memory) in which the terminal-side recognition dictionary is recorded. The terminal-side recognition dictionary contains multiple pieces of vocabulary data, each corresponding to one word. Each piece of vocabulary data includes the comparison voice feature data of the word and the character string data of the word. Comparison voice feature data is compared against the voice feature data derived from an uttered voice.

For example, consider the vocabulary data for the word “convenience store”. Its comparison voice feature data represents the typical features of the sound produced when a person says “convenience store”, such as feature amounts of that sound, its frequency spectrum data, or its intensity sampled in time series. The same vocabulary data may also include further comparison voice feature data for other names for a convenience store, such as the Japanese abbreviations “konbini” and “konbiniensu”. This data likewise represents the typical features of the sound produced when a person says those names, as feature amounts, a frequency spectrum, or intensity sampled in time series. The character string data of this vocabulary data is the character string “convenience store”.
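One way to picture a vocabulary entry that holds several comparison feature vectors (the canonical name plus alternative names) is sketched below. The field names, the list-of-floats feature format, and the negative-squared-distance score are illustrative assumptions, not the patent's data format. The design point is that whichever variant matches best, the entry yields the same character string data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VocabularyEntry:
    """One word's vocabulary data (hypothetical layout, not the patent's format)."""
    # Character string data, e.g. "convenience store".
    text: str
    # Comparison voice feature data: one vector for the canonical name plus, optionally,
    # one per alternative name (e.g. "konbini"). Vectors are assumed to share one format.
    features: List[List[float]] = field(default_factory=list)

    def best_score(self, feats: List[float]) -> float:
        """Similarity of the closest stored variant (negative squared distance, an assumption).

        Returns -inf if the entry has no variant with the same length as `feats`.
        """
        return max(
            (-sum((x - y) ** 2 for x, y in zip(feats, ref))
             for ref in self.features if len(ref) == len(feats)),
            default=float("-inf"),
        )
```

A recognizer would compare `best_score` across entries and, on a match, report `entry.text`. Saying “konbini” therefore still produces the string “convenience store”.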

The terminal-side control circuit unit 15 is implemented by a microcomputer with a CPU, RAM, ROM, I/O, and other components. Its various processes are realized by the CPU executing programs recorded in the ROM. The processing is described in detail later.

The center 2 includes a center-side communication unit 21, a center-side recognition dictionary storage unit 22, and a center-side control circuit unit 23. The center-side communication unit 21 is a known communication interface device for communicating with the in-vehicle device 1. The center-side control circuit unit 23 communicates with the in-vehicle device 1 through this unit.

The center-side recognition dictionary storage unit 22 is a nonvolatile, writable storage medium (for example, a magnetic storage medium or flash memory) in which the center-side recognition dictionary is recorded. Like the terminal-side recognition dictionary, the center-side recognition dictionary contains multiple pieces of vocabulary data, each corresponding to one word. Each piece includes the comparison voice feature data and the character string data of the word.

The center-side recognition dictionary, however, is an extended recognition dictionary. It contains the comparison voice feature data of every word in the terminal-side recognition dictionary, and also comparison voice feature data for many words that the terminal-side dictionary lacks. Its data size is therefore far larger than that of the terminal-side recognition dictionary (for example, 100 times or more).

Downloading the entire center-side recognition dictionary to the in-vehicle device 1 would strain its storage capacity. In some cases the device may not have enough storage to hold the whole dictionary at all. In addition, if the in-vehicle device 1 holds a large number of recognition words, many of them will never be used on that device, which may lower recognition accuracy.

Next, the operation of the voice recognition system is described. As shown in FIG. 2, the terminal-side control circuit unit 15 has three functional components: a terminal-side voice recognition unit 15a, a terminal-side processing control unit 15b, and a dictionary update unit 15c.

The terminal-side voice recognition unit 15a performs voice recognition on the voice signal of the uttered voice input from the input device 11. It uses the terminal-side recognition dictionary in the terminal-side recognition dictionary storage unit 14. The terminal-side processing control unit 15b acts on the result of that recognition. If recognition succeeds, it performs processing such as having the output device 12 output the recognized word. If recognition fails, it communicates with the center 2 through the terminal-side communication unit 13 and performs processing such as receiving the recognition result of the uttered voice from the center 2. The dictionary update unit 15c, depending on the processing result of the terminal-side processing control unit 15b, additionally records the vocabulary data received from the center 2 in the terminal-side recognition dictionary, as described later.

In addition to the above functions, the terminal-side control circuit unit 15 performs other processing. For example, it accesses a Web server through the terminal-side communication unit 13, receives Web page data, and displays the Web page on the display of the output device 12 according to the received data.

The center-side control circuit unit 23 has two functional components: a center-side processing control unit 23a and a center-side voice recognition unit 23b.

The center-side processing control unit 23a uses the center-side communication unit 21 to receive data from the in-vehicle device 1. It also transmits data to the in-vehicle device 1 according to the recognition result of the center-side voice recognition unit 23b. The center-side voice recognition unit 23b performs voice recognition on the voice feature data of the uttered voice received from the in-vehicle device 1, using the center-side recognition dictionary in the center-side recognition dictionary storage unit 22.

FIG. 3 is a flowchart of the processing executed by the terminal-side control circuit unit 15. The specific operation of the voice recognition system is described below, following this flowchart.

Assume first that the terminal-side control circuit unit 15 has accessed the Web server of a search site (for example, Google (registered trademark)) through the terminal-side communication unit 13. It has received the data of that search site's Web page and displayed the page on the display of the output device 12. Assume also that this Web page includes a character input field (input form) for entering a search word.

The user wants the text for this input field to be entered by voice recognition. The user therefore performs an operation to start voice recognition on an operation unit (not shown) of the in-vehicle device 1, for example pressing a voice recognition start button, and then speaks the words to be recognized. The spoken voice is hereinafter referred to as the uttered voice.

In response to this start operation, the terminal-side control circuit unit 15 starts the processing of FIG. 3. It first executes uttered-voice input processing in step 110. Specifically, it receives the voice signal of the uttered voice output by the input device 11 and acquires voice feature data based on that signal. The voice feature data represents the features of the voice signal of the uttered voice. It may be, for example, feature amount data of the voice signal, frequency spectrum data of the voice signal, or intensity data sampled in time series. It is desirable that it be in the same format as the comparison voice feature data in the terminal-side recognition dictionary.
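The sketch below illustrates the third format listed above, intensity sampled in time series. It reduces a sampled voice signal to one RMS intensity value per fixed-length frame. The frame length, the use of RMS, and the choice of non-overlapping frames are assumptions made for brevity. Whatever format is used, the same procedure would have to produce the dictionary's comparison voice feature data so that the two are comparable.

```python
import math
from typing import List, Sequence


def frame_intensities(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS intensity of consecutive non-overlapping frames; a trailing partial frame is dropped."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    intensities = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        intensities.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return intensities
```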

The period during which the voice signal is received may be any of the following: a predetermined fixed period; the period until the user performs an operation to end voice recognition; or the period until the level of the acquired voice signal has stayed below a threshold for at least a predetermined time.
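The last of these three stopping rules can be written as a small silence detector over per-frame signal levels. This is a sketch under assumptions: levels are already computed per frame, and the threshold and the required number of quiet frames are hypothetical parameters. It does not require speech to have started before the silence, which a practical implementation might add.

```python
from typing import Optional, Sequence


def utterance_end(levels: Sequence[float], threshold: float,
                  min_quiet_frames: int) -> Optional[int]:
    """Number of frames to capture, or None if the silence condition has not been met yet.

    Capture stops once the level has stayed below `threshold` for
    `min_quiet_frames` consecutive frames; the returned count includes that quiet run.
    """
    if min_quiet_frames <= 0:
        raise ValueError("min_quiet_frames must be positive")
    quiet = 0
    for i, level in enumerate(levels):
        quiet = quiet + 1 if level < threshold else 0  # any loud frame resets the count
        if quiet >= min_quiet_frames:
            return i + 1
    return None
```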

Next, in step 115, the acquired voice feature data is compared with the comparison voice feature data of each word in the terminal-side recognition dictionary stored in the terminal-side recognition dictionary storage unit 14. The comparison uses a known method, for example one based on a hidden Markov model. The unit then identifies the single piece of comparison voice feature data whose likelihood (similarity) with respect to the voice feature data is the highest and also exceeds a predetermined reference value.
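One known HMM-based comparison is the forward algorithm, which computes the likelihood of an observation sequence under each word's model. The sketch below assumes several things that the patent leaves open. Each word's comparison data is a discrete-observation HMM given as `(start, trans, emit)` probability tables. The uttered voice has been quantized into integer symbols. Raw log-likelihoods are compared across words. Practical recognizers typically use continuous-density models and further normalization.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical per-word model: (start, trans, emit) probability tables.
HMM = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def hmm_log_likelihood(obs: Sequence[int], start: Sequence[float],
                       trans: Sequence[Sequence[float]],
                       emit: Sequence[Sequence[float]]) -> float:
    """log P(obs | model) by the forward algorithm, rescaling alpha at every step.

    start[i]: P(first state = i); trans[i][j]: P(i -> j); emit[i][k]: P(symbol k | state i).
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = sum_i alpha_{t-1}(i) * trans[i][j] * emit[j][obs[t]]
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")      # the model cannot produce this sequence
        log_prob += math.log(scale)   # log P(obs) is the sum of the log scaling factors
        alpha = [a / scale for a in alpha]
    return log_prob


def best_word(obs: Sequence[int], models: Dict[str, HMM],
              reference_log_likelihood: float) -> Optional[str]:
    """Highest-likelihood word, accepted only if it exceeds the reference value."""
    best, best_ll = None, float("-inf")
    for word, (start, trans, emit) in models.items():
        ll = hmm_log_likelihood(obs, start, trans, emit)
        if ll > best_ll:
            best, best_ll = word, ll
    return best if best_ll > reference_log_likelihood else None
```

Returning `None` when no model clears the reference value corresponds to the recognition failure that triggers the inquiry to the center.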

It is possible, however, that no comparison voice feature data has a likelihood exceeding the predetermined reference value; in that case, voice recognition has failed. Conversely, if at least one piece of comparison voice feature data has a likelihood exceeding the reference value, voice recognition has succeeded.

When voice recognition succeeds, the word corresponding to the identified comparison voice feature data is extracted as the word corresponding to the uttered voice. Specifically, the character string data in the same vocabulary data as the identified comparison voice feature data is extracted. The functions realized by the processing of steps 110 and 115 correspond to the functions of the terminal-side voice recognition unit 15a.

Next, step 120 determines whether the voice recognition processing of step 115 has succeeded or failed. If it has succeeded, the processing proceeds to step 125, where the character string of the extracted word (for example, “convenience store”) is output to the output device 12. The output device 12 displays that character string in the character input field of the Web page. When the user then performs a send operation on the operation unit (not shown) of the in-vehicle device 1, the terminal-side control circuit unit 15 transmits the character string to the Web server of the search site as a search word. The Web server transmits data on items matching the search word (for example, other Web sites) to the in-vehicle device 1. The terminal-side control circuit unit 15 receives this data and displays it on the output device 12. The functions realized by the processing of steps 120 and 125 correspond to part of the functions of the terminal-side processing control unit 15b.
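The sketch below shows how the recognized character string might be carried to the search site as a search word, as an HTTP GET query built with Python's standard `urllib.parse.urlencode`. The base URL and the query parameter name `q` are placeholders. The patent does not specify how the search word is transmitted, and a real site defines its own form fields.

```python
from urllib.parse import urlencode


def search_url(base_url: str, recognized_text: str, param: str = "q") -> str:
    """Build a GET URL carrying the recognized word as the search word.

    `base_url` and `param` are hypothetical placeholders, not a real site's API.
    """
    # urlencode percent-encodes non-ASCII text as UTF-8 and turns spaces into '+'.
    return f"{base_url}?{urlencode({param: recognized_text})}"
```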

If, on the other hand, step 120 determines that voice recognition has failed, the processing proceeds to step 130. There, voice data based on the uttered voice is transmitted to the center 2 as inquiry voice data through the terminal-side communication unit 13. Then, in step 135, the unit waits until it receives a response to the inquiry voice data from the center 2.

The inquiry voice data may be any data that represents the features of the uttered voice. Examples are feature amount data of the voice signal of the uttered voice, frequency spectrum data of that voice signal, and intensity data sampled in time series.

Through the function of the center-side processing control unit 23a, the center-side control circuit unit 23 of the center 2 receives the inquiry voice data transmitted from the in-vehicle device 1 as described above. It receives this data via the center-side communication unit 21.

The center-side control circuit unit 23 also performs the following processing through the function of the center-side speech recognition unit 23b. First, it creates voice feature data from the received inquiry voice data. This voice feature data may be feature amounts of the voice signal of the uttered voice, the frequency spectrum of that signal, or intensity values obtained by sampling the uttered voice in time series. Preferably, however, it is in the same format as the comparison voice feature data in the center-side recognition dictionary. If the inquiry voice data is already in that format, it may be used as the voice feature data as-is.
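
The rule "use the inquiry data as-is if the formats match, otherwise convert it" can be sketched as a small dispatch. This sketch assumes, hypothetically, that the center-side dictionary stores magnitude spectra and that raw inquiries arrive as time-series intensity samples. The naive DFT is for illustration only.

```python
import cmath
from typing import List

# Assumption: the center-side dictionary stores magnitude spectra.
DICTIONARY_FORMAT = "spectrum"


def magnitude_spectrum(samples: List[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 (illustrative, not production DSP)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def to_voice_feature_data(kind: str, values: List[float]) -> List[float]:
    """Create voice feature data in the dictionary's format from inquiry voice data."""
    if kind == DICTIONARY_FORMAT:
        return list(values)  # same format: use the inquiry data as-is
    if kind == "samples":
        return magnitude_spectrum(values)  # time-series intensities -> spectrum
    raise ValueError(f"no conversion from {kind!r} to {DICTIONARY_FORMAT!r}")
```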

The created voice feature data is then compared with the comparison voice feature data of each word in the center-side recognition dictionary, which is stored in the center-side recognition dictionary storage unit 22. The comparison uses a known method (for example, a method based on a hidden Markov model). The single piece of comparison voice feature data that is identified is the one whose likelihood (similarity) with the voice feature data is the highest and also exceeds a predetermined reference value.
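
The text leaves the comparison method open and gives hidden Markov models only as an example. One common way to score an observation sequence against a word model is the forward algorithm. The sketch makes two assumptions that the patent does not state: each word's comparison data is a discrete HMM, and the voice feature data has already been quantized into integer symbols.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiscreteHMM:
    start: List[float]        # start[i] = P(first state is i)
    trans: List[List[float]]  # trans[i][j] = P(next state j | current state i)
    emit: List[List[float]]   # emit[i][o] = P(symbol o | state i)


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_likelihood(hmm: DiscreteHMM, obs: Sequence[int]) -> float:
    """Forward algorithm: log P(obs | hmm), summed over all state paths in log space."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(hmm.start)
    # alpha[i] = log P(obs[0..t], state at t = i)
    alpha = [_log(hmm.start[i]) + _log(hmm.emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(hmm.trans[i][j]) for i in range(n)])
            + _log(hmm.emit[j][o])
            for j in range(n)
        ]
    return _logsumexp(alpha)
```

The center would evaluate `log_likelihood` against each word's model. It would then select the best score only if that score exceeds the reference value, as the following paragraph describes.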

However, as in step 115 of FIG. 3, there may be no comparison voice feature data whose likelihood with the voice feature data exceeds the predetermined reference value. In that case, speech recognition has failed. Conversely, if at least one piece of comparison voice feature data has a likelihood above the reference value, speech recognition has succeeded.
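
The success/failure rule (take the best match, but only if it exceeds the reference value) can be sketched independently of the scoring model. Here `likelihood` is any callable that returns a higher value for a better match. `negative_squared_distance` is a hypothetical stand-in for an HMM likelihood.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = List[float]


def negative_squared_distance(a: Features, b: Features) -> float:
    """Hypothetical stand-in for a likelihood: higher means more similar."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(
    features: Features,
    dictionary: Dict[str, Features],
    likelihood: Callable[[Features, Features], float],
    reference_value: float,
) -> Optional[Tuple[str, float]]:
    """Return (word, score) of the best match exceeding reference_value, else None.

    None corresponds to "speech recognition failed": no comparison data
    scored above the reference value.
    """
    best: Optional[Tuple[str, float]] = None
    for word, comparison in dictionary.items():
        score = likelihood(features, comparison)
        if score > reference_value and (best is None or score > best[1]):
            best = (word, score)
    return best
```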

When speech recognition succeeds, the word corresponding to the identified comparison voice feature data is extracted as the word corresponding to the uttered voice. Specifically, the vocabulary data containing that comparison voice feature data is extracted. By executing this processing, which realizes the function of the center-side speech recognition unit 23b, the center-side control circuit unit 23 functions as an example of center-side speech recognition means.

In addition, through the function of the center-side processing control unit 23a, the center-side control circuit unit 23 determines whether the center-side speech recognition unit 23b succeeded or failed in extracting a word.

If extraction succeeded, a recognition result containing the vocabulary data extracted by the center-side speech recognition unit 23b is transmitted to the in-vehicle device 1 via the center-side communication unit 21. This vocabulary data contains the character string of the word corresponding to the uttered voice and the comparison voice feature data of that word.

If extraction failed, a recognition result containing failure data that indicates the failure is transmitted to the in-vehicle device 1 via the center-side communication unit 21. By executing this processing, which realizes the function of the center-side processing control unit 23a, the center-side control circuit unit 23 functions as an example of response means.
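
The two kinds of recognition result the center can send back can be modeled as a single message type. The class and field names below are assumptions for illustration. The patent says only that the result contains either vocabulary data (the word's character string plus its comparison voice feature data) or failure data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VocabularyData:
    word: str                         # character string of the recognized word
    comparison_features: List[float]  # that word's comparison voice feature data


@dataclass
class RecognitionResult:
    vocabulary: Optional[VocabularyData] = None  # present on success
    failed: bool = False                          # True plays the role of failure data

    @property
    def succeeded(self) -> bool:
        return self.vocabulary is not None and not self.failed


def build_response(match: Optional[VocabularyData]) -> RecognitionResult:
    """Sketch of the response means (23a): vocabulary data or failure data."""
    if match is None:
        return RecognitionResult(failed=True)
    return RecognitionResult(vocabulary=match)
```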

In step 135, the terminal-side control circuit unit 15 receives the recognition result from the center 2 via the terminal-side communication unit 13 as the response to the inquiry voice data. The process then proceeds to step 140.

In step 140, the content of the received recognition result is used to determine whether speech recognition at the center 2 succeeded or failed. If the result contains vocabulary data, recognition is judged to have succeeded. If it contains failure data, recognition is judged to have failed. On failure, the process proceeds to step 145, where information indicating that speech recognition failed is output to the output device 12, and the processing of FIG. 3 ends. Steps 130, 135, and 145 realize part of the function of the terminal-side processing control unit 15b.

On success, the process proceeds to step 150, where the vocabulary data in the received recognition result is additionally registered in the terminal-side recognition dictionary of the terminal-side recognition dictionary storage unit 14. In other words, the comparison voice feature data contained in the recognition result received from the center 2 is added to the terminal-side recognition dictionary as the comparison voice feature data of the word contained in that recognition result. This vocabulary data is for a word that the in-vehicle device 1 could not recognize, so in most cases it belongs to a word that was not in the terminal-side recognition dictionary. Step 150 realizes the function of the dictionary update unit 15c.
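
Steps 140 to 160 on the terminal side can be sketched as follows. The sketch assumes the center's vocabulary data has already been decoded into a word string and a list of comparison features, with `None` standing in for failure data. The failure message text and the dictionary layout are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VocabularyData:
    word: str
    comparison_features: List[float]


@dataclass
class TerminalDictionary:
    entries: Dict[str, List[float]] = field(default_factory=dict)

    def add(self, vocab: VocabularyData) -> None:
        # Step 150: register the center's comparison data under the received word.
        self.entries[vocab.word] = list(vocab.comparison_features)


def handle_center_result(
    vocabulary: Optional[VocabularyData], dictionary: TerminalDictionary
) -> str:
    """Steps 140-160: judge success, update the dictionary, return text to output."""
    if vocabulary is None:                    # step 140: result carries failure data
        return "Speech recognition failed."   # step 145 (hypothetical message)
    dictionary.add(vocabulary)                # step 150
    return vocabulary.word                    # step 160: output the word's string
```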

Further, in step 160, information based on the character string of the word in the vocabulary data of the received recognition result is output to the output device 12. Specifically, as in step 125, the character string of the word (for example, "convenience store") is output to the output device 12, which displays it in the character input field of the Web page. When the user then performs a send operation on the operation unit (not shown) of the in-vehicle device 1, the terminal-side control circuit unit 15 transmits the character string as a search word to the Web server of the search site. The Web server returns data for the matching items (for example, other Web sites) to the in-vehicle device 1, and the terminal-side control circuit unit 15 receives the data and causes the output device 12 to display it. The function realized by step 160 corresponds to part of the function of the terminal-side processing control unit 15b.

As described above, the in-vehicle device 1 contacts the center 2 only when it fails to recognize the user's uttered voice by itself. In that case, it receives from the center 2 the word resulting from speech recognition of that voice, together with the comparison voice feature data of that word in the center-side recognition dictionary. It then adds the received comparison voice feature data for the word to the terminal-side recognition dictionary.
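
The overall behavior summarized above can be exercised end to end with an in-process stand-in for the center. This is a toy sketch, not the embodiment. Features are plain float vectors, the score is a negative squared distance rather than an HMM likelihood, and `Center`, `Terminal`, and the query counter are hypothetical names. The sketch shows the key property: a word learned from the center is recognized locally the next time.

```python
from typing import Dict, List, Optional, Tuple

Features = List[float]
Dictionary = Dict[str, Features]


def best_match(features: Features, dictionary: Dictionary, threshold: float) -> Optional[str]:
    """Word with the highest score above threshold (score = -squared distance), else None."""
    best_word, best_score = None, threshold
    for word, comparison in dictionary.items():
        score = -sum((a - b) ** 2 for a, b in zip(features, comparison))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


class Center:
    def __init__(self, dictionary: Dictionary, threshold: float) -> None:
        self.dictionary, self.threshold = dictionary, threshold
        self.queries = 0  # number of inquiries received

    def query(self, features: Features) -> Optional[Tuple[str, Features]]:
        """Return (word, comparison features) on success, None as failure data."""
        self.queries += 1
        word = best_match(features, self.dictionary, self.threshold)
        return None if word is None else (word, self.dictionary[word])


class Terminal:
    def __init__(self, dictionary: Dictionary, center: Center, threshold: float) -> None:
        self.dictionary, self.center, self.threshold = dictionary, center, threshold

    def recognize(self, features: Features) -> Optional[str]:
        word = best_match(features, self.dictionary, self.threshold)  # steps 110-115
        if word is not None:
            return word                              # step 125: local success
        result = self.center.query(features)         # steps 130-135: ask the center
        if result is None:
            return None                              # step 145: both failed
        word, comparison = result
        self.dictionary[word] = list(comparison)     # step 150: learn the word
        return word                                  # step 160


center = Center({"gas station": [0.0, 1.0], "convenience store": [1.0, 0.0]}, threshold=-0.5)
terminal = Terminal({"gas station": [0.0, 1.0]}, center, threshold=-0.5)
```

In this setup, the first utterance close to "convenience store" triggers one inquiry to the center. A second, similar utterance is then resolved locally without a further inquiry.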

In the above embodiment, the terminal-side control circuit unit 15 functions in three roles. By executing steps 110 and 115 of FIG. 3, it functions as an example of terminal-side speech recognition means. By executing step 130, it functions as an example of inquiry transmission means. By executing step 150, it functions as an example of dictionary updating means.
(Other embodiments)
An embodiment of the present invention has been described above. However, the scope of the present invention is not limited to that embodiment and encompasses various forms that can realize the functions of each of the invention-specifying matters of the present invention.

For example, in the above embodiment the word extracted as corresponding to the uttered voice is a search word entered into a search site, but it need not be. Suppose, for example, that the in-vehicle device 1 is a navigation device. Such a device holds map data including the correspondence between the names and locations of a plurality of facilities, and it calculates and guides a route from the current location to a destination. In that case, the user may utter the name of the destination facility. In steps 125 and 160, the destination may then be searched for in the map data using the character string of the extracted word, and the search result output to the output device 12. That is, in steps 125 and 160, a processing result that uses the recognized word may be output to the output device 12, and the output device 12 may present any display as long as it is based on the recognized word. If speech recognition of the spoken destination name fails on the in-vehicle device 1 but succeeds at the center 2, the in-vehicle device 1 receives the vocabulary data from the center 2 and additionally registers it in the terminal-side recognition dictionary. Adding vocabulary data to the in-vehicle device 1 is particularly useful for the names of facilities built after the in-vehicle device 1 was manufactured, because such names are absent from the correspondence data described above. In such a case, the in-vehicle device 1 may also acquire the location data of the facility with that name from the center 2 and add it to the correspondence data.
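
For the navigation variant, registering a newly built facility involves two updates: one to the recognition dictionary and one to the name-to-location correspondence data. The sketch below assumes that locations are (latitude, longitude) pairs and that the center may or may not supply one. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]


@dataclass
class NavigationTerminal:
    recognition_dictionary: Dict[str, List[float]] = field(default_factory=dict)
    facility_locations: Dict[str, LatLon] = field(default_factory=dict)  # name -> position

    def register_from_center(
        self, name: str, comparison_features: List[float], location: Optional[LatLon]
    ) -> None:
        """Add a center-recognized facility name; also store its position if supplied."""
        self.recognition_dictionary[name] = list(comparison_features)
        if location is not None:
            self.facility_locations[name] = location

    def find_destination(self, name: str) -> Optional[LatLon]:
        """Steps 125/160 in the navigation variant: look the spoken name up in map data."""
        return self.facility_locations.get(name)
```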

In the above embodiment, each function realized by the control circuit 17 executing a program may instead be realized by hardware having those functions (for example, an FPGA whose circuit configuration can be programmed).

In the above embodiment, the terminal-side control circuit unit 15 realizes the functions of the terminal-side speech recognition unit 15a, the terminal-side processing control unit 15b, and the dictionary update unit 15c. These three units may instead be realized as separate ICs.

Similarly, in the above embodiment, the center-side control circuit unit 23 realizes the functions of the center-side processing control unit 23a and the center-side speech recognition unit 23b. These two units may instead be realized as separate ICs.

In the above embodiment, the in-vehicle device 1 is used as an example of the voice recognition terminal. However, the voice recognition terminal need not be an in-vehicle device. For example, it may be a terminal carried by the user.

Description of Symbols
1 In-vehicle device
2 Center
11 Input device
12 Output device
13 Terminal-side communication unit
14 Terminal-side recognition dictionary storage unit
15 Terminal-side control circuit unit
15a Terminal-side speech recognition unit
15b Terminal-side processing control unit
15c Dictionary update unit
21 Center-side communication unit
22 Center-side recognition dictionary storage unit
23 Center-side control circuit unit
23a Center-side processing control unit
23b Center-side speech recognition unit

Claims

A voice recognition system comprising a voice recognition terminal (1) and a center (2), wherein:
the voice recognition terminal (1) includes a terminal-side recognition dictionary storage unit (14), in which a terminal-side recognition dictionary is recorded, and a terminal-side control circuit unit (15);
the center (2) includes a center-side recognition dictionary storage unit (22), in which a center-side recognition dictionary is recorded, and a center-side control circuit unit (23);
the center-side recognition dictionary has comparison voice feature data for words that the terminal-side recognition dictionary does not have;
the terminal-side control circuit unit (15) includes:
terminal-side speech recognition means (110, 115) for acquiring (110) voice feature data based on a user's uttered voice and for extracting a word corresponding to the uttered voice, based on a comparison between the acquired voice feature data and the comparison voice feature data of each word in the terminal-side recognition dictionary; and
inquiry transmission means (130) for transmitting voice data based on the uttered voice to the center (2) as inquiry voice data when the terminal-side speech recognition means (15a, 110, 115) fails to extract a word;
the center-side control circuit unit (23) includes:
center-side speech recognition means (23b) for extracting a word corresponding to the uttered voice, based on a comparison between voice feature data based on the inquiry voice data transmitted from the voice recognition terminal (1) and the comparison voice feature data of each word in the center-side recognition dictionary; and
response means (23a) for transmitting to the voice recognition terminal (1) a recognition result that includes the word extracted by the center-side speech recognition means (23b) and the comparison voice feature data of that word in the center-side recognition dictionary; and
the terminal-side control circuit unit (15) further includes dictionary updating means (150) for additionally registering, in the terminal-side recognition dictionary, the comparison voice feature data included in the recognition result received from the center (2) as the comparison voice feature data of the word included in that recognition result.

A voice recognition terminal that communicates with a center (2), comprising:
a terminal-side recognition dictionary storage unit (14), in which a terminal-side recognition dictionary is recorded, and a terminal-side control circuit unit (15), wherein
the terminal-side control circuit unit (15) includes:
terminal-side speech recognition means (110, 115) for acquiring (110) voice feature data based on a user's uttered voice and for extracting a word corresponding to the uttered voice, based on a comparison between the acquired voice feature data and the comparison voice feature data of each word in the terminal-side recognition dictionary;
inquiry transmission means (130) for transmitting voice data based on the uttered voice to the center (2) as inquiry voice data when the terminal-side speech recognition means (15a, 110, 115) fails to extract a word; and
dictionary updating means (150) that operates after the center (2) has extracted a word corresponding to the uttered voice, based on a comparison between voice feature data based on the inquiry voice data transmitted from the voice recognition terminal (1) and the comparison voice feature data of each word in the center-side recognition dictionary, and has transmitted to the voice recognition terminal a recognition result that includes the extracted word and the comparison voice feature data of that word in the center-side recognition dictionary, the dictionary updating means (150) additionally registering, in the terminal-side recognition dictionary, the comparison voice feature data included in the recognition result received from the center (2) as the comparison voice feature data of the word included in that recognition result.

A center that communicates with a voice recognition terminal (1), comprising:
a center-side recognition dictionary storage unit (22), in which a center-side recognition dictionary is recorded, and a center-side control circuit unit (23), wherein
the center-side control circuit unit (23) includes:
center-side speech recognition means (23b) for extracting, when the voice recognition terminal (1) transmits voice data based on a user's uttered voice to the center (2) as inquiry voice data, a word corresponding to the uttered voice, based on a comparison between voice feature data based on the inquiry voice data and the comparison voice feature data of each word in the center-side recognition dictionary; and
response means (23a) for transmitting to the voice recognition terminal (1) a recognition result that includes the word extracted by the center-side speech recognition means (23b) and the comparison voice feature data of that word in the center-side recognition dictionary, thereby causing the voice recognition terminal (1) to additionally register, in its terminal-side recognition dictionary, the comparison voice feature data included in the recognition result as the comparison voice feature data of the word included in that recognition result.