JP2008046260A - Voice recognition device

Info

Publication number: JP2008046260A
Application number: JP2006220448A
Authority: JP
Inventors: Takeshi Ono; 健大野
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2006-08-11
Filing date: 2006-08-11
Publication date: 2008-02-28
Anticipated expiration: 2026-08-11
Also published as: JP4967519B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device that registers only the vocabulary actually used by a user as target vocabulary for voice recognition. SOLUTION: A signal processing unit 1 has an external storage device 15 that stores formal names as target vocabulary for voice recognition. A signal processing device 11 generates paraphrases from the formal names, evaluates how much each paraphrase is actually used, and registers in the external storage device 15 only the paraphrases evaluated as having high actual usage. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a speech recognition apparatus that recognizes speech uttered by a user.

The techniques described in Patent Document 1 and Patent Document 2 below are conventionally known for speech recognition apparatuses.

Patent Document 1 describes a technique for creating a dictionary that is efficient and suited to the user's utterance style by defining dictionary creation rules. Patent Document 1 also describes inputting the formal name of a facility, dividing the input formal name into morphemes, extracting a region name and a part of the formal name of the facility from the morphemes, and registering, as a dictionary word, a word obtained by joining the extracted region name and the extracted part of the facility name with a connecting word.

Patent Document 2 aims to recognize even long words easily and reliably, and describes reducing the user's utterance load and performing speech recognition adapted to the user's utterance style. Patent Document 2 also describes inserting breaks into a long facility name to rephrase the recognition word into shorter words, and registering these paraphrases in a dictionary so that speech containing hesitations or shortened words can be recognized.
JP 2005-202198 A; JP 2001-083982 A

However, the speech recognition techniques described above add paraphrases to the recognition dictionary even though it is unknown whether the user will actually use them. The recognition vocabulary therefore grows larger than necessary, and as a result the recognition rate decreases.

The present invention has been proposed in view of the above circumstances, and its object is to provide a speech recognition apparatus that can register only the vocabulary actually used by a user as speech recognition target vocabulary.

The present invention is a speech recognition apparatus including speech recognition means for recognizing speech uttered by a user. The apparatus has storage means that stores formal names as speech recognition target vocabulary, paraphrase generation means that generates paraphrases from the formal names stored in the storage means, actual usage evaluation means that evaluates the actual usage of the paraphrases generated by the paraphrase generation means, and registration means that registers the paraphrases generated by the paraphrase generation means in the storage means as speech recognition target vocabulary. To solve the above problem, the registration means registers, as speech recognition target vocabulary, only the paraphrases that the actual usage evaluation means has evaluated as having high actual usage.

According to the speech recognition apparatus of the present invention, paraphrases with high actual usage are registered as speech recognition target vocabulary. Only the vocabulary actually used by the user is therefore registered, and the speech recognition target vocabulary does not grow larger than necessary.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
The present invention is applied to, for example, a speech recognition apparatus according to a first embodiment configured as shown in FIG. 1. In this speech recognition apparatus, a microphone 2, a touch panel display 3, a speaker 4, and an input device 5 are connected to a signal processing unit 1. When registering paraphrases for a formal name, this speech recognition apparatus registers only the paraphrases that the user actually uses to a high degree (high actual usage).

In the signal processing unit 1, an A/D converter 12, a D/A converter 13, an amplifier 14, and an external storage device (storage means) 15 are connected to a signal processing device 11.

The user's voice signal detected by the microphone 2 is supplied to the signal processing device 11 via the A/D converter 12. The signal processing device 11 displays operation names, speech recognition results, and the like on the touch panel display 3, and receives the user's operation input signals from the touch panel display 3. Furthermore, to provide voice guidance for various information, the signal processing device 11 supplies a voice signal to the speaker 4 via the D/A converter 13 and the amplifier 14, so that the speaker 4 emits announcement voices that prompt the user to select an operation name and announcement voices that report the speech recognition result.

The signal processing device 11 includes a CPU (Central Processing Unit) 21 and a memory 22. Using the memory 22 as a work area, the CPU 21 performs speech recognition processing (speech recognition means), processing to generate paraphrases (paraphrase generation means), processing to evaluate the actual usage of paraphrases (actual usage evaluation means), and processing to register paraphrases (registration means).

The input device 5 includes a speech switch 5a, which is operated to start speech recognition, and a correction switch 5b (correction instruction means), which is operated to correct a speech recognition result when the signal processing unit 1 has produced a result different from the speech the user intended. Operation of the speech switch 5a or the correction switch 5b is detected by the signal processing device 11. When the correction switch 5b is held down for a certain period, the signal processing unit 1 aborts the voice-based processing partway through.

The external storage device 15 stores a speech recognition target vocabulary database and an operation name database. The speech recognition target vocabulary database registers speech recognition target vocabulary, namely formal name information and paraphrase information for facilities and the like, together with position information for those facilities. The operation name database registers the operation names used during operation. For example, as shown in FIG. 2, the databases consist of an operation name database 31, which registers upper-layer operation names for the signal processing unit 1 such as destination and search condition; an operation name database 32, which registers lower-layer operation names under destination such as address and facility; and a speech recognition target vocabulary database 33, which registers the formal names and paraphrases that form the layer below the operation name databases 31 and 32.

For example, suppose that the user's destination is the facility named “Hokkaido University” (北海道大学), that the top-layer operation names such as destination and search condition are registered in the operation name database 31, and that the operation names under destination, such as address and facility, are registered in the operation name database 32. In this case, the operation name “destination” is selected from the operation name database 31, and the operation name “facility name” is selected from the operation name database 32. The speech recognition target vocabulary database 33 registers the facility name “Hokkaido University” as the lowest layer below the operation name databases 31 and 32.
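The layered lookup described here can be sketched in Python as follows. This is only an illustration: the dictionary layout, the names `OPERATION_NAMES_31`, `OPERATION_NAMES_32`, `VOCABULARY_33` and the function `active_vocabulary` are assumptions, not structures given in the document, and the position information stored with each facility is omitted.

```python
# Hypothetical in-memory stand-ins for databases 31, 32 and 33 of FIG. 2.
OPERATION_NAMES_31 = ["行き先", "検索条件"]        # top layer: destination, search condition
OPERATION_NAMES_32 = {"行き先": ["住所", "施設"]}   # layer below "destination": address, facility
VOCABULARY_33 = {
    "施設": {"北海道大学": []},  # formal name -> paraphrases registered so far
}


def active_vocabulary(path):
    """Return the words that can be recognized after the operation names in `path`."""
    if not path:
        return list(OPERATION_NAMES_31)
    if len(path) == 1:
        return list(OPERATION_NAMES_32.get(path[0], []))
    words = []
    for formal_name, paraphrases in VOCABULARY_33.get(path[-1], {}).items():
        words.append(formal_name)
        words.extend(paraphrases)
    return words
```

With this layout, only the formal name 北海道大学 is recognizable at the facility layer until a paraphrase such as 北大 is appended to its list.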

The external storage device 15 also stores a speech recognition target vocabulary database in which formal names, paraphrases, and a part of the operation name databases are registered as speech recognition targets. This speech recognition target vocabulary database is rewritten by the signal processing device 11.

Next, the operation of the above speech recognition apparatus will be described for the case where the user wants to go to, or display a map of, the facility whose formal name is “Hokkaido University” (北海道大学), and the paraphrase “Hokudai” (北大) is not registered in the speech recognition target vocabulary database.

This speech recognition apparatus accepts voice input while the speech switch 5a has been operated and the voice menu image 41 for voice input shown in FIG. 3(a) is displayed on the touch panel display 3. The voice menu image 41 includes a speech recognition result display field 45, which shows the speech recognition result produced by the signal processing unit 1, and a menu list 46, which shows operation names. The menu list 46 lists the speech recognition target vocabulary registered in the speech recognition target vocabulary database of the external storage device 15, that is, operation names such as destination and search condition.

While the voice menu image 41 is displayed, when the user utters the operation name “destination” as in (1) of FIG. 4 and the signal processing unit 1 obtains the speech recognition result “destination”, the signal processing unit 1 causes the speaker 4 to emit the announcement voice “Please give a destination command” as in (2) of FIG. 4, and displays on the touch panel display 3 the destination selection menu image 42 of FIG. 3(b), which includes a menu list 46 showing a plurality of operation names.

While the destination selection menu image 42 of FIG. 3(b) is displayed, when the user utters the operation name “facility” included in the menu list 46 of the destination selection menu image 42 as in (3) of FIG. 4 and the signal processing unit 1 obtains the speech recognition result “facility”, the speaker 4 emits the announcement voice “Please name the facility” as in (4) of FIG. 4, and the facility name input image 43 shown in FIG. 3(c) is displayed.

While the facility name input image 43 of FIG. 3(c) is displayed, suppose the user utters “Hokudai” (北大) as in (5) of FIG. 4. Because “Hokudai” is not registered as speech recognition target vocabulary in the speech recognition target vocabulary database of the external storage device 15, the signal processing unit 1 selects “Kokubun Station” (国分駅), the registered vocabulary entry closest to the utterance, and, as shown in FIG. 3(d), displays a speech recognition result image 44 with the speech recognition result “Kokubun Station” in the speech recognition result display field 45. The speech recognition result image 44 also includes a “Go there” button 47, which issues a command to search for the optimum route to the position indicated by the speech recognition result in the display field 45, and a “View map” button 48, which issues a command to display a map of the area around that position.

While the speech recognition result image 44 is displayed, when the user operates the correction switch 5b of the input device 5 as in (7) of FIG. 4, the speaker 4 emits the announcement voice “Please speak again” as in (8) of FIG. 4, and the facility name input image 43 of FIG. 3(c) is displayed again. Steps (9) to (12) and (13) to (16) then proceed in the same way as (5) to (8) of FIG. 4. When the user utters “Hokkaido University” (北海道大学) in (17) of FIG. 4, the formal name “Hokkaido University” is registered as speech recognition target vocabulary in the database, so the signal processing unit 1 can display the speech recognition result “Hokkaido University” in the speech recognition result display field 45 of the speech recognition result image 44 of FIG. 3(d).

Thus, when “Hokudai” (北大), a paraphrase of “Hokkaido University” (北海道大学), is not registered in the speech recognition target vocabulary database of the external storage device 15, the speech recognition apparatus cannot output the speech recognition result “Hokkaido University” even if the user utters “Hokudai”.

In contrast, the speech recognition apparatus to which the present invention is applied evaluates the actual usage of the paraphrases the user actually uses and, when the actual usage of a paraphrase such as “Hokudai” (北大) is high, registers that paraphrase in the speech recognition target vocabulary database. That is, as shown in FIG. 5, the user utters “destination” in (1), the speaker 4 emits the announcement voice “Please give a destination command” in (2), the user utters “facility” in (3), and the speaker 4 emits the announcement voice “Please name the facility” in (4). When the user then utters “Hokudai”, the vocabulary entry “Hokudai” is registered in the speech recognition target vocabulary database as a paraphrase of the formal name “Hokkaido University” (北海道大学), so in (6) the apparatus can emit “Hokkaido University” from the speaker 4 and display “Hokkaido University” in the speech recognition result display field 45 of the speech recognition result image 44 shown in FIG. 3(d).

The process of registering paraphrases with high actual usage in the speech recognition target vocabulary database will now be described with reference to FIG. 6.

First, in step S1, the CPU 21 of the signal processing device 11 detects that the user has operated the speech switch 5a, determines that the start of an utterance has been instructed, and advances the process to step S2.

In step S2, the CPU 21 makes the standby settings for speech recognition processing. The CPU 21 displays one of the images of FIGS. 3(a) to 3(d) as the voice input menu and enters a voice input standby state. Immediately after the speech switch 5a is operated in step S1, the CPU 21 displays the voice menu image 41 of FIG. 3(a), which is the top layer, and enters the standby state. As operation names such as “destination” and “facility name” are selected and the process moves to lower layers, the images of FIGS. 3(b) to 3(d) are displayed.

The CPU 21 also reads the speech recognition target vocabulary registered in the speech recognition target vocabulary database from the external storage device 15 into the memory 22. Assume that the layer advances downward through the operation name databases 31 and 32 as the user speaks, and that the vocabulary registered in the speech recognition target vocabulary database 33 of FIG. 2, that is, the facility names, has been read in as the speech recognition target vocabulary. Reading every facility name in the country into the memory 22 is difficult because of the increase in required memory capacity and speech recognition computation, so assume that the names of facilities in prefectures near the user's position and of predetermined representative facilities nationwide have been read in.

In the next step S3, to give the prompt, that is, to notify the user that speech recognition processing has started, the CPU 21 outputs an announcement voice signal stored in the external storage device 15 to the speaker 4 via the D/A converter 13 and the amplifier 14, and the speaker 4 emits the announcement voice. For example, when the user is to utter a facility name, the announcement voice is “Please name the facility” or the like.

Suppose that, in response to this announcement voice, the user utters “Hokudai” (北大), a paraphrase of the formal name “Hokkaido University” (北海道大学), as in (5) of FIG. 4 while the facility name input image 43 of FIG. 3(c) is displayed. In the signal processing unit 1, the voice signal from the microphone 2 is converted into a digital signal by the A/D converter 12 and input to the signal processing device 11, and the CPU 21 calculates the average power of the digital signal until the speech switch 5a is operated. After the speech switch 5a is operated, the signal processing device 11 determines that the user has started speaking when the instantaneous power of the digital signal exceeds the average power by a predetermined value or more, and starts capturing the voice.

In the next step S4, the CPU 21 starts calculating the degree of coincidence between the voice signal whose capture began in step S3 and the speech recognition target vocabulary stored in the memory 22. The degree of coincidence, that is, the degree to which a speech section matches the voice signal representing each speech recognition target vocabulary entry, is obtained by the CPU 21 as a score for each speech section. A larger score indicates a higher degree of coincidence. Voice capture continues in parallel with this calculation.

In the next step S5, the CPU 21 determines that the user's utterance has ended when the instantaneous power of the digital voice signal obtained from the A/D converter 12 stays at or below a predetermined value for a predetermined time or longer, and ends voice capture.
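The start-of-speech test of step S3 and the end-of-speech test of step S5 can be sketched together as below. This is a minimal sketch that assumes the signal has already been reduced to one power value per frame; the function name, the parameter names, and the frame-based notion of "predetermined time" are illustrative assumptions, and the document gives no concrete threshold values.

```python
def detect_speech_segment(frame_powers, average_power, start_margin, end_level, end_frames):
    """Return (start, end) frame indices of the utterance, or None if no speech starts.

    average_power: mean power measured before the speech switch was operated.
    start_margin:  speech starts when a frame exceeds average_power by this much or more.
    end_level:     speech ends when frames stay at or below this level ...
    end_frames:    ... for this many consecutive frames.
    """
    start = None
    quiet = 0
    for i, power in enumerate(frame_powers):
        if start is None:
            if power - average_power >= start_margin:
                start = i  # step S3: begin capturing
        elif power <= end_level:
            quiet += 1
            if quiet >= end_frames:
                return start, i - end_frames + 1  # step S5: end at the first quiet frame
        else:
            quiet = 0
    return None if start is None else (start, len(frame_powers))


powers = [1, 1, 1, 9, 8, 9, 1, 1, 1, 1]
segment = detect_speech_segment(powers, average_power=1.0, start_margin=5.0,
                                end_level=2.0, end_frames=3)
```

The returned end index is exclusive, so `powers[start:end]` holds only the frames judged to be speech.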

In the next step S6, the CPU 21 determines whether the current layer is a predetermined name input layer in which the speech recognition result display field 45 is displayed, as in FIG. 3(c). That is, it determines whether the current layer is one in which a formal name, such as a destination facility name or address, or a paraphrase of a formal name is input, rather than a layer in which an operation name such as destination or search condition is input, as in the layers that display the voice menu image 41 and the destination selection menu image 42 of FIGS. 3(a) and 3(b). For example, if the layers advance with the user's utterances and the layer at the time voice capture completes is the one in which the facility name input image 43 of FIG. 3(c) is displayed, the CPU 21 determines that a formal name or a paraphrase may have been uttered and advances the process to step S7. If instead the CPU 21 determines that the current layer is one in which an operation name is selected from the menu list 46, as with the voice menu image 41 and the destination selection menu image 42, it advances the process from step S6 to step S8.

In step S7, so that the possibility that the user's utterance “Hokudai” (北大) is a paraphrase can be considered later in step S14, the CPU 21 temporarily stores in the memory 22 the digital signal representing the “Hokudai” speech whose capture ended in step S5.

In the next step S8, the CPU 21 obtains the degree of coincidence between each speech recognition target vocabulary entry stored in the speech recognition target vocabulary database and the digital signal representing the “Hokudai” (北大) speech, and obtains speech recognition result candidates in descending order of the degree of coincidence.

Then, in the next step S9, the CPU 21 outputs the speech recognition result candidates obtained in step S8. For example, as shown in FIG. 3(d), the CPU 21 outputs, in the speech recognition result display field 45 of the speech recognition result image 44, the vocabulary entry “Kokubun Station” (国分駅), which has the highest degree of coincidence with the digital signal representing the “Hokudai” (北大) speech, as the speech recognition result. The speech recognition result may also be output by converting “Kokubun Station” into a voice signal with the speech synthesis function of the CPU 21 and emitting it from the speaker 4 via the D/A converter 13 and the amplifier 14.

The user then operates the correction switch 5b because “Kokubun Station” (国分駅) was output as the speech recognition result. In step S10, the signal processing device 11 determines whether operation of the correction switch 5b was detected within a predetermined time (for example, several tens of seconds) after the speech recognition result was output in step S9. If operation of the correction switch 5b is detected within the predetermined time, the process advances from step S10 to step S11; if not, the process advances from step S10 to step S12.

In step S11, the CPU 21 increments and records the number of times the correction switch 5b has been operated, returns the process to step S3, and repeats steps S3 to S10. Suppose that the user then repeats the utterance “Hokudai” (北大) as in (9) to (16) of FIG. 4 and afterwards utters the formal name “Hokkaido University” (北海道大学) in (17) of FIG. 4. In this case, because the formal name “Hokkaido University” is registered in the speech recognition target vocabulary database 33, the vocabulary entry “Hokkaido University” has the highest degree of coincidence in step S8, and the speech recognition result “Hokkaido University” can be output in step S9.
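The loop through steps S7 to S11 can be traced with a small simulation. This is a sketch under strong simplifications: utterances are represented as text instead of digital voice signals, `recognize` is a hypothetical stand-in that returns an exact match or a fixed fallback entry, and the user is assumed to press the correction switch whenever the result differs from the intended formal name.

```python
def recognize(utterance, vocabulary, fallback):
    """Stand-in recognizer: exact match if registered, otherwise a fixed nearest entry."""
    return utterance if utterance in vocabulary else fallback


def run_dialog(utterances, intended, vocabulary, fallback):
    """Return (final result, correction count, utterances stored in step S7)."""
    corrections = 0
    stored = []
    result = None
    for utterance in utterances:
        stored.append(utterance)                             # S7: keep the utterance
        result = recognize(utterance, vocabulary, fallback)  # S8, S9: recognize and output
        if result == intended:                               # S10: no correction requested
            break
        corrections += 1                                     # S11: count the correction
    return result, corrections, stored


result, corrections, stored = run_dialog(
    ["北大", "北大", "北大", "北海道大学"],  # utterances (5), (9), (13), (17) of FIG. 4
    intended="北海道大学",
    vocabulary={"北海道大学", "国分駅"},
    fallback="国分駅",
)
```

For the FIG. 4 sequence this yields three corrections before the formal name is recognized, which is the count later used when evaluating actual usage.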

In step S10 after the speech recognition result “Hokkaido University” (北海道大学) has been output, the correction switch 5b is not operated, and the CPU 21 advances the process from step S10 to step S12.

In step S12, the CPU 21 determines whether the vocabulary entry output as the speech recognition result in step S9 is an operation name or a formal name or paraphrase, and thereby determines whether there is a next layer. If there is a next layer, the process returns to step S2; if not, the process advances to step S13. For example, when the facility name “Hokkaido University” (北海道大学) has been output as the speech recognition result for the destination, the process advances to step S13.

In step S13, the CPU 21 finalizes the speech recognition result when the “Go there” button 47 or the “View map” button 48 included in the speech recognition result image 44 of FIG. 3(d) is selected. When either button is selected, the CPU 21 supplies the position information of Hokkaido University and the corresponding command to a navigation device (not shown), which performs a route search or displays a map.

In the next step S14, the CPU 21 evaluates the possibility that “Hokudai” (北大), as uttered by the user, is a paraphrase of the formal name “Hokkaido University” (北海道大学).

First, the CPU 21 generates paraphrases from the formal name “Hokkaido University” (北海道大学). Using a morphological analysis program, the CPU 21 divides the formal name into the morphemes 北海道 (“Hokkaido”) and 大学 (“university”). The morphological analysis is realized by running a general-purpose program (for example, ChaSen, http://Chasen.aist-nara.ac.jp/) on the CPU 21. The CPU 21 then generates a plurality of paraphrases by extracting a partial character string from each of the two morphemes and concatenating them. For example, the two-character paraphrases 北大, 海大, 道大, 北学, 海学, and 道学 are generated, and paraphrases with other numbers of characters are generated as well.
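The paraphrase generation step can be sketched as follows. The sketch takes the morphemes as given (it does not call ChaSen) and assumes that a "partial character string" means a contiguous substring; the function names are illustrative. The formal name itself is excluded from the output, since it is already registered.

```python
from itertools import product


def substrings(text):
    """All contiguous, non-empty substrings of `text`."""
    return [text[i:j] for i in range(len(text)) for j in range(i + 1, len(text) + 1)]


def generate_paraphrases(morphemes):
    """Concatenate one substring from each morpheme, in morpheme order."""
    formal_name = "".join(morphemes)
    parts = [substrings(m) for m in morphemes]
    candidates = ("".join(combo) for combo in product(*parts))
    # dict.fromkeys removes duplicates while keeping generation order.
    return [c for c in dict.fromkeys(candidates) if c != formal_name]


paraphrases = generate_paraphrases(["北海道", "大学"])
two_character = [p for p in paraphrases if len(p) == 2]
```

For 北海道 and 大学 the two-character candidates are exactly the six listed in the text, and longer candidates such as 北海道大 are produced as well.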

次にＣＰＵ２１は、使用者が発話した言い換え語であって、ステップＳ７でメモリ２２に一時記憶された言い換え語の実使用度を評価する。このとき、ＣＰＵ２１は、ステップＳ１０で訂正スイッチ５ｂが操作されて、ステップＳ１１で訂正スイッチ５ｂの操作回数が多いほど、当該言い換え語に対する使用者の使用意図が高く、当該言い換え語の実使用度が高いと評価する。そして、ＣＰＵ２１は、訂正スイッチ５ｂの操作回数が所定値以上であるか否かを判定して、所定値以上の場合には、当該言い換え語の実使用度が高く、言い換え語を音声認識対象語彙データベースに登録することを決定する。 Next, the CPU 21 evaluates the actual usage of the paraphrase that the user uttered and that was temporarily stored in the memory 22 in step S7. The more times the correction switch 5b was operated in step S10, as counted in step S11, the more strongly the CPU 21 judges that the user intends to use the paraphrase, and the higher it rates the actual usage of the paraphrase. The CPU 21 then determines whether the number of operations of the correction switch 5b is equal to or greater than a predetermined value. If it is, the CPU 21 judges that the actual usage of the paraphrase is high and decides to register the paraphrase in the speech recognition target vocabulary database.

なお、使用者の使用意図は、訂正スイッチ５ｂの操作回数に限らず、同じ言い換え語「北大」を発話した回数であっても良い。例えば、正式名称「北海道大学」に対する言い換え語「北大」、「北海道大」とがステップＳ７でメモリ２２に一時記憶され、言い換え語「北大」の方が多く発話されていた場合には、「北大」の方が使用意図が高いと判定できる。 The user's intention to use a paraphrase may be judged not only from the number of operations of the correction switch 5b but also from the number of times the same paraphrase “Hokudai” (北大) was uttered. For example, if the paraphrases “Hokudai” (北大) and “Hokkaido-dai” (北海道大) for the formal name “Hokkaido University” were temporarily stored in the memory 22 in step S7 and “Hokudai” was uttered more often, “Hokudai” can be judged to have the higher intention of use.
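The two usage criteria described above (the correction count and the utterance count) can be sketched as follows. This is only an illustration: the threshold value and the names `usage_is_high` and `most_uttered` are assumptions, since the specification says only that the count is compared with a predetermined value.

```python
from collections import Counter

CORRECTION_THRESHOLD = 2  # hypothetical "predetermined value"


def usage_is_high(correction_count, threshold=CORRECTION_THRESHOLD):
    """True when the correction switch was operated at least `threshold` times."""
    return correction_count >= threshold


def most_uttered(utterances):
    """Return the paraphrase uttered most often, or None if nothing was stored."""
    if not utterances:
        return None
    return Counter(utterances).most_common(1)[0][0]


# Utterances assumed to have been stored temporarily in step S7.
stored = ["北大", "北海道大", "北大"]
preferred = most_uttered(stored)
```

Either criterion, or both together, could gate the registration decision; the text leaves that choice open.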

次にＣＰＵ２１は、ステップＳ７でメモリ２２に一時記憶された使用者の音声「北大」を入力音声とし、「北海道」、「大学」の２個の形態素から生成した言い換え語の全てを音声認識対象語彙とし、入力音声と音声認識対象語彙との一致度を演算する。その結果、ＣＰＵ２１は、一致度のスコアが所定の閾値以上の言い換え語が存在した場合には、当該入力音声を音声認識対象語彙として音声認識対象語彙データベースに登録する。 Next, the CPU 21 takes the user's utterance “Hokudai” (北大), temporarily stored in the memory 22 in step S7, as the input speech, takes all of the paraphrases generated from the two morphemes “北海道” and “大学” as the speech recognition target vocabulary, and computes the degree of match between the input speech and that vocabulary. If a paraphrase whose match score is equal to or greater than a predetermined threshold exists, the CPU 21 registers the input speech in the speech recognition target vocabulary database as speech recognition target vocabulary.
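The threshold test on the match score might look like the sketch below. The specification compares stored audio with the candidate vocabulary; here, purely as a stand-in, kana readings are compared as text with Python's `difflib.SequenceMatcher`. The threshold of 0.8 and the function names are assumptions.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.8  # hypothetical "predetermined threshold"


def match_score(recognized, candidate):
    """Text similarity in [0, 1]; a stand-in for an acoustic match score."""
    return SequenceMatcher(None, recognized, candidate).ratio()


def best_match(recognized, candidates, threshold=MATCH_THRESHOLD):
    """Return the best-scoring candidate if it reaches the threshold, else None."""
    if not candidates:
        return None
    score, candidate = max((match_score(recognized, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Kana readings used as a text stand-in for the stored utterance and candidates.
readings = ["ほくだい", "かいだい", "どうがく"]
hit = best_match("ほくだい", readings)
miss = best_match("あつぎ", readings)
```

A real implementation would score the stored digital audio signal against acoustic models of each candidate rather than compare strings.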

また、ＣＰＵ２１は、「北海道大学」と同一カテゴリーである他の大学名称にも同様の言い換え語を生成して登録しても良い。すなわち、「北大」を音声認識対象語彙として音声認識対象語彙データベースに登録した場合、ＣＰＵ２１は、形態素解析した結果である「北海道」、「大学」それぞれの一文字目を連結して「北大」という言い換え語を作成するという規則を生成し、当該規則を他の大学の正式名称に適用して、音声認識対象語彙として登録してもよい。 The CPU 21 may also generate and register similar paraphrases for other university names in the same category as “Hokkaido University”. That is, when “Hokudai” (北大) has been registered in the speech recognition target vocabulary database as speech recognition target vocabulary, the CPU 21 may derive the rule that the paraphrase “北大” is formed by concatenating the first characters of the morphemes “北海道” and “大学” obtained by morphological analysis, apply that rule to the formal names of other universities, and register the results as speech recognition target vocabulary.
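One way to picture the rule derivation is to record, for each morpheme, which slice was used, and then replay those slices on another formal name. The sketch below assumes a rule is a list of `(start, length)` pairs, one per morpheme; this representation and the names `derive_rule` and `apply_rule` are hypothetical.

```python
def derive_rule(morphemes, paraphrase):
    """Find one (start, length) slice per morpheme whose concatenation
    equals the paraphrase. Returns None if no such slices exist."""

    def search(index, rest):
        if index == len(morphemes):
            return [] if rest == "" else None
        morpheme = morphemes[index]
        for start in range(len(morpheme)):
            for length in range(1, len(morpheme) - start + 1):
                piece = morpheme[start:start + length]
                if rest.startswith(piece):
                    tail = search(index + 1, rest[len(piece):])
                    if tail is not None:
                        return [(start, length)] + tail
        return None

    return search(0, paraphrase)


def apply_rule(rule, morphemes):
    """Apply a derived rule to the morphemes of another formal name."""
    if rule is None or len(rule) != len(morphemes):
        return None
    parts = []
    for (start, length), morpheme in zip(rule, morphemes):
        if start + length > len(morpheme):
            return None  # the slice does not fit this morpheme
        parts.append(morpheme[start:start + length])
    return "".join(parts)


rule = derive_rule(["北海道", "大学"], "北大")   # first character of each morpheme
other = apply_rule(rule, ["ＡＢＣ", "大学"])      # morphemes assumed pre-segmented
```

With the rule derived from 「北海道大学」 and 「北大」, applying it to the morphemes of 「ＡＢＣ大学」 gives 「Ａ大」.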

これにより、正式名称「北海道大学」と言い換え語「北大」とを音声認識対象語彙データベースに登録した後には、図５の（５）示すように、「北大」と使用者が発話したことに対する応答として、「北海道大学」という正式名称を音声認識結果として出力することができる。また、ＣＰＵ２１は、「北海道大学」と使用者が発話したことに対する応答として、「北海道大学」との音声認識結果を出力すると同時に、正式名称「北海道大学」の言い換え語として「北大」と発話しても正式名称「北海道大学」を音声認識結果として出力できることを図３（ｄ）の音声認識結果画像４４内で表示しても良い。 As a result, once the formal name “Hokkaido University” and the paraphrase “Hokudai” (北大) have been registered in the speech recognition target vocabulary database, the formal name “Hokkaido University” can be output as the speech recognition result in response to the user uttering “Hokudai”, as shown in (5) of FIG. 5. In addition, when the user utters “Hokkaido University”, the CPU 21 may output the speech recognition result “Hokkaido University” and at the same time indicate, in the speech recognition result image 44 of FIG. 3(d), that uttering the paraphrase “Hokudai” also yields the formal name “Hokkaido University” as the speech recognition result.

［第１実施形態の効果］ [Effects of the First Embodiment]
以上詳細に説明したように、本発明を適用した第１実施形態に係る音声認識装置によれば、実使用度の高い言い換え語を音声認識対象語彙として登録するので、実際に使用される可能性が高い言い換え語のみを音声認識対象語彙として追加登録でき、必要以上に音声認識対象語彙が多くなってしまう問題がなく、音声操作の使い勝手を大きく向上できる。 As described in detail above, the speech recognition apparatus according to the first embodiment to which the present invention is applied registers paraphrases with high actual usage as speech recognition target vocabulary. Only paraphrases that are likely to be actually used are therefore additionally registered, the speech recognition target vocabulary does not grow larger than necessary, and the usability of voice operation is greatly improved.

また、この音声認識装置によれば、訂正スイッチ５ｂによって訂正された言い換え語（第１の音声認識結果）を記憶しておき、その後に入力した音声に基づく音声認識結果（第２の音声認識結果）が訂正されなかった場合に、訂正された音声認識結果（第１の音声認識結果）が訂正されなかった音声認識結果（第２の音声認識結果）の言い換え語として実使用度が高いという評価をするので、訂正された言い換え語のみを音声認識対象語彙として登録でき、必要以上に音声認識対象語彙を多くすることを回避できる。 In addition, this speech recognition apparatus stores the paraphrase that was corrected with the correction switch 5b (the first speech recognition result). When the speech recognition result based on subsequently input speech (the second speech recognition result) is not corrected, the apparatus evaluates the corrected first result as having high actual usage as a paraphrase of the uncorrected second result. Only corrected paraphrases are therefore registered as speech recognition target vocabulary, which avoids enlarging the vocabulary more than necessary.

更に、この音声認識装置によれば、訂正スイッチ５ｂを操作した操作回数から、言い換え語の使用意図が高い場合に、言い換え語を登録するので、多くの操作を費やして入力に至った、より使用意図の高い言い換え語のみを音声認識結果に追加登録でき、必要以上に音声認識結果が多くなることを回避できる。 Furthermore, this speech recognition apparatus registers a paraphrase when the number of operations of the correction switch 5b indicates a high intention to use it. Only paraphrases with a higher intention of use, for which the user spent many operations before the input succeeded, are additionally registered to the speech recognition results, which avoids having more speech recognition results than necessary.

更にまた、この音声認識装置によれば、言い換え語を音声認識対象語彙として登録した場合に、正式名称から当該言い換え語が生成された規則を求めて、当該言い換え語と同一カテゴリーに分類される他の正式名称の言い換え語を、当該規則に従って登録するので、例えば大学といったカテゴリーにおいて正式名称「北海道大学」が「北大」として登録された場合、同じカテゴリーの正式名称「ＡＢＣ大学」から「Ａ大」という言い換え語を登録できる。これにより、言い換え語を用いて音声認識装置を使いやすいものとできる。 Furthermore, when a paraphrase is registered as speech recognition target vocabulary, this speech recognition apparatus derives the rule by which the paraphrase was generated from the formal name and registers, according to that rule, paraphrases of other formal names classified in the same category. For example, if “Hokudai” (北大) is registered for the formal name “Hokkaido University” (北海道大学) in the category of universities, the paraphrase “A-dai” (Ａ大) can be registered from the formal name “ABC University” (ＡＢＣ大学) in the same category. This makes the speech recognition apparatus easier to use with paraphrases.

更にまた、音声認識装置によれば、音声認識対象語彙として登録された言い換え語が、使用可能となったことを使用者に提示するので、次回使用時から言い換え語を速やかに使用させることが可能となる。 Furthermore, this speech recognition apparatus informs the user that a paraphrase registered as speech recognition target vocabulary has become usable, so the user can start using the paraphrase promptly from the next use.

［第２実施形態］ [Second Embodiment]
つぎに、第２実施形態に係る音声認識装置について説明する。なお、第２実施形態に係る音声認識装置は、その構成が上述の第１実施形態と同様であるので、同一符号を付することによりその詳細な説明を省略する。 Next, a speech recognition apparatus according to the second embodiment will be described. Since the configuration of the speech recognition apparatus according to the second embodiment is the same as that of the first embodiment described above, the same reference numerals are used and a detailed description is omitted.

音声認識装置において、使用者が「厚木国際カントリー倶楽部」という正式名称の場所に行きたい又は地図表示させたい場合に、「厚木カントリー」という言い換え語が音声認識対象語彙データベースに登録されていない時の動作を説明する。 When the user wants to go to a place with the official name “Atsugi International Country Club” or display a map, the paraphrase “Atsugi Country” is not registered in the speech recognition target vocabulary database. The operation will be described.

このような音声認識装置においては、図３（ａ）に示す音声入力用の音声メニュー画像４１をタッチパネルディスプレイ３に表示している時に、音声入力を受け付ける。 In such a speech recognition apparatus, speech input is accepted while the speech menu image 41 for speech input shown in FIG. 3(a) is displayed on the touch panel display 3.

この音声メニュー画像４１を表示させている状態において、ＣＰＵ２１は、図７の（１）のように使用者が「行き先」と発話し、当該「行き先」との音声認識結果を得ると、ＣＰＵ２１は、図７の（２）のように「行き先のコマンドをどうぞ」との告知音声をスピーカ４から放音させて、図３（ｂ）の複数の操作名称を示すメニューリスト４６を含む行き先選択メニュー画像４２をタッチパネルディスプレイ３に表示させる。 While the speech menu image 41 is displayed, when the user utters “destination” as in (1) of FIG. 7 and the CPU 21 obtains the speech recognition result “destination”, the CPU 21 causes the speaker 4 to emit the announcement “Please give a destination command” as in (2) of FIG. 7 and causes the touch panel display 3 to display the destination selection menu image 42 of FIG. 3(b), which includes a menu list 46 showing a plurality of operation names.

図３（ｂ）の行き先選択メニュー画像４２を表示させている状態において、図７の（３）のように行き先選択メニュー画像４２のメニューリスト４６に含まれる「施設」という操作名称を使用者が発話し、ＣＰＵ２１によって「施設」との音声認識結果を得た場合には、図７の（４）のように「施設名をどうぞ」との告知音声をスピーカ４から放音させて、図３（ｃ）に示す施設名入力画像４３を表示させる。 In the state where the destination selection menu image 42 of FIG. 3B is displayed, the user selects the operation name “facility” included in the menu list 46 of the destination selection menu image 42 as shown in FIG. When the speech recognition result of “facility” is obtained by the utterance and the CPU 21, an announcement voice “Please name the facility” is emitted from the speaker 4 as shown in FIG. The facility name input image 43 shown in (c) is displayed.

図３（ｃ）の施設名入力画像４３を表示させている状態において、図７の（５）のように「厚木カントリー」と使用者が発話した場合、ＣＰＵ２１は、外部記憶装置１５の音声認識対象語彙データベースには「厚木カントリー」が音声認識対象語彙として登録されていないことから、当該「厚木カントリー」に対する音声認識結果と最も近く音声認識対象語彙データベースに登録されている厚木駅を選択して、図３（ｄ）の音声認識結果表示欄４５に音声認識結果「厚木駅」を含む音声認識結果画像４４を表示する。 While the facility name input image 43 of FIG. 3(c) is displayed, when the user utters “Atsugi Country” as in (5) of FIG. 7, “Atsugi Country” is not registered as speech recognition target vocabulary in the speech recognition target vocabulary database of the external storage device 15. The CPU 21 therefore selects “Atsugi Station”, the registered entry closest to the speech recognition result for “Atsugi Country”, and displays the speech recognition result image 44 with the result “Atsugi Station” in the speech recognition result display field 45 of FIG. 3(d).

この音声認識結果画像４４を表示している状態において、図７の（７）のように訂正スイッチ５ｂを使用者が操作すると、図７の（８）のようにスピーカ４から「もう一度発話してください」との告知音声を放音させて、再度図３（ｃ）の施設名入力画像４３を表示させる。そして、再度使用者によって「厚木カントリー」と発話したことに対して、図７の（１０）で「厚木駅」との音声認識結果を出力した場合、使用者が音声による入力をあきらめて、図７の（１１）にて、タッチパネルディスプレイ３を用いた手操作入力で「厚木国際カントリー倶楽部」と入力させる。 When the user operates the correction switch 5b as shown in (7) of FIG. 7 in the state where the voice recognition result image 44 is displayed, “speak again” from the speaker 4 as shown in (8) of FIG. The notification voice “Please” is emitted and the facility name input image 43 of FIG. 3C is displayed again. If the user again utters “Atsugi Country” and outputs the speech recognition result “Atsugi Station” in (10) of FIG. 7, the user gives up the input by voice, 7 (11), “Atsugi International Country Club” is input by manual operation using the touch panel display 3.

このように、第２実施形態に係る音声認識装置は、音声認識に代わる代替入力手段を備えて、当該代替入力手段によって、音声認識結果とは異なる正式名称が入力された場合に、当該音声認識結果を、代替入力手段により入力した正式名称の言い換え語として実使用度が高いと評価して、音声認識対象語彙として登録することを特徴とする。 As described above, the speech recognition apparatus according to the second embodiment includes the alternative input unit that replaces the speech recognition, and when the formal input different from the speech recognition result is input by the alternative input unit, the speech recognition The result is evaluated as having high actual usage as a paraphrase of the formal name input by the alternative input means, and is registered as a speech recognition target vocabulary.

そして、音声認識装置は、図８の（１）において使用者が「行き先」と発話し、（２）で「行き先のコマンドをどうぞ」との告知音声をスピーカ４から放音し、（３）で使用者が「施設」と発話し、（４）で「施設名をどうぞ」との告知音声をスピーカ４から放音したことに対し、使用者が「厚木カントリー」と発話すると、当該「厚木カントリー」の音声認識結果が音声認識対象語彙データベースに登録されているので、図８の（６）で「厚木国際カントリー倶楽部」とスピーカ４から放音及び図３（ｄ）に示す音声認識結果画像４４において音声認識結果表示欄４５に「厚木国際カントリー倶楽部」と表示させることができる。 Then, in FIG. 8, the user utters “destination” in (1), the announcement “Please give a destination command” is emitted from the speaker 4 in (2), the user utters “facility” in (3), and the announcement “Please name the facility” is emitted from the speaker 4 in (4). When the user then utters “Atsugi Country”, the speech recognition result for “Atsugi Country” is now registered in the speech recognition target vocabulary database, so in (6) of FIG. 8 the apparatus can emit “Atsugi International Country Club” from the speaker 4 and display “Atsugi International Country Club” in the speech recognition result display field 45 of the speech recognition result image 44 shown in FIG. 3(d).

以下、第２実施形態に係る音声認識装置の動作について図９及び図１０を参照して説明する。 Hereinafter, the operation of the speech recognition apparatus according to the second embodiment will be described with reference to FIGS. 9 and 10.

第２実施形態に係る音声認識装置は、図９に示すように、図３（ａ）、（ｂ）のように操作名称を含むメニューリスト４６を表示させて操作を選択させる処理及び図３（ｃ）、（ｄ）のように正式名称又は言い換え語の音声認識結果を得る処理を行う。音声認識装置は、第１実施形態の音声認識装置と同様に、ステップＳ１〜ステップＳ９の処理を行い、ステップＳ１０において、所定時間内に訂正スイッチ５ｂが操作されたことを検出した場合には、ステップＳ３に処理を戻し、所定時間内に訂正スイッチ５ｂが操作されなかった場合には、ステップＳ１２に処理を進める。上述したように、第２実施形態に係る音声認識装置は、代替入力手段によって正式名称が入力されたことによって言い換え語の実使用度が高いことを評価するので、図６のステップＳ１１のような訂正スイッチ５ｂの操作回数を記録する処理は行わない。そして、ステップＳ１２において、次の下位層がないと判定した後のステップＳ１３において、ステップＳ９で出力した音声認識結果を決定して処理を終了する。 As shown in FIG. 9, the speech recognition apparatus according to the second embodiment displays a menu list 46 including operation names as shown in FIGS. 3 (a) and 3 (b) and selects an operation as shown in FIG. c) A process for obtaining a speech recognition result of a formal name or paraphrase as in (d) is performed. Similar to the speech recognition device of the first embodiment, the speech recognition device performs the processing of step S1 to step S9, and when it is detected in step S10 that the correction switch 5b has been operated within a predetermined time, The process returns to step S3, and if the correction switch 5b is not operated within a predetermined time, the process proceeds to step S12. As described above, since the speech recognition apparatus according to the second embodiment evaluates that the actual usage of the paraphrase word is high when the formal name is input by the alternative input unit, as shown in step S11 of FIG. The process of recording the number of operations of the correction switch 5b is not performed. Then, in step S13 after determining that there is no next lower layer in step S12, the speech recognition result output in step S9 is determined, and the process ends.

ここで、上述のように、使用者が言い換え語「厚木カントリー」の音声入力をあきらめて、タッチパネルディスプレイ３による操作入力によって正式名称「厚木国際カントリー倶楽部」を音声認識装置に認識させる場合、音声認識装置は、図１０に示す処理を行うことによって、音声認識対象語彙データベースに実使用度の高い言い換え語を登録する。 Here, as described above, when the user gives up the voice input of the paraphrase word “Atsugi Country” and causes the voice recognition device to recognize the official name “Atsugi International Country Club” by the operation input by the touch panel display 3, the voice recognition is performed. The apparatus registers a paraphrase word having a high actual use degree in the speech recognition target vocabulary database by performing the processing shown in FIG.

図１０に示すように、ＣＰＵ２１は、先ず、ステップＳ２１において、図示しない入力装置５のメニュースイッチが操作されたことを検出した場合に、ステップＳ２２に処理を進めて、メニュースイッチの操作に従ったメニュー画面を設定表示し、ステップＳ２３において、タッチパネルディスプレイ３によって正式名称を入力する画面に遷移させるために、使用者による操作入力が確定すると、ステップＳ２４において、現在表示している画面の下位層が存在するかを判定する。 As shown in FIG. 10, when the CPU 21 first detects in step S21 that a menu switch of the input device 5 (not shown) has been operated, it proceeds to step S22 and follows the operation of the menu switch. When the menu screen is set and displayed, and the operation input by the user is confirmed in order to make a transition to the screen for inputting the formal name by the touch panel display 3 in step S23, the lower layer of the currently displayed screen is displayed in step S24. Determine if it exists.

ＣＰＵ２１は、ステップＳ２４において、図１１に示すように、音声入力に代替して正式名称を入力する代替入力画面５１のように、下位層の画面が存在しないと判定した場合に、ステップＳ２５に処理を進める。代替入力画面５１には、使用者が入力しようとする正式名称のカテゴリー情報５２、正式名称入力欄５３、５０音の文字入力ボタン５４、リスト表示ボタン５５が含まれる。カテゴリー情報５２は、ステップＳ２２及びステップＳ２３において使用者によって選択されたカテゴリーである施設、当該施設の下位層のカテゴリーであるゴルフ場を示している。 If the CPU 21 determines in step S24 that no lower-layer screen exists, as with the alternative input screen 51 shown in FIG. 11 for entering a formal name in place of speech input, the process proceeds to step S25. The alternative input screen 51 includes category information 52 for the formal name the user is about to enter, a formal name input field 53, character input buttons 54 for the Japanese syllabary, and a list display button 55. The category information 52 shows “facility”, the category selected by the user in steps S22 and S23, and “golf course”, a lower-layer category of that facility category.

ステップＳ２５において、ＣＰＵ２１は、代替入力画面５１の文字入力ボタン５４及びリスト表示ボタン５５が使用者に操作されることを検出して、操作結果を決定する処理を行う。このとき、図１１に示すように、施設の正式名称「厚木国際カントリー倶楽部」の一部の「あつぎ」が文字入力ボタン５４の操作によって入力された後、リスト表示ボタン５５が操作されると、図１２に示すように、「あつぎ」を先頭に含む音声認識対象語彙をリスト化したリスト表示画面６１を表示する。このとき、ＣＰＵ２１は、カテゴリーが施設の音声認識対象語彙のうち、「あつぎ」を含む部分一致検索を行って、外部記憶装置１５の音声認識対象語彙データベースから「あつぎ」を含む音声認識対象語彙を抽出する。リスト表示画面６１には、検索キーの「あつぎ」を含むリスト表示６２と、「そこへ行く」ボタン６３及び「地図を見る」ボタン６４とを含む。 In step S25, the CPU 21 detects the user's operation of the character input buttons 54 and the list display button 55 on the alternative input screen 51 and performs processing to determine the operation result. As shown in FIG. 11, when “atsugi” (あつぎ), part of the facility's formal name “Atsugi International Country Club”, has been entered by operating the character input buttons 54 and the list display button 55 is then operated, a list display screen 61 listing the speech recognition target vocabulary that begins with “atsugi” is displayed, as shown in FIG. 12. At this time, the CPU 21 performs a partial-match search for “atsugi” among the speech recognition target vocabulary whose category is facility, and extracts the entries containing “atsugi” from the speech recognition target vocabulary database of the external storage device 15. The list display screen 61 includes a list display 62 containing the search key “atsugi”, a “go there” button 63, and a “view map” button 64.
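The category-restricted partial-match search in step S25 can be sketched as below. The `Entry` record, its fields, and the sample entries are illustrative assumptions; the specification does not describe the database layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str      # formal name as displayed
    reading: str   # kana reading used for matching typed input
    category: str  # category such as 施設 (facility)


def partial_match(entries, key, category):
    """Names of entries in `category` whose reading contains `key`."""
    return [e.name for e in entries if e.category == category and key in e.reading]


# Hypothetical database contents for illustration only.
database = [
    Entry("厚木国際カントリー倶楽部", "あつぎこくさいかんとりーくらぶ", "施設"),
    Entry("厚木駅", "あつぎえき", "施設"),
    Entry("甲府駅", "こうふえき", "施設"),
    Entry("厚木市", "あつぎし", "住所"),
]
listed = partial_match(database, "あつぎ", "施設")
```

The resulting list corresponds to what the list display 62 would show for the search key 「あつぎ」.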

このリスト表示画面６１を表示させた後、使用者によってリスト表示６２のうち「厚木国際カントリー倶楽部」が選択された場合、ＣＰＵ２１は、当該操作を検出して、操作結果を決定する。また、使用者が「厚木国際カントリー倶楽部」を選択し、更に、「そこへ行く」ボタン６３又は「地図を見る」ボタン６４が選択された時に、操作内容を決定しても良い。 After the list display screen 61 is displayed, when “Atsugi International Country Club” is selected from the list display 62 by the user, the CPU 21 detects the operation and determines the operation result. In addition, when the user selects “Atsugi International Country Club” and further selects the “go there” button 63 or the “view map” button 64, the operation content may be determined.

次に、ＣＰＵ２１は、ステップＳ２６において、ステップＳ２５で操作結果が決定される直前の時間帯（例えば数分）で図９の音声を入力する処理を行っていたか否かを判定する。このとき、ＣＰＵ２１は、例えばメモリ２２に一時記憶した音声のディジタル信号を所定期間だけ保持するように構成した場合には、図９のステップＳ７で一時的にメモリ２２に音声のディジタル信号が記憶されていると判定した時に、直前に音声入力が有ったと判定する。 Next, in step S 26, the CPU 21 determines whether or not the process of inputting the voice in FIG. 9 has been performed in the time zone (for example, several minutes) immediately before the operation result is determined in step S 25. At this time, for example, if the CPU 21 is configured to hold the audio digital signal temporarily stored in the memory 22 for a predetermined period, the audio digital signal is temporarily stored in the memory 22 in step S7 of FIG. When it is determined that there is a voice input, it is determined that there was a voice input immediately before.
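The step S26 check for speech input shortly before the manual entry can be sketched as a simple time-window test. The window of 180 seconds is an assumption standing in for "several minutes", and the function name is hypothetical.

```python
RECENT_WINDOW_SECONDS = 180.0  # hypothetical value for "several minutes"


def had_recent_voice_input(last_voice_time, decision_time,
                           window=RECENT_WINDOW_SECONDS):
    """True if speech was stored no more than `window` seconds before the
    operation result was determined. Times are seconds on a common clock."""
    if last_voice_time is None:
        return False  # nothing is held in the temporary memory
    elapsed = decision_time - last_voice_time
    return 0 <= elapsed <= window


recent = had_recent_voice_input(last_voice_time=1000.0, decision_time=1090.0)
stale = had_recent_voice_input(last_voice_time=1000.0, decision_time=2000.0)
```

If the memory 22 is configured to discard audio after a fixed period, as the text describes, the mere presence of stored audio serves the same purpose as this timestamp comparison.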

次のステップＳ２７において、ＣＰＵ２１は、ステップＳ２６で判定したように、代替入力画面５１からリスト表示画面６１に遷移して正式名称を選択した直前に入力された音声から、言い換え語を生成して、音声認識対象語彙データベースに登録する処理を行う。例えば、使用者にとって正式名称が分からないために、音声入力によって正式名称を音声認識装置に認識させることができずに中断し、代替入力画面５１から正式名称を入力した可能性があるので、言い換え語を生成する処理を行う。 In the next step S27, as determined in step S26, the CPU 21 generates a paraphrase from the voice input immediately before the transition from the alternative input screen 51 to the list display screen 61 and the official name is selected, Processing to register in the speech recognition target vocabulary database is performed. For example, since the user does not know the official name, there is a possibility that the voice recognition apparatus cannot recognize the official name by voice input, and the process is interrupted and the official name may be input from the alternative input screen 51. Process to generate words.

このステップＳ２７において、ＣＰＵ２１は、直前に行われていた音声入力に関わる音声のディジタル信号をメモリ２２から読み出し、この音声のディジタル信号から言い換え語を生成する。次に、ＣＰＵ２１は、生成した言い換え語と、メモリ２２に記憶されていた音声のディジタル信号とを比較して、一致度が高い言い換え語を、正式名称に対する言い換え語であると判定する。このとき、ＣＰＵ２１は、例えば正式名称「厚木国際カントリー倶楽部」から、「厚木」、「国際」、「カントリー」、「倶楽部」という形態素を組み合わせて、「厚木カントリー」という言い換え語の候補を作成し、メモリ２２に「厚木カントリー」が記憶されている場合には、当該「厚木カントリー」が「厚木国際カントリー倶楽部」の言い換え語であると判定して、音声認識対象語彙データベースに登録する。 In step S27, the CPU 21 reads out from the memory 22 a voice digital signal related to the voice input performed immediately before, and generates a paraphrase from the voice digital signal. Next, the CPU 21 compares the generated paraphrase with the digital audio signal stored in the memory 22 and determines that the paraphrase having a high degree of coincidence is a paraphrase for the formal name. At this time, the CPU 21 creates a candidate for the paraphrase “Atsugi Country” by combining the morphemes “Atsugi”, “International”, “Country”, “Club” from the official name “Atsugi International Country Club”, for example. When “Atsugi Country” is stored in the memory 22, it is determined that the “Atsugi Country” is a paraphrase of “Atsugi International Country Club”, and is registered in the speech recognition target vocabulary database.
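The candidate creation in step S27 combines whole morphemes of the formal name. A minimal sketch follows, assuming that the stored utterances are available as text and that an exact text match stands in for the "high degree of match" between audio and candidate; the function names are hypothetical.

```python
from itertools import combinations


def morpheme_combinations(morphemes):
    """Join every ordered choice of two or more whole morphemes,
    excluding the full formal name itself."""
    results = []
    for size in range(2, len(morphemes)):
        for combo in combinations(morphemes, size):
            results.append("".join(combo))
    return results


def find_paraphrase(morphemes, stored_utterances):
    """Return the first stored utterance that equals a generated candidate."""
    candidates = set(morpheme_combinations(morphemes))
    for utterance in stored_utterances:
        if utterance in candidates:
            return utterance
    return None


morphemes = ["厚木", "国際", "カントリー", "倶楽部"]
found = find_paraphrase(morphemes, ["厚木カントリー"])
```

Here `found` is 「厚木カントリー」, which would then be registered as a paraphrase of 「厚木国際カントリー倶楽部」.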

また、メモリ２２に記憶されている音声のディジタル信号のうち、使用者の初期発話の音声のディジタル信号を選択して、正式名称から生成した言い換え語と照合し、初期発話の音声のディジタル信号と言い換え語との尤度の高い場合に、当該言い換え語を音声認識対象語彙データベースに登録することが望ましい。 In addition, the digital signal of the voice of the initial utterance of the user is selected from the digital signals of the voice stored in the memory 22 and collated with the paraphrase generated from the official name, and the digital signal of the voice of the initial utterance is When the likelihood of a paraphrase word is high, it is desirable to register the paraphrase word in the speech recognition target vocabulary database.

［第２実施形態の効果］ [Effects of the Second Embodiment]
以上詳細に説明したように、本発明を適用した第２実施形態に係る音声認識装置によれば、代替入力画面５１によって正式名称を入力した場合に、当該正式名称の入力よりも前に音声入力があった時には、当該音声の言い換え語を実使用度が高い言い換え語として音声認識対象語彙データベースに登録できるので、音声入力に代替する手段を用いてまで入力を継続したより使用意図の高い言い換え語のみを音声認識対象語彙データベースに登録でき、必要以上に音声認識対象語彙が多くなることを回避できる。 As described in detail above, with the speech recognition apparatus according to the second embodiment to which the present invention is applied, when a formal name is input on the alternative input screen 51 and there was speech input before that formal name was input, the paraphrase in that speech can be registered in the speech recognition target vocabulary database as a paraphrase with high actual usage. Only paraphrases with a higher intention of use, for which the user continued the input even by resorting to means other than speech input, are therefore registered in the database, which avoids enlarging the speech recognition target vocabulary more than necessary.

また、この音声認識装置によれば、代替入力画面５１によって正式名称が入力された場合に、当該正式名称から生成した言い換え語と、メモリ２２に記憶された使用者の初期発話の音声とを照合して、尤度の高い場合に言い換え語の実使用度が高いと評価して登録するので、使用者の固有の言い換え語を音声認識対象語彙データベースに登録でき、且つ必要以上に音声認識対象語彙が多くなることを回避できる。 Further, according to this speech recognition apparatus, when a formal name is input on the alternative input screen 51, the paraphrase generated from the formal name is collated with the voice of the user's initial utterance stored in the memory 22. Then, when the likelihood is high, the actual usage of the paraphrase word is evaluated and registered, so that the user's unique paraphrase word can be registered in the speech recognition target vocabulary database, and the speech recognition target vocabulary is more than necessary. Can be avoided.

［第３実施形態］ [Third Embodiment]
つぎに、第３実施形態に係る音声認識装置について説明する。なお、上述の実施形態と同様の部分については同一符号を付することによりその詳細な説明を省略する。 Next, a speech recognition apparatus according to the third embodiment will be described. Parts similar to those in the embodiments described above are denoted by the same reference numerals, and a detailed description of them is omitted.

第３実施形態に係る音声認識装置は、図１３に示すように、信号処理ユニット１に、ネットワークを介して情報コンテンツ記憶サーバ（情報コンテンツ記憶手段、図示せず）に接続された通信装置（通信手段）７０が接続されている点で、上述した実施形態に係る音声認識装置とは異なる。この通信装置７０は、信号処理ユニット１の命令に従って、例えばＩＰ（Internet Protocol）等の通信プロトコルに従って通信処理を行う。 As shown in FIG. 13, the speech recognition apparatus according to the third embodiment differs from the apparatuses of the embodiments described above in that a communication device (communication means) 70, which is connected via a network to an information content storage server (information content storage means, not shown), is connected to the signal processing unit 1. The communication device 70 performs communication processing in accordance with a communication protocol such as IP (Internet Protocol), following instructions from the signal processing unit 1.

この音声認識装置は、例えば行き先の施設名の正式名称が「関西学院大学」であり、言い換え語の「関学」が音声認識対象語彙データベースに登録されていない場合には、図１４に示すような動作となり、後述するように言い換え語「関学」を音声認識対象語彙データベースに登録した場合には、図１５に示す処理を行う。 In this speech recognition apparatus, if for example the formal name of the destination facility is “Kwansei Gakuin University” (関西学院大学) and the paraphrase “Kangaku” (関学) is not registered in the speech recognition target vocabulary database, the operation is as shown in FIG. 14. Once the paraphrase “Kangaku” has been registered in the database as described later, the processing shown in FIG. 15 is performed.

信号処理ユニット１は、図１４の（１）のように使用者が「行き先」と発話し、当該「行き先」との音声認識結果を得ると、図１４の（２）のように「行き先のコマンドをどうぞ」との告知音声をスピーカ４から放音させる。次に図１４の（３）のように「施設」という操作名称を使用者が発話し、信号処理ユニット１によって「施設」との音声認識結果を得た場合には、図１４の（４）のように「施設名をどうぞ」との告知音声をスピーカ４から放音させる。 When the user utters “destination” as shown in (1) of FIG. 14 and obtains a voice recognition result of the “destination” as shown in (1) of FIG. Announcement voice “Please command” is emitted from speaker 4. Next, as shown in FIG. 14 (3), when the user speaks the operation name “facility” and the signal processing unit 1 obtains a speech recognition result of “facility”, the operation (4) in FIG. As shown, the announcement sound “Please name the facility” is emitted from the speaker 4.

次に、信号処理ユニット１は、図１４の（５）のように「関学」と使用者が発話した場合、外部記憶装置１５の音声認識対象語彙データベースには「関学」が音声認識対象語彙として登録されていないことから、当該「関学」に対する音声認識結果と最も近く音声認識対象語彙データベースに登録されている甲府駅を選択する。次に、信号処理ユニット１は、図１４の（７）のように訂正スイッチ５ｂを使用者が操作すると、図１４の（８）のようにスピーカ４から「もう一度発話してください」との告知音声を放音させ、再度使用者によって「関学」と発話したことに対して、図１４の（１０）で「甲府駅」との音声認識結果を出力した場合、使用者が音声による入力をあきらめて、図１４の（１１）にて、タッチパネルディスプレイ３を用いた手操作入力で「関西学院大学」と入力させる。 Next, when the user utters “Kangaku” (関学) as in (5) of FIG. 14, “Kangaku” is not registered as speech recognition target vocabulary in the speech recognition target vocabulary database of the external storage device 15, so the signal processing unit 1 selects “Kofu Station” (甲府駅), the registered entry closest to the speech recognition result for “Kangaku”. When the user then operates the correction switch 5b as in (7) of FIG. 14, the signal processing unit 1 causes the speaker 4 to emit the announcement “Please speak again” as in (8) of FIG. 14. If the user utters “Kangaku” again and the speech recognition result “Kofu Station” is output in (10) of FIG. 14, the user gives up on speech input and, in (11) of FIG. 14, enters “Kwansei Gakuin University” manually using the touch panel display 3.

この図１４の（１１）において、信号処理ユニット１は、図１６に示すように、カントリー「大学」の代替入力画面５１から、文字入力ボタン５４を操作させて正式名称入力欄５３に「かん」が入力され、更にリスト表示ボタン５５が操作された場合、図１７に示すリスト表示画面６１を表示する。そして、リスト表示画面６１のリスト表示６２のうち、「関西学院大学」が選択されて、正式名称「関西学院大学」が入力される。 In (11) of FIG. 14, as shown in FIG. 16, when “kan” (かん) is entered in the formal name input field 53 by operating the character input buttons 54 on the alternative input screen 51 for “university”, and the list display button 55 is then operated, the signal processing unit 1 displays the list display screen 61 shown in FIG. 17. “Kwansei Gakuin University” is then selected from the list display 62 of the list display screen 61, and the formal name “Kwansei Gakuin University” is thereby input.

次に、信号処理ユニット１は、正式名称「関西学院大学」から言い換え語「関学」を生成し、当該生成した言い換え語「関学」を検索キーとしてネットワーク上の情報コンテンツ記憶サーバに記憶されている情報コンテンツを検索するように通信装置７０を制御する。そして、信号処理ユニット１は、生成された言い換え語「関学」が通信装置７０で接続した情報コンテンツ記憶サーバに記憶されている情報コンテンツに含まれている場合に、当該言い換え語「関学」の実使用度が高いと評価して、音声認識対象語彙データベースに登録する。 Next, the signal processing unit 1 generates the paraphrase “Kangaku” (関学) from the formal name “Kwansei Gakuin University” and controls the communication device 70 so as to search the information content stored in the information content storage server on the network, using the generated paraphrase “Kangaku” as the search key. If the generated paraphrase “Kangaku” is contained in the information content stored in the information content storage server reached through the communication device 70, the signal processing unit 1 evaluates the actual usage of “Kangaku” as high and registers it in the speech recognition target vocabulary database.

そして、音声認識装置は、図１５の（１）において使用者が「行き先」と発話し、（２）で「行き先のコマンドをどうぞ」との告知音声をスピーカ４から放音し、（３）で使用者が「施設」と発話し、（４）で「施設名をどうぞ」との告知音声をスピーカ４から放音したことに対し、使用者が「関学」と発話すると、当該「関学」の音声認識結果が音声認識対象語彙データベースに登録されているので、図１５の（６）で「関西学院大学」とスピーカ４から放音及び図３（ｄ）に示す音声認識結果画像４４において音声認識結果表示欄４５に「関西学院大学」と表示させることができる。 Then, in FIG. 15, the user utters “destination” in (1), the announcement “Please give a destination command” is emitted from the speaker 4 in (2), the user utters “facility” in (3), and the announcement “Please name the facility” is emitted from the speaker 4 in (4). When the user then utters “Kangaku” (関学), the speech recognition result for “Kangaku” is now registered in the speech recognition target vocabulary database, so in (6) of FIG. 15 the apparatus can emit “Kwansei Gakuin University” from the speaker 4 and display “Kwansei Gakuin University” in the speech recognition result display field 45 of the speech recognition result image 44 shown in FIG. 3(d).

この音声認識装置の処理は、図１８に示すように、ステップＳ１〜ステップＳ５の処理によって使用者から発話された音声を取り込んだ後に、ステップＳ８〜ステップＳ１０、ステップＳ１２及びステップＳ１３を行う。ここで、第２実施形態に係る音声認識装置が行う図９の処理に対して、ステップＳ６及びステップＳ７の処理を第３実施形態に係る音声認識装置では行っていない。この理由としては、第３実施形態に係る音声認識装置が、代替入力画面５１及びリスト表示画面６１を表示して入力された正式名称から、実使用度の高い言い換え語を生成するために、使用者から発話された言い換え語の音声をメモリ２２に記憶するステップＳ７を行わないことによる。 As shown in FIG. 18, the voice recognition apparatus performs the steps S8 to S10, S12, and S13 after capturing the voice uttered by the user in the processes of steps S1 to S5. Here, in contrast to the process of FIG. 9 performed by the speech recognition apparatus according to the second embodiment, the processes of step S6 and step S7 are not performed by the speech recognition apparatus according to the third embodiment. The reason for this is that the speech recognition apparatus according to the third embodiment uses the alternative input screen 51 and the list display screen 61 to generate a paraphrase having a high actual usage rate from the formal name input. This is because step S7 for storing the voice of the paraphrase word spoken by the person in the memory 22 is not performed.

また、第３実施形態に係る音声認識装置は、第２実施形態において説明した図１０と同様に、ステップＳ２１〜ステップＳ２６の処理を行い、ステップＳ２６において、直前に図１８に示す処理が行われたと判定した場合には、ステップＳ２７にて言い換え語を生成して、実使用度の高い言い換え語を音声認識対象語彙データベースに登録する処理を行う。 In addition, the speech recognition apparatus according to the third embodiment performs the processing of step S21 to step S26 as in FIG. 10 described in the second embodiment, and in step S26, the processing illustrated in FIG. 18 is performed immediately before. If it is determined that the paraphrase word is generated in step S27, a process of registering the paraphrase word having a high actual usage in the speech recognition target vocabulary database is performed.

In step S27, the signal processing unit 1 first generates paraphrases from the formal name 「関西学院大学」 (Kwansei Gakuin University). It has a morphological analysis program split the formal name into the morphemes 「関西」, 「学院」, and 「大学」. It then extracts substrings from each of the three morphemes and concatenates them to generate multiple paraphrases. For example, as two-character paraphrases it generates 「関学」, 「西学」, 「関院」, 「西院」, 「関大」, 「西大」, 「学大」, 「院大」, 「関学」, 「西学」, 「学学」, and 「院学」, and it also generates paraphrases of other lengths.
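The two-character candidates listed above are consistent with taking one character from each of two different morphemes while keeping the morpheme order. The following Python sketch illustrates that reading only; the function name `two_char_paraphrases` is hypothetical, the morpheme list is hard-coded rather than produced by a real morphological analyzer, and duplicates such as 「関学」 are collapsed, which the source does not state.

```python
from itertools import combinations, product


def two_char_paraphrases(morphemes):
    """Build two-character candidates: one character from each of two
    different morphemes, preserving the order of the morphemes."""
    candidates = []
    for left, right in combinations(morphemes, 2):
        for a, b in product(left, right):
            word = a + b
            # Assumption: duplicates are dropped; the source lists them twice.
            if word not in candidates:
                candidates.append(word)
    return candidates


# Assumed analyzer output for the formal name 関西学院大学.
morphemes = ["関西", "学院", "大学"]
candidates = two_char_paraphrases(morphemes)
print(candidates)
```

Most of these candidates are never used in practice, which is why the subsequent evaluation step is needed before registration.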

Next, the signal processing unit 1 controls the communication device 70 to access information content on the network and evaluates whether each paraphrase is actually in use. It first has the communication device 70 search information content such as Web pages using the generated paraphrase as the search key. It then obtains the number of search results from the communication device 70 and determines whether that number is equal to or greater than a predetermined threshold. If it is, the unit determines whether any of the retrieved information content contains the formal name. If the formal name is present, the signal processing unit 1 judges that the paraphrase used as the search key is actually in use and registers it as speech recognition target vocabulary.

Thus, when the signal processing unit 1 searches information content using 「関学」, one of the paraphrases generated from the formal name 「関西学院大学」, as the search key, and the number of retrieved pieces of information content is at or above the predetermined number (indicating high actual usage) and that content contains the formal name 「関西学院大学」, the paraphrase 「関学」 can be registered as speech recognition target vocabulary.
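The evaluation described above can be sketched in Python as a two-stage test: a hit-count threshold followed by a check for the formal name in the retrieved content. This is an illustration under stated assumptions, not the patented implementation: `is_actually_used`, `toy_search`, and `CORPUS` are hypothetical names, the search backend is a small in-memory list standing in for the communication device 70, and the threshold value is arbitrary.

```python
from typing import Callable, List


def is_actually_used(paraphrase: str, formal_name: str,
                     search: Callable[[str], List[str]],
                     threshold: int) -> bool:
    """Return True if the paraphrase passes both checks described in the text."""
    pages = search(paraphrase)          # content retrieved with the paraphrase as key
    if len(pages) < threshold:          # stage 1: number of results vs. threshold
        return False
    # stage 2: at least one retrieved page must contain the formal name
    return any(formal_name in page for page in pages)


# Hypothetical stand-in for network content (assumption, not real data).
CORPUS = [
    "関西学院大学(関学)のオープンキャンパス案内",
    "関学の学園祭に行ってきた",
    "関学アメフト部の試合結果",
    "西院駅周辺のグルメ情報",
]


def toy_search(key: str) -> List[str]:
    """Return every corpus entry that contains the search key."""
    return [page for page in CORPUS if key in page]


FORMAL_NAME = "関西学院大学"
registered = [w for w in ["関学", "西院", "学学"]
              if is_actually_used(w, FORMAL_NAME, toy_search, threshold=2)]
print(registered)
```

Passing the search function as a parameter keeps the sketch independent of any particular search service; a real system would replace `toy_search` with a call through its own network interface.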

In addition, when the information content obtained by the communication device 70 by searching with a paraphrase as the search key is written in a structured language such as HTML (Hypertext Markup Language), the signal processing unit 1 determines whether the formal name appears in the title portion of the HTML data. If it does, the unit may evaluate the paraphrase used as the search key as having high actual usage and register it as speech recognition target vocabulary.
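As a rough illustration of the title check, the following Python sketch extracts the text of the `<title>` element with the standard-library `html.parser` module and tests whether the formal name occurs in it. The class `TitleExtractor`, the function `title_contains`, and the sample HTML strings are hypothetical; the source does not specify how the title portion is located.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_contains(html: str, formal_name: str) -> bool:
    """Return True if the formal name appears in the page title."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return formal_name in parser.title


# Hypothetical search results for the key 関学 (assumed sample data).
page_a = "<html><head><title>関西学院大学 公式サイト</title></head><body>関学</body></html>"
page_b = "<html><head><title>日記</title></head><body>関西学院大学(関学)に行った</body></html>"
print(title_contains(page_a, "関西学院大学"), title_contains(page_b, "関西学院大学"))
```

Restricting the match to the title is stricter than a whole-page match: `page_b` mentions the formal name only in its body and is therefore not counted.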

Furthermore, when the category of the formal name obtained via the alternative input screen 51 and the list display screen 61 is a place name, it is desirable for the signal processing unit 1 to add the position information of that place name, and not only the place name itself, to the search key with which the communication device 70 searches information content.

[Effects of the third embodiment]
As described in detail above, the speech recognition apparatus according to the third embodiment to which the present invention is applied evaluates a paraphrase as having high actual usage when the paraphrase searched for by the communication device 70 is contained in multiple pieces of information content. This makes it possible to register commonly used paraphrases that could not be anticipated at design time, raising the recognition rate for paraphrases while preventing the speech recognition target vocabulary from growing larger than necessary.

The speech recognition apparatus also evaluates a paraphrase as having high actual usage when the number of pieces of information content containing it is equal to or greater than a predetermined value. Raising that predetermined value therefore improves the accuracy with which paraphrases are registered, greatly improves ease of use, and prevents the speech recognition target vocabulary from growing larger than necessary.

Furthermore, the speech recognition apparatus evaluates a paraphrase as having high actual usage when both the formal name and the paraphrase generated from it co-occur in the information content retrieved by the communication device 70, which greatly improves both the accuracy of paraphrase registration and ease of use.

Furthermore, the speech recognition apparatus searches information content using the paraphrase as the search condition and evaluates the paraphrase as having high actual usage when the retrieved content contains the formal name. Even personal information content can thus be obtained as a search result, so new paraphrases can be registered more quickly and ease of use is greatly improved.

Furthermore, the speech recognition apparatus searches information content using the paraphrase as the search condition and evaluates the paraphrase as having high actual usage when the formal name is contained in the title portion of the information content, which makes the accuracy of paraphrase registration extremely high.

Furthermore, when the category of the formal name is a place name, the speech recognition apparatus includes the position information of that place name in the search condition for information content, which reduces false search hits and avoids registering incorrect paraphrases.

The embodiments described above are examples of the present invention. The present invention is therefore not limited to these embodiments, and various modifications can of course be made according to the design and the like, even in forms other than these embodiments, as long as they do not depart from the technical idea of the present invention.

FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to the first embodiment to which the present invention is applied.
FIG. 2 is a diagram showing the operation name databases and the speech recognition target vocabulary database in the speech recognition apparatus according to the first embodiment.
FIG. 3 is a diagram explaining screen transitions in the speech recognition apparatus according to the first embodiment: (a) is a voice menu image, (b) is a destination selection menu image, (c) is a facility name input image, and (d) is a speech recognition result image.
FIG. 4 is a diagram showing the operation of the speech recognition apparatus according to the first embodiment when no paraphrase is registered.
FIG. 5 is a diagram showing the operation of the speech recognition apparatus according to the first embodiment when a paraphrase is registered.
FIG. 6 is a flowchart showing the processing procedure of the speech recognition apparatus according to the first embodiment.
FIG. 7 is a diagram showing the operation of the speech recognition apparatus according to the second embodiment when no paraphrase is registered.
FIG. 8 is a diagram showing the operation of the speech recognition apparatus according to the second embodiment when a paraphrase is registered.
FIG. 9 is a flowchart showing the processing procedure during speech recognition by the speech recognition apparatus according to the second embodiment.
FIG. 10 is a flowchart showing the processing procedure during paraphrase registration by the speech recognition apparatus according to the second embodiment.
FIG. 11 is a diagram showing the alternative input screen in the speech recognition apparatus according to the second embodiment.
FIG. 12 is a diagram showing the list display screen in the speech recognition apparatus according to the second embodiment.
FIG. 13 is a block diagram showing the configuration of the speech recognition apparatus according to the third embodiment.
FIG. 14 is a diagram showing the operation of the speech recognition apparatus according to the third embodiment when no paraphrase is registered.
FIG. 15 is a diagram showing the operation of the speech recognition apparatus according to the third embodiment when a paraphrase is registered.
FIG. 16 is a diagram showing the alternative input screen in the speech recognition apparatus according to the third embodiment.
FIG. 17 is a diagram showing the list display screen in the speech recognition apparatus according to the second embodiment.
FIG. 18 is a flowchart showing the processing procedure during speech recognition by the speech recognition apparatus according to the third embodiment.

Explanation of symbols

1 Signal processing unit
2 Microphone
3 Touch panel display
4 Speaker
5 Input device
5a Utterance switch
5b Correction switch
11 Signal processing device
12 A/D converter
13 D/A converter
14 Amplifier
15 External storage device
21 CPU
22 Memory
31, 32 Operation name database
33 Speech recognition target vocabulary database
41 Voice menu image
42 Destination selection menu image
43 Facility name input image
44 Speech recognition result image
45 Speech recognition result display field
46 Menu list
47, 63 "Go there" button
48, 64 "View map" button
51 Alternative input screen
52 Category information
53 Formal name input field
54 Character input button
55 List display button
61 List display screen
62 List display
70 Communication device

Claims

A speech recognition apparatus having speech recognition means for recognizing speech uttered by a user, the apparatus comprising:
storage means for storing formal names as speech recognition target vocabulary;
paraphrase generation means for generating a paraphrase from a formal name stored in the storage means;
actual usage evaluation means for evaluating the actual usage of the paraphrase generated by the paraphrase generation means; and
registration means for registering the paraphrase generated by the paraphrase generation means in the storage means as speech recognition target vocabulary,
wherein the registration means registers, as speech recognition target vocabulary, only paraphrases evaluated by the actual usage evaluation means as having high actual usage.

The speech recognition apparatus according to claim 1, further comprising correction instruction means for inputting an instruction to correct a speech recognition result generated by the speech recognition means for the speech uttered by the user,
wherein, when an instruction to correct a first speech recognition result is input by the correction instruction means, the actual usage evaluation means temporarily stores the first speech recognition result, and when a second speech recognition result subsequently generated by the speech recognition means is confirmed without being corrected by the correction instruction means, evaluates the first speech recognition result as having high actual usage as a paraphrase of the second speech recognition result.

The speech recognition apparatus according to claim 2, further comprising use intention judging means for judging the user's intention to use a paraphrase from the amount by which the user operates the correction instruction means,
wherein the actual usage evaluation means evaluates a paraphrase judged by the use intention judging means to have a high use intention as a paraphrase having high actual usage.

The speech recognition apparatus according to claim 1, wherein, when registering a paraphrase, the registration means obtains the rule by which the paraphrase was generated from the formal name and, in accordance with that rule, registers paraphrases of other formal names classified in the same category as the paraphrase.

The speech recognition apparatus according to claim 1, further comprising alternative input means for inputting a formal name by a user operation instead of by voice,
wherein, when a formal name different from the speech recognition result of the speech recognition means is input by the alternative input means, the actual usage evaluation means evaluates the speech recognition result of the speech recognition means as having high actual usage as a paraphrase of the formal name input by the alternative input means.

The speech recognition apparatus according to claim 1, wherein the actual usage evaluation means collates the user's initial utterance in the speech recognition result of the speech recognition means against the paraphrases generated by the paraphrase generation means, and evaluates a paraphrase having a high likelihood with respect to the user's initial utterance as having high actual usage.

The speech recognition apparatus according to claim 1, further comprising communication means for searching information content on a network,
wherein the actual usage evaluation means evaluates a paraphrase generated by the paraphrase generation means as having high actual usage when, as a result of the communication means searching information content for the paraphrase, the paraphrase is found in multiple pieces of information content.

The speech recognition apparatus according to claim 7, wherein the actual usage evaluation means evaluates a paraphrase generated by the paraphrase generation means as having high actual usage when the number of pieces of information content containing the paraphrase is equal to or greater than a predetermined value.

The speech recognition apparatus according to claim 7, wherein the actual usage evaluation means evaluates a paraphrase as having high actual usage when both the formal name and the paraphrase generated from that formal name by the paraphrase generation means co-occur in the information content.

The speech recognition apparatus according to claim 7, wherein the actual usage evaluation means causes the communication means to search information content using the paraphrase generated by the paraphrase generation means as the search condition, and evaluates the paraphrase as having high actual usage when the information content retrieved by the communication means contains the formal name.

The speech recognition apparatus according to claim 7, wherein the actual usage evaluation means causes the communication means to search information content using the paraphrase generated by the paraphrase generation means as the search condition, and evaluates the paraphrase as having high actual usage when the title portion of the information content retrieved by the communication means contains the formal name.

The speech recognition apparatus according to claim 7, wherein, when the category of the formal name is a place name, the communication means performs the search with the position information of that place name included in the search condition for the information content.

The speech recognition apparatus according to claim 1, wherein the user is notified that the paraphrase registered by the registration means is available for use.