JP2000029490A

JP2000029490A - Word dictionary data building method for voice recognition apparatus, voice recognition apparatus, and navigation system

Info

Publication number: JP2000029490A
Application number: JP10200562A
Authority: JP
Inventors: Toshifumi Kato; 利文加藤
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 1998-07-15
Filing date: 1998-07-15
Publication date: 2000-01-28

Abstract

PROBLEM TO BE SOLVED: To realize, with a smaller capacity, the dictionary memory for recognition words needed by a voice recognition apparatus, while permitting a plurality of substantially equivalent input forms for the voice input that instructs a single kind of command. SOLUTION: To give the command for enlarging a map, a user can issue the same command with any of several phrasings, for example 'please enlarge the map', 'please make the map larger', 'zoom the map', 'please enlarge' or 'zoom'. That is, the utterance is identified as corresponding to the same command, 'enlarge the map', even when the ways of speaking differ, which makes the apparatus easier to use. Nevertheless, the plural ways of speaking need not be stored as separate word dictionary data; words common to them, such as 'the map', 'enlarge', 'zoom' or 'please', can be built as shared word dictionary data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice recognition apparatus that is effective when, for example, destination setting in a navigation system is to be made possible by voice input, and to a navigation system equipped with that voice recognition apparatus.

[0002]

2. Description of the Related Art

Voice recognition apparatuses that compare input speech with a plurality of comparison target pattern candidates stored in advance and take the candidate with a high degree of matching as the recognition result have already been put into practical use. They are used, for example, so that a user can instruct predetermined commands by voice in a navigation system, such as enlarging or reducing the displayed map or setting a route. Voice input is particularly effective when the driver personally uses an in-vehicle navigation system, because it involves no button operation or gazing at the screen and is therefore highly safe even while the vehicle is moving.

[0003] For example, suppose that the voice input 「地図を拡大して下さい」 ("please enlarge the map") is required to instruct the map enlargement command mentioned above. Having to pronounce it separated into predetermined word-based units, as in 「地図」, 「を」, 「拡大」, 「して」, 「下さい」 (chizu, wo, kakudai, shite, kudasai), would be burdensome for the user, so it is preferable to allow input in one continuous utterance (batch input). A configuration that supports batch input, however, can in turn become inconvenient for the user: unless the user makes the specific voice input 「地図を拡大して下さい」 exactly, it may not be recognized as input of the desired command.

[0004] It is therefore conceivable to permit a plurality of voice inputs indicating substantially the same command. For example, to instruct the map enlargement command, many variations can be obtained, in addition to the recognition word 「地図を拡大して下さい」 above, by substituting synonyms or omitting parts: 「地図を大きくして下さい」 ("please make the map bigger"), 「地図をズームして下さい」 ("please zoom the map"), 「地図を拡大」 ("enlarge the map"), 「地図大きくして下さい」 ("please make the map bigger", without the particle) and 「ズーム」 ("zoom").

[0005] However, if all of these many variations must be prepared as recognition words for each kind of command, a large amount of dictionary memory for recognition words becomes necessary as the number of command types increases. The present invention solves this problem. Its object is to realize the dictionary memory for the necessary recognition words with a smaller capacity, while permitting a plurality of substantially equivalent input forms for the voice input that instructs a single kind of command to the voice recognition apparatus.

[0006]

MEANS FOR SOLVING THE PROBLEMS AND EFFECTS OF THE INVENTION

The word dictionary data construction method for a voice recognition apparatus according to claim 1, made to achieve the above object, is used in the following kind of voice recognition apparatus. The apparatus stores comparison target patterns corresponding to individual words as word dictionary data; compares voice input via, for example, a microphone with the comparison target patterns in the word dictionary data and takes the one with a high degree of matching as the recognition result; specifies a plurality of recognition results indicating substantially the same content as corresponding to the same command; and outputs the specified command to an external device. The word dictionary data is constructed by procedures (1) to (3) below.

[0007] (1) First, for each command, a standard comparison target pattern is set by connecting, in order: a word specifying the command target, which indicates the object or function the command acts on; the object-marking particle 「を」 (wo); a word specifying the command action, which indicates what the command does; and the verb that is customarily appended after the word specifying the command action.

(2) Next, patterns that satisfy at least one of the following conditions ① to ⑤ relative to the standard comparison target pattern are set as alternative comparison target patterns indicating substantially the same content.

[0008] ① A synonym is used for the word specifying the command target or the command action.

② An inflected form is used for the verb customarily appended after the word specifying the command action.

[0009] ③ The object-marking particle 「を」 (wo) is omitted.

④ The verb customarily appended after the word that identifies the command action is omitted.

⑤ When the verb customarily appended after the word that identifies the command action is 「して下さい」 (shite kudasai), only 「下さい」 (kudasai) is omitted.

(3) The standard comparison target pattern and the alternative comparison target patterns are then mapped onto a tree-like directed graph format: one based on a tree structure in which patterns with the same syllable data are assigned the same parent, but in which the number of paths reaching a vertex is not necessarily one. The syllable data constituting the comparison target patterns, and identification data indicating the end of a word, are assigned to the vertices in preorder traversal.

[0010] The construction method described in (1) to (3) above is further explained with a concrete example, taking 'map enlargement' as the command. One example of the standard comparison target pattern set in (1) is 「地図を拡大して下さい」 ("please enlarge the map"). Here, the word specifying the command target, which indicates the object or function the command 'map enlargement' acts on, is 「地図」 (chizu, "map"), and the word specifying the command action, which indicates what the command does, is 「拡大」 (kakudai, "enlarge"). The verb customarily appended after the action word 「拡大」 is the compound verb 「して下さい」 (shite kudasai, "please do"). Adding the case particle 「を」 (wo) therefore gives 「地図を拡大して下さい」, which is set as the standard comparison target pattern.

[0011] In (2) above, alternative comparison target patterns indicating substantially the same content as the standard comparison target pattern 「地図を拡大して下さい」 are set. Under condition ① (synonyms), 「マップ」 (mappu, the loanword for "map") is a conceivable synonym for the word 「地図」 that specifies the command target. Conceivable synonyms for the word 「拡大」 that specifies the command action include 「大きく」 (ookiku, "bigger") and 「ズーム」 (zuumu, "zoom").

[0012] Under condition ② (inflected forms), when 「して下さい」 is appended after 「拡大」, the continuative form 「し」 of the verb 「する」 (suru, "to do") is being used, so its conceivable inflected forms include the terminal form 「する」 and the imperative form 「しろ」. Condition ③ omits 「を」, so that, for example, 「地図拡大して下さい」 corresponds to 「地図を拡大して下さい」.

[0013] Under condition ④, the 「して下さい」 or similar verb after 「拡大」 is omitted. Thus, for example, 「地図を拡大」 corresponds to 「地図を拡大して下さい」. Under condition ⑤, only the 「下さい」 of 「して下さい」 is omitted. Thus, for example, 「地図を拡大して」 corresponds to 「地図を拡大して下さい」. Patterns satisfying two or more of conditions ① to ⑤ naturally qualify as well. For example, when the two omission conditions ③ and ④ are both satisfied, 「地図を拡大して下さい」 is reduced to just 「地図拡大」.
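The five conditions described above (synonyms, inflected verb forms, omission of 「を」, omission of the appended verb, and omission of 「下さい」 only) can be illustrated with a minimal sketch that enumerates alternative patterns from the standard one. The function name `alternative_patterns` and the word lists are assumptions made for illustration; the patent does not prescribe how the patterns are enumerated.

```python
from itertools import product

def alternative_patterns(targets, actions, verb_forms, particle="を"):
    """Enumerate target + optional particle + action + optional verb form.

    Hypothetical helper: every combination of the allowed choices is
    treated as one comparison target pattern for the same command.
    """
    patterns = set()
    for target, action, p, verb in product(targets, actions, (particle, ""), verb_forms):
        patterns.add(target + p + action + verb)
    return patterns

targets = ["地図", "マップ"]            # synonyms for the command target
actions = ["拡大", "大きく", "ズーム"]   # synonyms for the command action
verb_forms = [
    "して下さい",  # standard appended verb
    "する", "しろ",  # inflected forms of the appended verb
    "して",        # only 下さい omitted
    "",            # appended verb omitted entirely
]

patterns = alternative_patterns(targets, actions, verb_forms)
```

The set contains the standard pattern 「地図を拡大して下さい」 alongside shortened forms such as 「地図拡大」. Because the number of patterns grows multiplicatively with each list, storing every pattern separately would be costly, which is the motivation for the shared structure built in procedure (3).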

[0014] Once the standard and alternative comparison target patterns have been set in this way, the word dictionary data is constructed in (3). That is, the syllable data constituting the comparison target patterns and the identification data indicating the end of a word are assigned to the vertices in preorder traversal, so as to correspond to a tree-like directed graph format: one based on a tree structure in which patterns with the same syllable data are assigned the same parent, but in which the number of paths reaching a vertex is not necessarily one.

[0015] The term 'tree-like directed graph format' is used because the structure basically follows a tree structure, but the number of paths reaching a vertex need not be one and may be more than one; that is, branches that have once diverged are allowed to 'merge' afterward.

[0016] By giving each vertex the syllable data and the end-of-word identification data assigned to it, word data is obtained by joining the syllable data in parent-to-child order, in tree-structure terms. For example, with 「地図を拡大して下さい」 ("please enlarge the map") as the standard comparison target pattern, word dictionary data containing all of its alternative comparison target patterns is constructed as shown in FIG. 3.
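As a rough illustration of a tree-like directed graph in which diverged branches may merge again, the sketch below first builds an ordinary tree of shared prefixes and then merges vertices whose continuations are identical. This is only one possible reading: the merging strategy, the `END` marker, the function names, and the use of one character as a stand-in for one syllable are all assumptions, and the structure of FIG. 3 may differ.

```python
END = "$"  # hypothetical end-of-word marker (assumed not to occur in patterns)

def build_tree(patterns):
    """Plain tree: patterns with the same leading syllables share a parent."""
    root = {}
    for pattern in patterns:
        node = root
        for syllable in pattern:  # one character stands in for one syllable
            node = node.setdefault(syllable, {})
        node[END] = {}
    return root

def merge_shared_tails(node, registry):
    """Let branches with identical continuations merge into one vertex."""
    for key in list(node):
        node[key] = merge_shared_tails(node[key], registry)
    signature = tuple(sorted((key, id(child)) for key, child in node.items()))
    return registry.setdefault(signature, node)

def count_vertices(node, seen=None):
    """Count distinct vertices, visiting a merged vertex only once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_vertices(child, seen) for child in node.values())

def accepts(root, utterance):
    """Follow the syllables from the root and check for the end marker."""
    node = root
    for syllable in utterance:
        if syllable not in node:
            return False
        node = node[syllable]
    return END in node

patterns = ["地図を拡大して下さい", "地図を大きくして下さい", "地図をズームして下さい",
            "地図を拡大して", "地図大きく", "ズーム"]
tree_size = count_vertices(build_tree(patterns))
graph = merge_shared_tails(build_tree(patterns), {})
graph_size = count_vertices(graph)
```

Comparing `tree_size` with `graph_size` shows the saving obtained when common endings such as 「して下さい」 are stored once, while `accepts` still recognizes every original pattern.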

[0017] With word dictionary data constructed in this way, a voice recognition apparatus using it can identify voice inputs that are phrased differently but indicate substantially the same content, such as 「地図を拡大して下さい」, 「地図を大きくして下さい」, 「地図をズームして下さい」, 「地図を拡大して」, 「地図大きく」 and 「ズーム」, as corresponding to the same command, 'map enlargement', which improves usability for the user. At the same time, these phrasings need not each be stored as separate word dictionary data: words common to them, such as 「地図」 (map), 「拡大」 (enlarge), 「大きく」 (bigger), 「ズーム」 (zoom) and 「して下さい」 (please do), can be built as shared word dictionary data, so the memory for word dictionary data can be realized with a smaller capacity.

[0018] In the conditions of (2) above, under which patterns may be set as alternative comparison target patterns indicating substantially the same content as the standard comparison target pattern, when the command is, for example, 'map enlargement', the command target 「地図」 (map) and the command action 「拡大」 (enlarge) were allowed synonyms, but their omission was not made a condition. This is because a command is meaningless unless its target and action are known.

[0019] The command target, however, may be omitted if it can be identified by some means. In a situation where the map is identified as the target of commands, inputting only 「拡大」 (enlarge) is understood to mean map enlargement. Therefore, as set forth in claim 2, a condition ⑥, namely 'the command target is omitted', may be added to conditions ① to ⑤ above, and patterns satisfying at least one of these six conditions may be set as alternative comparison target patterns indicating substantially the same content as the standard comparison target pattern.

[0020] This holds in situations where, for example, the command output destination is in a mode that performs some operation on the map, so that the command target itself can be identified and may be omitted. The invention according to claim 3 is a voice recognition apparatus comprising: voice input means for inputting voice; recognition means for comparing the voice input via the voice input means with a plurality of comparison target pattern candidates stored in advance in dictionary means and taking the candidate with a high degree of matching as the recognition result; specifying means for specifying, based on the recognition result from the recognition means, a plurality of recognition results indicating substantially the same content as corresponding to the same command; and command output means for outputting the command specified by the specifying means, wherein the word dictionary data stored in the dictionary means is constructed by the word dictionary data construction method according to claim 1.

[0021] According to this voice recognition apparatus, when the user inputs voice via the voice input means, the recognition means compares the input voice with the plurality of comparison target pattern candidates stored in advance in the dictionary means and takes the candidate with a high degree of matching as the recognition result; the specifying means specifies a plurality of recognition results indicating substantially the same content as corresponding to the same command; and the command output means outputs the specified command. The word dictionary data stored in the dictionary means is constructed by the word dictionary data construction method described above. The effects are the same as those described above for the invention viewed as a word dictionary data construction method, so they are not repeated in detail here; because the dictionary memory can be realized with a smaller capacity, the invention is also advantageous for, among other things, miniaturizing the voice recognition apparatus.

[0022] As noted above, if the command target can be identified by some means, patterns that omit the command target may also be set as alternative comparison target patterns indicating substantially the same content. A voice recognition apparatus supporting this may be configured as set forth in claim 3. That is, it is a voice recognition apparatus comprising command target input means for inputting, from the command output destination, information on omissible command targets, wherein the word dictionary data stored in the dictionary means is constructed by the word dictionary construction method according to claim 2.

[0023] When the voice recognition apparatus according to claim 3 is used for a navigation system, it may be configured as set forth in claim 5. That is, it is a navigation system comprising the voice recognition apparatus according to claim 3 and a navigation device, wherein the voice input means of the voice recognition apparatus is used by the user to input by voice at least instructions for predetermined navigation-related commands that need to be specified for the navigation device to perform navigation processing, and the command output means is configured to output the specified command to the navigation device.

[0024] The 'predetermined navigation-related commands' here include commands instructing enlargement, reduction or movement of the displayed map, route setting, destination setting, and the like. For example, for the command 'map reduction', variations such as 「地図を小さくして下さい」 ("please make the map smaller") are conceivable for 「地図を縮小して下さい」 ("please reduce the map"); for the command 'map movement', variations such as 「地図ムーブ」 ("map move") are conceivable for 「地図を移動して下さい」 ("please move the map"). Likewise, for the command 'route setting', variations such as 「ルート設定」 (using the loanword for "route") are conceivable for 「経路を設定して下さい」 ("please set the route").

[0025] When the voice recognition apparatus according to claim 4 is used for a navigation system, on the other hand, it may be configured as set forth in claim 6. That is, it is a navigation system comprising the voice recognition apparatus according to claim 4 and a navigation device, wherein the navigation device comprises command target output means for outputting, based on the navigation processing currently being executed, information on omissible command targets to the voice recognition apparatus; the voice input means of the voice recognition apparatus is used by the user to input by voice at least instructions for predetermined navigation-related commands that need to be specified for the navigation device to perform navigation processing; and the command output means is configured to output the specified command to the navigation device.

[0026] The command output destination is the navigation device. If this navigation device is performing navigation processing with a map displayed and is in a mode in which, for example, the map can be enlarged, reduced or moved, it outputs 「地図」 (map) to the voice recognition apparatus as an omissible command target. In this case, even if the voice input is only 「拡大」 (enlarge), without the command target 「地図」 being specified, the apparatus can handle it.
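The interplay between the navigation device's current mode and the omissible command target can be sketched as follows. The function `active_patterns` and its arguments are hypothetical; the sketch simply assumes the recognizer widens its accepted set with target-less forms whenever the output destination reports a target as omissible, which is one way the behavior described above could be realized.

```python
def active_patterns(full_patterns, omissible_targets, particle="を"):
    """Return the patterns accepted in the current mode.

    Hypothetical helper: for every target reported as omissible, a
    pattern that starts with that target (optionally followed by the
    particle) is also accepted with the target stripped.
    """
    active = set(full_patterns)
    for pattern in full_patterns:
        for target in omissible_targets:
            # Try the longer prefix first so the particle is removed too.
            for prefix in (target + particle, target):
                if pattern.startswith(prefix) and len(pattern) > len(prefix):
                    active.add(pattern[len(prefix):])
                    break
    return active

full_patterns = {"地図を拡大して下さい", "地図を拡大", "地図拡大"}

# Map-operation mode: the navigation device reports 地図 as omissible.
map_mode = active_patterns(full_patterns, {"地図"})

# Any other mode: nothing may be omitted.
other_mode = active_patterns(full_patterns, set())
```

In map-operation mode the bare input 「拡大」 is accepted, whereas in other modes only the full patterns remain.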

[0027] The voice recognition apparatus is not limited in application to the navigation system described above. For example, when it is used for an air conditioning system, adjustment of the set temperature, selection of the air conditioning mode (cooling, heating, dry), or selection of the airflow direction mode may be performed by voice input.

[0028] Taking the set temperature as an example, relative to 「設定温度を２５度にして下さい」 ("please set the set temperature to 25 degrees"), it is preferable to also accept phrasings such as 「設定温度を２５度にして」 and 「設定温度２５度」, the phrasing 「２５度に設定して下さい」 ("please set it to 25 degrees"), and even simply 「２５度」 ("25 degrees"). The same applies to the air conditioning mode, the airflow direction mode, and so on.

[0029] The navigation system and air conditioning system described above need not be in-vehicle equipment; they may be, for example, a portable navigation device or an indoor air conditioner. When used in in-vehicle equipment, however, miniaturization of the voice recognition apparatus is all the more beneficial for mounting in the vehicle. From this viewpoint, the invention can of course be applied equally to in-vehicle equipment other than navigation and air conditioning systems. Car audio equipment is one example where it is effective. It is also effective in configurations where, for instance, opening and closing of power windows or adjustment of mirror angles is instructed by voice.

[0030] Although in-vehicle use has its own particular advantages as described, the voice recognition apparatus of the present invention is equally applicable to anything that executes predetermined processing in accordance with voice input instructions from a user. For example, it can be applied to portable information terminals, or to information terminals installed on streets, in parking areas, and the like.

[0031]

EMBODIMENTS OF THE INVENTION

Embodiments to which the present invention is applied are described below with reference to the drawings. Needless to say, the modes of carrying out the invention are in no way limited to the following embodiments and may take various forms as long as they fall within the technical scope of the invention.

[0032] FIG. 1 is a block diagram showing the schematic configuration of a car navigation system 2 to which a voice recognition apparatus 30 according to one embodiment of the present invention is applied. The car navigation system 2 comprises a position detector 4, a map data input device 6, an operation switch group 8, a control circuit 10 connected to these, and an external memory 12, a display device 14, a remote control sensor 15 and the voice recognition apparatus 30, each connected to the control circuit 10. The control circuit 10 is configured as an ordinary computer and internally contains a well-known CPU, ROM, RAM, I/O, and a bus line connecting these components.

[0033] The position detector 4 has a gyroscope 18, a distance sensor 20, and a GPS receiver 22 for GPS (Global Positioning System), which detects the position of the vehicle based on radio waves from satellites. Because each of these sensors 18, 20 and 22 has errors of a different nature, they are configured to be used together, each compensating for the others. Depending on the required accuracy, the detector may consist of only some of the above, and a steering rotation sensor, wheel sensors for the rolling wheels, and the like may also be used.

[0034] The map data input device 6 is a device for inputting various data, including so-called map matching data for improving the accuracy of position detection, map data, and landmark data. A CD-ROM is generally used as the medium because of the data volume, but other media such as a digital video disc (DVD) or a memory card may be used.

[0035] The display device 14 is a color display device. Its screen can display, superimposed on one another, the vehicle's current position mark input from the position detector 4, the map data input from the map data input device 6, and additional data shown on the map such as the guidance route and the marks of set points described later.

[0036] The car navigation system 2 also has a so-called route guidance function: when the position of a destination is input from the remote control sensor 15 via a remote control terminal (hereinafter, remote control) 15a, or by the operation switch group 8, it automatically selects the optimum route from the current position to that destination, and forms and displays a guidance route. Techniques such as Dijkstra's algorithm are known for automatically setting such an optimum route. The operation switch group 8 consists of, for example, touch switches integrated with the display device 14 or mechanical switches, and is used for various inputs.
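For readers unfamiliar with Dijkstra's algorithm mentioned above, a minimal sketch follows. The road network, node names and link costs are invented for illustration; how the navigation system actually represents its map data and costs is not described in this document.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm on a dict of {node: {neighbor: cost}}.

    Returns (total_cost, path) or None if the goal is unreachable.
    Costs are assumed to be non-negative.
    """
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None

# Hypothetical road network: link costs could be distance or travel time.
roads = {
    "current": {"A": 4, "B": 1},
    "B": {"A": 2, "destination": 6},
    "A": {"destination": 1},
}

route = shortest_route(roads, "current", "destination")
```

Here the cheapest route goes through B and then A rather than taking either direct-looking link, which is the kind of choice the route guidance function makes automatically.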

[0037] Whereas the operation switch group 8 and the remote control 15a are used to indicate a destination and the like by manual operation, the voice recognition apparatus 30 is a device that allows the user to indicate a destination and the like equally by voice input.

[0038] This voice recognition apparatus 30 comprises a voice recognition unit 31 serving as the 'recognition means', a dialogue control unit 32 serving as the 'command output means', a voice synthesis unit 33, a voice input unit 34, a microphone 35 serving as the 'voice input means', a PTT (Push-To-Talk) switch 36, and a speaker 37.

【００３９】音声認識部３１は、音声入力部３４から入
力された音声データを、対話制御部３２からの指示によ
り入力音声の認識処理を行い、その認識結果を対話制御
部３２に返す。すなわち、音声入力部３４から取得した
音声データに対し、記憶している辞書データを用いて照
合を行ない、複数の比較対象パターン候補と比較して一
致度の高い上位比較対象パターンを対話制御部３２へ出
力する。入力音声中の単語系列の認識は次のように行な
う。すなわち、音声入力部３４から入力された音声デー
タを順次音響分析して、例えばＬＰＣ分析によって算出
されるケプストラム係数などの音響的特徴量を抽出し、
この音響分析によって得られた音響的特徴量時系列デー
タを得る。そして、周知のＤＰマッチング法、ＨＭＭ
（隠れマルコフモデル）あるいはニューラルネットなど
によって、この時系列データをいくつかの区間に分け、
各区間が辞書データとして格納されたどの単語に対応し
ているかを求める。The voice recognition unit 31 performs recognition processing on the voice data input from the voice input unit 34 in accordance with instructions from the dialogue control unit 32, and returns the recognition result to the dialogue control unit 32. That is, it collates the voice data acquired from the voice input unit 34 against the stored dictionary data, compares it with a plurality of comparison target pattern candidates, and outputs the highest-ranking comparison target patterns, those with a high degree of matching, to the dialogue control unit 32. The word sequence in the input speech is recognized as follows. The voice data input from the voice input unit 34 is acoustically analyzed in sequence to extract acoustic features, such as cepstrum coefficients calculated by LPC analysis, yielding time-series data of acoustic features. Then, using the well-known DP matching method, an HMM (hidden Markov model), a neural network, or the like, this time-series data is divided into several sections, and the word stored as dictionary data to which each section corresponds is determined.
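As an illustration of the DP matching idea mentioned above, the following is a minimal sketch of dynamic time warping between a feature sequence and stored templates. It is a simplification under stated assumptions: it scores whole utterances against whole templates instead of segmenting a continuous utterance into words, and the names `dtw_distance`, `rank_candidates`, and the toy one-dimensional features are hypothetical, not taken from the patent.

```python
def dtw_distance(a, b):
    """DP matching (dynamic time warping) cost between two feature sequences.

    a and b are lists of feature vectors (lists of floats), standing in for
    the cepstrum time series described in the text.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of the three admissible alignment paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_candidates(features, templates, top_n=3):
    """Return the top_n template names, best match (lowest cost) first."""
    scored = sorted((dtw_distance(features, t), name) for name, t in templates.items())
    return [name for _, name in scored[:top_n]]
```

Returning a short ranked list, not a single winner, mirrors the text's description of passing the highest-ranking comparison target patterns on to the dialogue control unit.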

【００４０】対話制御部３２は、その認識結果及び自身
が管理する内部状態から、音声合成部３３への応答音声
の発声指示や、システム自体の処理を実行する制御回路
１０に対して、例えばナビゲート処理において地図を表
示している場合に地図を拡大させたり縮小させたりする
よう指示する処理を実行する。結果として、この音声認
識装置３０を利用すれば、上記操作スイッチ群８あるい
はリモコン１５ａを手動しなくても、音声入力によりナ
ビゲーション装置に対する目的地の指示などが可能とな
るのである。Based on the recognition result and the internal state it manages, the dialogue control unit 32 instructs the voice synthesis unit 33 to utter a response, and instructs the control circuit 10, which executes the processing of the system itself, to, for example, enlarge or reduce the map when a map is displayed during navigation processing. As a result, by using the voice recognition device 30, a destination and the like can be specified to the navigation device by voice input, without manually operating the operation switch group 8 or the remote control 15a.

【００４１】また音声入力部３４は、マイク３５にて取
り込んだ周囲の音声をデジタルデータに変換して音声認
識部３１に出力するものである。本実施例においては、
利用者がＰＴＴスイッチ３６を押しながらマイク３５を
介して音声を入力するという使用方法である。具体的に
は、音声入力部３４はＰＴＴスイッチ３６が押されたか
どうかを判断しており、ＰＴＴスイッチ３６が押されて
いる場合にはマイク３５を介しての音声入力処理を実行
するが、押されていない場合にはその音声入力処理を実
行しないようにしている。したがって、ＰＴＴスイッチ
３６が押されている間にマイク３５を介して入力された
音声データのみが音声認識部３１へ出力されることとな
る。The voice input unit 34 converts the ambient sound picked up by the microphone 35 into digital data and outputs it to the voice recognition unit 31. In this embodiment, the user inputs speech through the microphone 35 while holding down the PTT switch 36. Specifically, the voice input unit 34 checks whether the PTT switch 36 is pressed; while it is pressed, the unit performs voice input processing via the microphone 35, and while it is not pressed, it does not. Consequently, only the voice data input via the microphone 35 while the PTT switch 36 is pressed is output to the voice recognition unit 31.
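The push-to-talk gating just described can be pictured with a very small sketch. It assumes the input arrives as a stream of `(pressed, samples)` pairs; that framing and the name `gate_by_ptt` are hypothetical and only serve to illustrate the behaviour.

```python
def gate_by_ptt(frames):
    """Yield audio only for frames captured while the PTT switch is pressed.

    frames is an iterable of (pressed, samples) pairs; frames captured while
    the switch is released are dropped and never reach the recognizer.
    """
    for pressed, samples in frames:
        if pressed:
            yield samples
```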

【００４２】ここで、音声認識部３１と対話制御部３２
の構成について図２を参照してさらに詳しく説明する。
図２に示すように、音声認識部３１は、「認識手段」と
しての照合部３１ａと「辞書手段」としての辞書部３１
ｂとで構成されており、対話制御部３２は後処理部３２
ａ、通信制御部３２ｂ及び辞書制御部３２ｃで構成され
ている。The configurations of the voice recognition unit 31 and the dialogue control unit 32 are now described in more detail with reference to FIG. 2. As shown in FIG. 2, the voice recognition unit 31 consists of a collation unit 31a serving as "recognition means" and a dictionary unit 31b serving as "dictionary means", and the dialogue control unit 32 consists of a post-processing unit 32a, a communication control unit 32b, and a dictionary control unit 32c.

【００４３】音声認識部３１においては、照合部３１ａ
が、音声入力部３４から取得した音声データに対し、辞
書部３１ｂ内に記憶されている単語辞書データを用いて
照合を行ない、入力音声中の単語系列を認識する。詳し
くは、音声入力部３４から入力された音声データを順次
音響分析して音響的特徴量（例えばケプストラム）を抽
出し、この音響分析によって得られた音響的特徴量時系
列データを得る。そして、例えば周知のＤＰマッチング
法によって、この時系列データをいくつかの区間に分け
て、各区間が辞書部３１ｂに格納されたどの単語に対応
しているかを求める「認識手段」としての処理を実行す
る。In the voice recognition unit 31, the collation unit 31a collates the voice data acquired from the voice input unit 34 against the word dictionary data stored in the dictionary unit 31b and recognizes the word sequence in the input speech. Specifically, the voice data input from the voice input unit 34 is acoustically analyzed in sequence to extract acoustic features (for example, cepstrum), yielding time-series data of acoustic features. Then, using for example the well-known DP matching method, this time-series data is divided into several sections, and the collation unit, acting as "recognition means", determines which word stored in the dictionary unit 31b each section corresponds to.

【００４４】照合部３１ａでの認識結果は対話制御部３
２の後処理部３２ａへ送られる。後処理部３２ａは、例
えば通信制御部３２ｂを介して制御回路１０へデータを
送って所定の処理をするように指示する確定後処理を実
行したり、あるいは音声合成部３３へ音声データを送っ
て発音させるように指示する処理を実行する。The recognition result from the collation unit 31a is sent to the post-processing unit 32a of the dialogue control unit 32. The post-processing unit 32a executes, for example, post-confirmation processing, in which it sends data to the control circuit 10 via the communication control unit 32b and instructs it to perform predetermined processing, or processing in which it sends voice data to the voice synthesis unit 33 and instructs it to produce speech.

【００４５】また、対話制御部３２の辞書制御部３２ｃ
は、ナビゲーションシステム２において使用されるコマ
ンド毎に対応した辞書定義書式によって記述された辞書
記述を収めた辞書記述ファイル３２１を備えており、単
語辞書構築部３２２は、この辞書記述ファイル３２１内
の辞書記述ファイルに基づいて単語辞書データを構築す
る。この構築された単語辞書データが辞書部３１ｂに格
納され、上述した音声認識に用いられることとなる。The dictionary control unit 32c of the dialogue control unit 32 has a dictionary description file 321 containing dictionary descriptions written in a dictionary definition format, one for each command used in the navigation system 2, and the word dictionary construction unit 322 constructs word dictionary data from the descriptions in this dictionary description file 321. The constructed word dictionary data is stored in the dictionary unit 31b and used for the voice recognition described above.

【００４６】ここで、単語辞書構築部３２２が辞書記述
ファイル３２１内の辞書記述ファイルに基づいて単語辞
書データを構築する手法について詳しく説明する。な
お、理解を容易にするために、ここでは、コマンドとし
て「地図拡大」を例に取って説明する。（１）まず、当該コマンドを利用者が音声にて入力する
と想定される言い回しを、標準の比較対象パターンとし
て設定する。詳しくは、コマンドの対象物あるいは対象
機能を示すコマンド対象を特定する単語、目的格の助詞
である「を」、コマンドの動作内容を示すコマンド動作
を特定する単語、前記コマンド動作を特定する単語の後
に言語習慣上付加される動詞を順に接続して構成される
ものである。コマンド「地図拡大」に対する具体例とし
ては、「地図を拡大して下さい」が考えられる。The method by which the word dictionary construction unit 322 constructs word dictionary data from the descriptions in the dictionary description file 321 is now described in detail. For ease of understanding, the command 「地図拡大」 (map enlargement) is used as the example.
(1) First, the phrasing in which the user is expected to speak the command is set as the standard comparison target pattern. Specifically, it is formed by connecting, in order, a word specifying the command target (the object or function the command acts on), the object-case particle 「を」, a word specifying the command action (what the command does), and the verb customarily appended after the action word. A concrete example for the command 「地図拡大」 is 「地図を拡大して下さい」 ("please enlarge the map").

【００４７】つまり、コマンド「地図拡大」の対象物あ
るいは対象機能を示すコマンド対象を特定する単語とし
ては「地図」であり、コマンドの動作内容を示すコマン
ド動作を特定する単語としては「拡大」である。そし
て、コマンド動作を特定する単語「拡大」の後に言語習
慣上付加される動詞としては複合動詞「して下さい」が
挙げられる。したがって、格助詞「を」を加えて「地図
を拡大して下さい」が、標準の比較対象パターンとして
設定される。（２）そして、上記標準の比較対象パターン「地図を拡
大して下さい」に対し、次の〜の少なくとも１つ以
上の条件を満たすものを、実質的に同一内容を示す代替
の比較対象パターンとして設定する。That is, for the command 「地図拡大」, the word specifying the command target, which indicates the object or function the command acts on, is 「地図」 (map), and the word specifying the command action, which indicates what the command does, is 「拡大」 (enlarge). The verb customarily appended after the action word 「拡大」 is the compound verb 「して下さい」 (please do). Adding the case particle 「を」 therefore gives 「地図を拡大して下さい」 as the standard comparison target pattern.
(2) Next, any pattern that satisfies at least one of the following conditions relative to the standard comparison target pattern 「地図を拡大して下さい」 is set as an alternative comparison target pattern having substantially the same content.

【００４８】コマンド対象あるいはコマンド動作を特
定する単語について同義語が使用されている。コマンド動作を特定する単語の後に言語習慣上付加さ
れる動詞についてその活用形が使用されている。① A synonym is used for the word specifying the command target or the command action. ② An inflected form is used for the verb customarily appended after the word specifying the command action.

【００４９】目的格の助詞である「を」が省略されて
いる。コマンド動作を特定可能な単語の後に言語習慣上付加
される動詞が省略されている。コマンド動作を特定可能な単語の後に言語習慣上付加
される動詞が「して下さい」の場合に、「下さい」だけ
が省略されている。③ The object-case particle 「を」 is omitted. ④ The verb customarily appended after the word that can specify the command action is omitted. ⑤ Where that verb is 「して下さい」, only 「下さい」 is omitted.

【００５０】コマンド対象が省略されている。上記標準の比較対象パターン「地図を拡大して下さい」
に対しての各条件について考えてみる。条件として
は、コマンド動作を特定する単語「拡大」についての同
義語として「大きく」や「ズーム」などが考えられる。
また、条件としては、「拡大」の後に「して下さい」
が付加される場合、動詞「する」の連用形「し」が用い
られているので、その活用形としては終止形の「する」
や命令形の「しろ」が考えられる。さらに、条件は
「を」を省略するので、「地図を拡大して下さい」に対
して「地図拡大して下さい」などが該当する。そして、
条件は、「拡大」の後の「して下さい」などが省略さ
れる。したがって、例えば「地図を拡大して下さい」に
対して「地図を拡大」が該当する。また、条件は、
「して下さい」の「下さい」だけ省略される。したがっ
て、例えば「地図を拡大して下さい」に対して「地図を
拡大して」が該当する。さらに、条件はコマンド対象
を省略するので、「拡大して下さい」などが該当する。
なお、〜の条件の２つ以上を満たす場合も当然該当
する。したがって、例えば省略を規定する条件とを
両方満たす場合には、「地図を拡大して下さい」に対し
て「地図拡大」だけとなる。⑥ The command target is omitted. Consider each condition with respect to the standard comparison target pattern 「地図を拡大して下さい」. For condition ①, possible synonyms for the action word 「拡大」 include 「大きく」 (bigger) and 「ズーム」 (zoom). For condition ②, when 「して下さい」 follows 「拡大」, the continuative form 「し」 of the verb 「する」 is being used, so possible inflected forms are the terminal form 「する」 and the imperative form 「しろ」. Condition ③ omits 「を」, so 「地図拡大して下さい」 and the like qualify. Condition ④ omits 「して下さい」 and the like after 「拡大」, so 「地図を拡大」, for example, qualifies. Condition ⑤ omits only the 「下さい」 of 「して下さい」, so 「地図を拡大して」, for example, qualifies. Condition ⑥ omits the command target, so 「拡大して下さい」 and the like qualify. Patterns that satisfy two or more of these conditions naturally qualify as well. For example, when conditions ③ and ④, both of which specify omissions, are satisfied together, 「地図を拡大して下さい」 is reduced to just 「地図拡大」.

【００５１】このように、上記標準の比較対象パターン
「地図を拡大して下さい」に対し、〜の少なくとも
１つ以上の条件を満たす代替の比較対象パターンを具体
的に例示すると、次の（ａ）〜（ｔ）の２０のパターン
が考えられる。（ａ）地図を拡大して（ｂ）地図拡大して下さい（ｃ）地図拡大（ｄ）地図を大きくして下さい（ｅ）地図を大きくして（ｆ）地図大きくして下さい（ｇ）地図大きく（ｈ）地図をズームして下さい（ｉ）地図をズームして（ｊ）地図ズームして下さい（ｋ）地図ズーム（ｌ）拡大して下さい（ｍ）拡大して（ｎ）拡大（ｏ）大きくして下さい（ｐ）大きくして（ｑ）大きく（ｒ）ズームして下さい（ｓ）ズームして（ｔ）ズームこのように、標準の比較対象パターン「地図を拡大して
下さい」に対し、（ａ）〜（ｔ）の２０個の代替の比較
対象パターンが設定されるが、これら２１個の単語辞書
データは、１つのコマンド「地図拡大」に対応するもの
であり、これら全てをまとめて記述する辞書記述ファイ
ルは、［地図［を］］（拡大｜大きく｜ズーム）［して
［下さい］］となる。For the standard comparison target pattern 「地図を拡大して下さい」, the following 20 patterns (a) to (t) are concrete examples of alternative comparison target patterns that satisfy at least one of the conditions above:
(a) 地図を拡大して (b) 地図拡大して下さい (c) 地図拡大
(d) 地図を大きくして下さい (e) 地図を大きくして (f) 地図大きくして下さい (g) 地図大きく
(h) 地図をズームして下さい (i) 地図をズームして (j) 地図ズームして下さい (k) 地図ズーム
(l) 拡大して下さい (m) 拡大して (n) 拡大
(o) 大きくして下さい (p) 大きくして (q) 大きく
(r) ズームして下さい (s) ズームして (t) ズーム
Thus 20 alternative comparison target patterns (a) to (t) are set for the standard comparison target pattern 「地図を拡大して下さい」. These 21 items of word dictionary data all correspond to the single command 「地図拡大」, and the dictionary description that covers them all is ［地図［を］］（拡大｜大きく｜ズーム）［して［下さい］］.
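To make the dictionary description concrete, here is a minimal sketch that expands such a description into the phrases it admits, reading ［］ as an optional element and （…｜…） as a choice of exactly one alternative. The function `expand` and the command identifier `"MAP_ENLARGE"` are hypothetical names introduced for illustration; the patent does not specify how the description file is parsed, and the sketch ignores the priority ordering among alternatives. Note that a literal expansion of this description gives 27 strings: the 21 listed patterns plus forms such as 「地図を拡大」, which paragraph [0050] also treats as qualifying.

```python
def expand(description):
    """Expand a dictionary description into every phrase it admits."""
    # Accept the full-width brackets used in the text as well as ASCII ones.
    desc = description.translate(str.maketrans("［］（）｜", "[]()|"))
    pos = 0

    def sequence(stops):
        """Parse until one of the stop characters; return all phrase variants."""
        nonlocal pos
        phrases = [""]
        while pos < len(desc) and desc[pos] not in stops:
            ch = desc[pos]
            pos += 1
            if ch == "[":                    # optional group: absent or present
                options = [""] + sequence("]")
                pos += 1                     # skip the closing "]"
            elif ch == "(":                  # choice group: exactly one alternative
                options = sequence("|)")
                while desc[pos] == "|":
                    pos += 1
                    options += sequence("|)")
                pos += 1                     # skip the closing ")"
            else:                            # literal character
                options = [ch]
            phrases = [p + o for p in phrases for o in options]
        return phrases

    return sequence("")


# Every admitted phrase maps to the same (hypothetical) command identifier.
PHRASES = expand("［地図［を］］（拡大｜大きく｜ズーム）［して［下さい］］")
COMMAND_FOR = {phrase: "MAP_ENLARGE" for phrase in PHRASES}
```

Keeping one description per command, and deriving the phrase list from it, is what lets many phrasings resolve to a single command without each one being written out by hand.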

【００５２】なお、この具体例において、（拡大｜大き
く｜ズーム）における（）は優先順位を示しており、｜
は選択肢を示す。つまり、（拡大｜大きく｜ズーム）は
「拡大」、「大きく」あるいは「ズーム」のいずれか一
つを選択することとなり、さらに「ズーム」よりも「大
きく」が優先し、「大きく」よりも「拡大」が優先す
る。また、［］は省略可を示しており、［地図［を］］
であれば、「を」だけを省略して「地図」とすることも
できるし、「地図を」を全て省略することもできる。（３）上述したように、本具体例では標準の比較対象パ
ターン「地図を拡大して下さい」に（ａ）〜（ｔ）の２
０個の代替の比較対象パターンを加えた２１個の比較対
象パターンが設定されるが、それをそのまま２１個の単
語辞書データとして設定するのではなく、次のようにし
て単語辞書データを構築する。つまり、同じ音節データ
を持つ場合には同じ親を持つように割り付けられた木構
造を基本とするが頂点へ到達する通路の個数は必ずしも
１ではない木構造類似の有向グラフ形式に対応するよ
う、比較対象パターンを構成する音節データ及び単語終
了を示す識別データを先行順走査にしたがって各頂点に
割り付ける。In this example, the parentheses in （拡大｜大きく｜ズーム） indicate priority and ｜ separates alternatives. That is, （拡大｜大きく｜ズーム） selects exactly one of 「拡大」, 「大きく」, or 「ズーム」, with 「大きく」 taking priority over 「ズーム」 and 「拡大」 taking priority over 「大きく」. Square brackets ［］ mark an optional element: with ［地図［を］］, either 「を」 alone can be omitted, leaving 「地図」, or 「地図を」 can be omitted altogether.
(3) As described above, this example sets 21 comparison target patterns: the standard comparison target pattern 「地図を拡大して下さい」 plus the 20 alternatives (a) to (t). Rather than storing them as 21 separate items of word dictionary data, the word dictionary data is constructed as follows. The syllable data making up the comparison target patterns, together with identification data marking the end of a word, are assigned to the vertices in preorder traversal so as to fit a directed graph format resembling a tree structure: one that is based on a tree in which patterns sharing the same syllable data are allocated under the same parent, but in which the number of paths reaching a vertex is not necessarily one.

【００５３】ここで、「木構造類似の有向グラフ形式」
とは、基本的には木構造に準じているが、頂点へ到達す
る通路の個数は必ずしも１ではなく複数でもよい、つま
り一旦分岐した頂点がその後「合流」することを許して
いるので、このように呼ぶこととする。そして、各頂点
に割り付けられた音節データ及び単語終了を示す識別デ
ータを持たせることで、木構造でいうところの親子の順
番で各音節データをつなげていけば単語データとなる。The term "directed graph format resembling a tree structure" is used here because the structure basically follows a tree, but the number of paths reaching a vertex need not be one and may be more than one; in other words, vertices that have branched apart are allowed to merge again later. Because each vertex carries its assigned syllable data and the word-end identification data, concatenating the syllable data in parent-to-child order, in tree terms, yields the word data.
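One way to picture such a tree-like directed graph is a prefix tree whose identical subtrees are then merged, so that branches which end the same way share vertices. The sketch below is an assumption-laden illustration, not the structure of FIG. 3: it treats each written character as a stand-in for one unit of syllable data, stores the units on edges, and the names `build_trie`, `merge_suffixes`, `count_vertices`, and `accepts` are hypothetical.

```python
def build_trie(phrases):
    """Prefix tree: phrases that start with the same units share the same parent."""
    root = {"end": False, "next": {}}
    for phrase in phrases:
        node = root
        for unit in phrase:
            node = node["next"].setdefault(unit, {"end": False, "next": {}})
        node["end"] = True  # plays the role of the word-end identification data
    return root


def merge_suffixes(node, registry):
    """Bottom-up pass that replaces structurally identical subtrees by one shared vertex."""
    for unit, child in list(node["next"].items()):
        node["next"][unit] = merge_suffixes(child, registry)
    # Two vertices are interchangeable if they agree on the end flag and on
    # their (already merged) children.
    key = (node["end"], tuple(sorted((u, id(c)) for u, c in node["next"].items())))
    return registry.setdefault(key, node)


def count_vertices(node, seen=None):
    """Count distinct vertices reachable from node (shared vertices count once)."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_vertices(child, seen) for child in node["next"].values())


def accepts(root, phrase):
    """Follow the units of phrase from the root; succeed only on a word-end vertex."""
    node = root
    for unit in phrase:
        node = node["next"].get(unit)
        if node is None:
            return False
    return node["end"]
```

Calling `count_vertices` before and after `merge_suffixes(trie, {})` on a handful of the map-enlargement phrases shows the vertex count dropping while the accepted phrase set stays the same, which is the memory saving the text attributes to shared word data.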

【００５４】例えば、標準の比較対象パターンを上述の
「地図を拡大して下さい」としてその代替の比較対象パ
ターンを全て含む単語辞書データとしては、図３に示す
ような構成のものが構築される。このように単語辞書デ
ータを構築すれば、この単語辞書データを用いた音声認
識装置では、例えば「地図を拡大して下さい」、「地図
を大きくして下さい」、「地図をズームして下さい」、
「地図を拡大して」、「地図大きく」、「ズーム」とい
うような言い回しが異なるが実質的には同一内容を示す
言葉での音声入力に対して、「地図拡大」という同じコ
マンドに対応するものとして特定でき、利用者にとって
使い勝手が良くなる。それでいながら、それら複数の言
い回しについて、個別に対応する単語辞書データとして
記憶する必要がなく、例えば「地図」、「拡大」、「大
きく」、「ズーム」あるいは「して下さい」などの共通
する単語については共有した単語辞書データとして構築
することができるので、単語辞書データ用のメモリをよ
り少ない容量にて実現することができる。For example, with 「地図を拡大して下さい」 as the standard comparison target pattern, word dictionary data containing it and all of its alternative comparison target patterns is constructed with the structure shown in FIG. 3. With word dictionary data constructed in this way, a voice recognition device using it can identify spoken inputs that are phrased differently but mean substantially the same thing, such as 「地図を拡大して下さい」, 「地図を大きくして下さい」, 「地図をズームして下さい」, 「地図を拡大して」, 「地図大きく」, and 「ズーム」, as all corresponding to the same command 「地図拡大」, which makes the device easier to use. At the same time, these phrasings need not be stored as separate items of word dictionary data: words they have in common, such as 「地図」, 「拡大」, 「大きく」, 「ズーム」, and 「して下さい」, can be built into the word dictionary data as shared parts, so the memory for word dictionary data can be realized with a smaller capacity.

【００５５】ところで、前記（２）の、標準の比較対象
パターンに対して実質的に同一内容を示す代替の比較対
象パターンとして設定すべきものとして許容される条件
においては、コマンドが例えば「地図拡大」の場合に、
コマンド対象である「地図」とコマンド動作である「拡
大」については、同義語の使用を許すだけで省略は条件
としなかった。これは、コマンドの対象や動作が判らな
いとコマンドとして意味をなされないからである。Incidentally, among the conditions in (2) above under which a pattern may be set as an alternative comparison target pattern with substantially the same content as the standard comparison target pattern, the command target and the command action (「地図」 and 「拡大」 when the command is, for example, 「地図拡大」) were allowed to be replaced by synonyms, but their omission was not made a condition. This is because a command has no meaning if its target or action cannot be determined.

【００５６】しかし、コマンド対象については、何等か
の方法で特定できれば省略しても構わない。つまり、コ
マンドの対象として地図が特定されるような状況であれ
ば、「拡大」とだけ入力すれば、それは地図拡大である
ことが判る。したがって、前記〜の条件に加えて
の条件、すなわち「前記コマンド対象が省略されてい
る」という条件を加えた６つの条件の内の少なくとも１
つ以上の条件を満たすものを、標準の比較対象パターン
に対して実質的に同一内容を示す代替の比較対象パター
ンとして設定すべきものとしてもよい。However, the command target may be omitted if it can be determined by some other means. That is, in a situation where the map is already established as the target of commands, saying just 「拡大」 is enough to show that map enlargement is meant. Accordingly, patterns that satisfy at least one of six conditions, namely conditions ① to ⑤ above plus condition ⑥, "the command target is omitted", may be set as alternative comparison target patterns having substantially the same content as the standard comparison target pattern.

【００５７】これは、例えばコマンドの出力先において
地図に対する何等かの動作を行なうモードとなっている
場合などでは、コマンド対象自体は特定可能なため省略
しても構わないというような状況で成立する。辞書部３
１ｂの説明はこれで終わることとする。This holds in situations where the command target itself can be determined and may therefore be omitted, for example when the output destination of the command is in a mode for performing some operation on the map. This concludes the description of the dictionary unit 31b.
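The context dependence described in the last two paragraphs can be sketched as a filter over the admitted phrases: phrases that leave out the command target are accepted only while the context already supplies it. This is an illustrative assumption about how such a rule might be applied, and the name `active_phrases` and its parameters are hypothetical.

```python
def active_phrases(phrases, target_words, target_known_from_context):
    """Select the phrases to accept in the current state.

    Phrases that name the command target are always accepted; phrases that
    omit it are accepted only when the context (for example, a map-operation
    mode) already determines the target.
    """
    def names_target(phrase):
        return any(phrase.startswith(word) for word in target_words)

    return [p for p in phrases if target_known_from_context or names_target(p)]
```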

【００５８】音声認識部３１においては、照合部３１ａ
が、音声入力部３４から取得した音声データに対し、辞
書部３１ｂ内に記憶されている複数の比較対象パターン
候補と比較して一致度の高い上位比較対象パターンを対
話制御部３２の後処理部３２ａへ出力する。そして後処
理部３２ａでは、この上位比較対象パターンを記憶して
おき、例えば上記所定の確定指示がなされた場合に制御
回路１０へデータを送って所定の処理をするように指示
する「確定後処理」を実行したり、あるいは音声合成部
３３へ音声データを送って発音させるように指示する処
理を実行する。なお、この場合の制御回路１０へ送るデ
ータとしては、最終的な認識結果としての上位比較対象
パターンの全てでもよいし、あるいはその内の最上位の
ものだけでもよい。このように、本実施例のカーナビゲ
ーションシステム２であれば、地図拡大というコマンド
を指示する場合に、例えば「地図を拡大して下さい」と
いう標準の比較対象パターン以外にも、「地図を大きく
して下さい」、「地図をズームして下さい」、「地図を
拡大して」、「地図大きく」、「ズーム」など上述した
（ａ）〜（ｔ）の２０の代替比較対象パターン、計２１
のパターンのいずれであっても、同じコマンドを指示で
きる。つまり、言い回しが異なるが実質的には同一内容
を示す言葉での音声入力に対して、「地図拡大」という
同じコマンドに対応するものとして特定でき、利用者に
とって使い勝手が良くなる。それでいながら、それら複
数の言い回しについて、個別に対応する単語辞書データ
として記憶する必要がなく、例えば「地図」、「拡
大」、「大きく」、「ズーム」あるいは「して下さい」
などの共通する単語については共用した単語辞書データ
として構築することができるので、単語辞書データ用の
メモリをより少ない容量にて実現することができる。［別実施例］上述の単語辞書データは「地図拡大」とい
うコマンドについてのものであったが、別の具体例とし
て「現在地表示」というコマンドについても説明してお
く。In the voice recognition unit 31, the collation unit 31a compares the voice data acquired from the voice input unit 34 with the plurality of comparison target pattern candidates stored in the dictionary unit 31b and outputs the highest-ranking comparison target patterns, those with a high degree of matching, to the post-processing unit 32a of the dialogue control unit 32. The post-processing unit 32a stores these highest-ranking patterns and, for example when the predetermined confirmation instruction is given, executes "post-confirmation processing", in which it sends data to the control circuit 10 and instructs it to perform predetermined processing, or executes processing in which it sends voice data to the voice synthesis unit 33 and instructs it to produce speech. The data sent to the control circuit 10 in this case may be all of the highest-ranking comparison target patterns that form the final recognition result, or only the top-ranked one. Thus, with the car navigation system 2 of this embodiment, the map enlargement command can be given not only with the standard comparison target pattern 「地図を拡大して下さい」 but with any of 21 patterns in total, including the 20 alternative comparison target patterns (a) to (t) described above, such as 「地図を大きくして下さい」, 「地図をズームして下さい」, 「地図を拡大して」, 「地図大きく」, and 「ズーム」. That is, spoken inputs that are phrased differently but mean substantially the same thing can be identified as corresponding to the same command 「地図拡大」, which makes the system easier to use. At the same time, these phrasings need not be stored as separate items of word dictionary data: words they have in common, such as 「地図」, 「拡大」, 「大きく」, 「ズーム」, and 「して下さい」, can be built into the word dictionary data as shared parts, so the memory for word dictionary data can be realized with a smaller capacity.
[Alternative embodiment] The word dictionary data described above was for the command 「地図拡大」; as another concrete example, the command 「現在地表示」 (current location display) is also described.

【００５９】この場合には、標準の比較対象パターン
「現在地を表示して下さい」す代替比較対象パターンと
しては、例えば次の（イ）〜（ヨ）の１５のパターンが
考えられる。（イ）現在地を表示して（ロ）現在地を表示（ハ）現在地表示して下さい（ニ）現在地表示して（ホ）現在地表示（ヘ）現在の位置を表示して下さい（ト）現在の位置を表示して（チ）現在の位置を表示（リ）現在の位置表示して下さい（ヌ）現在の位置表示して（ル）現在の位置表示（ヲ）現在の位置はどこ（ワ）現在地はどこ（カ）今どこ（ヨ）今どこにいるのこれら標準の比較対象パターン及び代替比較対象パター
ンを全て含む単語辞書データとしては、図４に示すよう
な構成のものが構築される。In this case, for the standard comparison target pattern 「現在地を表示して下さい」 ("please display the current location"), the following 15 patterns (イ) to (ヨ), for example, are conceivable as alternative comparison target patterns:
(イ) 現在地を表示して (ロ) 現在地を表示 (ハ) 現在地表示して下さい (ニ) 現在地表示して (ホ) 現在地表示
(ヘ) 現在の位置を表示して下さい (ト) 現在の位置を表示して (チ) 現在の位置を表示 (リ) 現在の位置表示して下さい (ヌ) 現在の位置表示して (ル) 現在の位置表示
(ヲ) 現在の位置はどこ (ワ) 現在地はどこ (カ) 今どこ (ヨ) 今どこにいるの
Word dictionary data containing the standard comparison target pattern and all of these alternative comparison target patterns is constructed with the structure shown in FIG. 4.
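The sixteen patterns for this command (the standard pattern plus the fifteen alternatives) can be reproduced as a product of three independent choices plus four question forms. The decomposition below is an observation about the listed phrases, not something the patent states, and the function name `current_location_patterns` is hypothetical.

```python
from itertools import product


def current_location_patterns():
    """Build the phrases for the current-location command from their parts."""
    targets = ["現在地", "現在の位置"]      # command target and its synonym
    particles = ["を", ""]                  # object-case particle, optionally dropped
    endings = ["して下さい", "して", ""]    # appended verb, partly or wholly dropped
    imperatives = [
        target + particle + "表示" + ending
        for target, particle, ending in product(targets, particles, endings)
    ]
    # Question-style phrasings are listed individually.
    questions = ["現在の位置はどこ", "現在地はどこ", "今どこ", "今どこにいるの"]
    return imperatives + questions
```

The twelve imperative forms are the standard pattern plus (イ) to (ル), and the four question forms are (ヲ) to (ヨ).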

【００６０】この別実施例の場合には、現在地という語
に対して「現在の位置」という語が代替できる点や「表
示して下さい」に対して「下さい」や「して下さい」の
ように一部が省略される点を考慮して代替比較対象パタ
ーンを設定すると共に、さらに、コマンド「現在地表
示」は、利用者にとって、現在居る場所を知りたい場合
に使用されるという観点から、「今どこに居るの」や
「今どこ」という通常の会話で用いるような疑問文的な
用法も代替比較対象パターンとして設定した。また、こ
れらを組み合わせた「現在地はどこ」なども設定した。
もちろん、同様の観点からその他の代替比較対象パター
ンを設定してもよい。［その他］上述した実施例では、音声認識装置３０をカ
ーナビゲーションシステム２に適用した例として説明し
たが、適用先としては、上述したカーナビゲーションシ
ステム２には限定されない。例えば音声認識装置を空調
システム用として用いる場合には、設定温度の調整、空
調モード（冷房・暖房・ドライ）の選択、あるいは風向
モードの選択を音声入力によって行うようにすることが
考えられる。In this alternative embodiment, the alternative comparison target patterns were set taking into account that 「現在の位置」 (current position) can stand in for 「現在地」 (current location) and that part of 「表示して下さい」 can be dropped, either 「下さい」 alone or all of 「して下さい」. In addition, because the command 「現在地表示」 is used when the user wants to know where he or she currently is, question-like phrasings used in ordinary conversation, such as 「今どこに居るの」 ("where am I now?") and 「今どこ」 ("where now?"), were also set as alternative comparison target patterns, as were combinations such as 「現在地はどこ」 ("where is the current location?"). Other alternative comparison target patterns may of course be set from the same viewpoint.
[Others] The embodiment above described the voice recognition device 30 as applied to the car navigation system 2, but its application is not limited to the car navigation system 2. For example, when the voice recognition device is used for an air conditioning system, the set temperature could be adjusted, the air conditioning mode (cooling, heating, dry) selected, or the airflow direction mode selected by voice input.

【００６１】例えば設定温度について言えば、「設定温
度を２５度にして下さい」に対して「設定温度を２５度
にして」、「設定温度２５度」という言い回しや、ある
いは「２５度に設定して下さい」という言い回し、さら
には単に「２５度」という言い回しも許容することが好
ましい。空調モードや風向モードなどについても同様で
ある。Taking the set temperature as an example, for 「設定温度を２５度にして下さい」 ("please set the temperature to 25 degrees") it is preferable to also accept the phrasings 「設定温度を２５度にして」 and 「設定温度２５度」, the phrasing 「２５度に設定して下さい」, and even simply 「２５度」. The same applies to the air conditioning mode and the airflow direction mode.
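As a rough illustration of how these set-temperature phrasings could all resolve to one command with a parameter, the sketch below matches already-recognized text against a single pattern. It is an assumption made for illustration only: the patent describes dictionary-based speech matching, not text parsing, and the pattern and the name `parse_set_temperature` are hypothetical.

```python
import re

# Optional target ("設定温度", with or without "を"), the number of degrees,
# and an optional verb ending; every part except the number may be dropped.
SET_TEMPERATURE = re.compile(
    r"^(?:設定温度を?)?(\d+)度(?:にして(?:下さい)?|に設定して下さい)?$"
)


def parse_set_temperature(utterance):
    """Return the requested temperature as an int, or None if the text does not match."""
    match = SET_TEMPERATURE.match(utterance)
    return int(match.group(1)) if match else None
```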

【００６２】なお、上述のナビゲーションシステムや空
調システムは、車載機器として用いられる場合だけでは
なく、例えば携帯型ナビゲーション装置や屋内用空調装
置などでもよい。但し、車載機器用として用いる場合に
は、車両に搭載する上で音声認識装置の小型化がより有
効となる。もちろん、このような視点で考えるならば、
ナビゲーションシステムや空調システム以外の車載機器
に対しても同様に利用することができる。例えば、カー
オーディオ機器などは有効である。また、いわゆるパワ
ーウインドウの開閉やミラー角度の調整などを音声によ
って指示するような構成を考えれば、そのような状況で
も有効である。The navigation system and air conditioning system described above need not be in-vehicle equipment; they may be, for example, a portable navigation device or an indoor air conditioner. When used in in-vehicle equipment, however, the reduced size of the voice recognition device is all the more beneficial for installation in the vehicle. From that viewpoint, the invention can of course be used in the same way for in-vehicle equipment other than navigation and air conditioning systems. Car audio equipment is one example. It is likewise effective in configurations where, for instance, the opening and closing of power windows or the adjustment of mirror angles is commanded by voice.

【００６３】また、車載機器用とした場合にはそれ特有
の利点があることは述べたが、本発明の音声認識装置の
適用先としては、利用者による音声入力指示にしたがっ
て所定の処理を実行するものであれば同様に考えられ
る。例えば、携帯用の情報端末装置、あるいは街頭やパ
ーキングエリアなどに設定される情報端末装置などにも
同様に適用できる。Although the particular advantages for in-vehicle equipment have been described, the voice recognition device of the present invention can likewise be applied to anything that executes predetermined processing in accordance with voice input instructions from a user. For example, it can be applied in the same way to portable information terminals, or to information terminals installed on streets, in parking areas, and the like.

[Brief description of the drawings]

【図１】本発明の一実施例としてのカーナビゲーショ
ンシステムの概略構成を示すブロック図である。FIG. 1 is a block diagram showing a schematic configuration of a car navigation system as one embodiment of the present invention.

【図２】音声認識装置における音声認識部と対話制御
部の構成を示すブロック図である。FIG. 2 is a block diagram illustrating a configuration of a speech recognition unit and a dialogue control unit in the speech recognition device.

【図３】音声認識部内の辞書部に記憶されている地図
拡大というコマンドに対する辞書データを示す説明図で
ある。FIG. 3 is an explanatory diagram showing dictionary data for a command of map enlargement stored in a dictionary unit in a voice recognition unit.

【図４】別実施例として現在地表示というコマンドに
対する辞書データを示す説明図である。FIG. 4 is an explanatory diagram showing dictionary data corresponding to a command of present location display as another embodiment.

[Explanation of symbols]

２…カーナビゲーションシステム４…位置検出器６…地図データ入力器８…操作スイッチ
群１０…制御回路１２…外部メモリ１４…表示装置１５…リモコンセ
ンサ１５ａ…リモコン１６…地磁気セ
ンサ１８…ジャイロスコープ２０…距離センサ２２…ＧＰＳ受信機３０…音声認識装
置３１…音声認識部３１ａ…照合部３１ｂ…辞書部３２…対話制御
部３２ａ…後処理部３２ｂ…通信制御
部３２ｃ…辞書制御部３３…音声合成
部３４…音声入力部３５…マイク３６…ＰＴＴスイッチ３７…スピーカ３２１…辞書記述ファイル３２２…単語辞書
構築部
2: car navigation system; 4: position detector; 6: map data input device; 8: operation switch group; 10: control circuit; 12: external memory; 14: display device; 15: remote control sensor; 15a: remote control; 16: geomagnetic sensor; 18: gyroscope; 20: distance sensor; 22: GPS receiver; 30: voice recognition device; 31: voice recognition unit; 31a: collation unit; 31b: dictionary unit; 32: dialogue control unit; 32a: post-processing unit; 32b: communication control unit; 32c: dictionary control unit; 33: voice synthesis unit; 34: voice input unit; 35: microphone; 36: PTT switch; 37: speaker; 321: dictionary description file; 322: word dictionary construction unit

Claims

[Claims]

1. A method for constructing word dictionary data for a voice recognition device that stores comparison target patterns corresponding to words as word dictionary data, compares input speech with the comparison target patterns and takes those with a high degree of matching as recognition results, identifies a plurality of recognition results having substantially the same content as corresponding to the same command, and outputs the identified command to an external device, wherein the word dictionary data used in the voice recognition device is constructed according to the following steps (1) to (3).
(1) First, for each command, a standard comparison target pattern is set, formed by connecting in order: a word specifying the command target, which indicates the object or function the command acts on; the object-case particle 「を」; a word specifying the command action, which indicates what the command does; and the verb customarily appended after the word specifying the command action.
(2) Next, a pattern that satisfies at least one of the following conditions relative to the standard comparison target pattern is set as an alternative comparison target pattern having substantially the same content.
① A synonym is used for the word specifying the command target or the command action.
② An inflected form is used for the verb customarily appended after the word specifying the command action.
③ The object-case particle 「を」 is omitted.
④ The verb customarily appended after the word that can specify the command action is omitted.
⑤ Where the verb customarily appended after the word that can specify the command action is 「して下さい」, only 「下さい」 is omitted.
(3) The syllable data making up the standard and alternative comparison target patterns, together with identification data marking the end of a word, are assigned to the vertices in preorder traversal so as to fit a directed graph format resembling a tree structure, one that is based on a tree in which patterns having the same syllable data are allocated under the same parent but in which the number of paths reaching a vertex is not necessarily one.

2. The method for constructing word dictionary data for a voice recognition device according to claim 1, wherein a pattern to be set in step (2) as an alternative comparison target pattern having substantially the same content as the standard comparison target pattern is one that satisfies at least one of six conditions, namely conditions ① to ⑤ plus ⑥ "the command target is omitted".

3. A voice recognition device comprising: voice input means for inputting speech; dictionary means storing comparison target patterns corresponding to words as word dictionary data; recognition means for comparing the speech input via the voice input means with the comparison target patterns stored in the dictionary means and taking those with a high degree of matching as recognition results; specifying means for identifying, on the basis of the recognition results from the recognition means, a plurality of recognition results having substantially the same content as corresponding to the same command; and command output means for outputting the command identified by the specifying means, wherein the word dictionary data stored in the dictionary means is constructed by the word dictionary data construction method according to claim 1.

4. The voice recognition device according to claim 3, further comprising command target input means for inputting, from the output destination of the command, information on a command target that can be omitted, wherein the word dictionary data stored in the dictionary means is constructed by the word dictionary data construction method according to claim 2.

5. A navigation system comprising the voice recognition device according to claim 3 and a navigation device, wherein the voice input means of the voice recognition device is used by a user to input by voice at least instructions for predetermined navigation-related commands that need to be specified for the navigation device to perform navigation processing, and the command output means is configured to output the identified command to the navigation device.

6. A navigation system comprising the voice recognition device according to claim 4 and a navigation device, wherein the navigation device comprises means for obtaining information on an omissible command target on the basis of the navigation processing currently being executed and outputting it to the voice recognition device, the voice input means of the voice recognition device is used by a user to input by voice at least instructions for predetermined navigation-related commands that need to be specified for the navigation device to perform navigation processing, and the command output means is configured to output the identified command to the navigation device.