JPH0926799A

JPH0926799A - Speech recognition device

Info

Publication number: JPH0926799A
Application number: JP7200362A
Authority: JP
Inventors: Shoji Yokoyama; 昭二横山; Hiroyuki Yamakawa; 博幸山川; Yumi Murakami; ユミ村上
Original assignee: Equos Research Co Ltd
Current assignee: Equos Research Co Ltd
Priority date: 1995-07-12
Filing date: 1995-07-12
Publication date: 1997-01-28

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device that uses the storage capacity of its dictionary efficiently and is easy to operate. SOLUTION: When an operation is specified for the first time on a touch panel or the like, a standard pattern of the word indicating the specified operation is generated from phoneme patterns, stored in a dictionary part 167, that represent the features of the respective phonemes, and is stored in an unspecified dictionary 163a in a dictionary buffer 163. A feature extraction part 162 extracts features from speech input through a microphone 24 and generates a word pattern, and a pattern matching part 165 compares it with the standard patterns stored in a RAM 164 and the dictionary buffer 163 to recognize the speech. Further, the word pattern generated by the feature extraction part 162 is stored as a standard pattern for a specific speaker in the RAM 164 and a specific dictionary 163b. Since standard patterns are registered only for the words the user actually needs, an efficient dictionary is generated and the user can easily understand which words are subject to speech recognition.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech recognition device, and more particularly to a speech recognition device that recognizes speech uttered for specific words and phrases.

[0002]

[Prior Art] Speech recognition devices that recognize human speech as words have been put to practical use in various fields. For example, such devices are in practical use as input devices for issuing spoken instructions, from a remote location, to various machines in a factory, and their use as a voice input device for entering destinations, instructions, and similar information into an automobile navigation device is also being considered. To identify input speech, such a device generally includes a speech recognition dictionary: the frequency distribution of each utterance to be recognized is analyzed in advance, features such as the spectrum or time-series information on the fundamental frequency are extracted, and the resulting pattern is stored in association with each word. Two kinds of dictionary are generally stored: an unspecified-speaker dictionary, in which a pattern obtained by averaging the frequency distributions of speech from a large number of unspecified speakers (hereinafter, a standard pattern) is registered together with the word, and a specific-speaker dictionary, in which the pattern of the input speech of a specific speaker is registered together with the word.

[0003] When speech to be recognized is input, the frequency pattern of the input speech is compared by pattern matching with the pattern of each word stored in both dictionaries, and a similarity is calculated for each word. The word with the highest calculated similarity (the word whose pattern is closest) is then recognized as the input speech, and that word is output. In other words, the input speech is identified by finding which word pattern the frequency-distribution pattern of the input word most closely resembles.
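The matching step described above can be illustrated with a minimal Python sketch. It assumes that each pattern is a fixed-length feature vector and that similarity is cosine similarity; the source specifies neither, and the names `similarity` and `recognize` and the sample vectors are hypothetical.

```python
import math

def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(input_pattern, dictionaries):
    """Return (word, score) for the stored pattern most similar to the input."""
    best_word, best_score = None, float("-inf")
    for dictionary in dictionaries:
        for word, pattern in dictionary.items():
            score = similarity(input_pattern, pattern)
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score

# Hypothetical three-dimensional patterns, for illustration only.
unspecified = {"destination": [0.9, 0.1, 0.3], "golf course": [0.2, 0.8, 0.5]}
specific = {"golf course": [0.25, 0.75, 0.55]}
word, score = recognize([0.3, 0.7, 0.5], [unspecified, specific])
```

Both dictionaries are searched in a single pass, so a speaker-specific pattern wins whenever it is closer to the input than the averaged one.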

[0004]

[Problem to Be Solved by the Invention] However, in current speech recognition devices the storage capacity available for standard patterns in the unspecified and specific dictionaries is limited, so as the vocabulary grows, not all of the standard patterns can be stored in the two dictionaries. Because of this limit, only the standard patterns of words selected in advance are registered in the unspecified dictionary. The preselected words are not necessarily the ones the user needs, and storage spent on unneeded words is wasted. In addition, the user cannot tell which words are registered in the dictionary without checking them in the user manual or the like. Conventional speech recognition devices therefore have a problem in terms of operability.

[0005] The present invention has been made to solve the above problems, and its object is to provide a speech recognition device that uses the storage capacity of its dictionary efficiently and is easy to operate.

[0006]

[Means for Solving the Problem] In the invention according to claim 1, the above object is achieved by providing a speech recognition device with: selection means for selecting a predetermined operation; a phoneme dictionary in which phoneme patterns, representing the features of each phoneme constituting the words that denote the operations selectable by the selection means, are stored in advance; dictionary storage means in which standard patterns are stored; standard pattern creation means for creating, from the phoneme patterns stored in the phoneme dictionary, a standard pattern of the word denoting the operation selected by the selection means; unspecified dictionary storing means for storing the standard pattern created by the standard pattern creation means in the dictionary storage means; speech input means for inputting speech; word pattern creation means for extracting features of the speech input through the speech input means and creating a corresponding word pattern; recognition means for recognizing the speech input through the speech input means from the similarity between the word pattern created by the word pattern creation means and the standard patterns stored in the dictionary storage means; and output means for outputting the recognition result of the recognition means. In the invention according to claim 2, the speech recognition device according to claim 1 further includes specific dictionary storing means for storing the word pattern created by the word pattern creation means in the dictionary storage means as a standard pattern for a specific speaker. In the invention according to claim 3, in the speech recognition device according to claim 1 or claim 2, when the number of standard patterns stored in the dictionary storage means reaches a predetermined number, the unspecified dictionary storing means and the specific dictionary storing means delete the standard pattern that has been recognized least frequently by the recognition means. In the invention according to claim 4, the speech recognition device according to claim 1, claim 2, or claim 3 is used as input means of a navigation device.
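The deletion rule of claim 3 can be sketched as a least-frequently-recognized eviction policy. The following Python sketch is only an illustration under assumptions: the class `PatternStore`, its methods, and the choice to evict at the moment a new word is registered into a full store are all hypothetical and are not taken from the source.

```python
class PatternStore:
    """Fixed-capacity store that evicts the least-recognized standard pattern."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.patterns = {}  # word -> standard pattern
        self.hits = {}      # word -> number of times the word was recognized

    def register(self, word, pattern):
        """Store a pattern, evicting the least-recognized word if the store is full."""
        if word not in self.patterns and len(self.patterns) >= self.capacity:
            victim = min(self.hits, key=self.hits.get)
            del self.patterns[victim]
            del self.hits[victim]
        self.patterns[word] = pattern
        self.hits.setdefault(word, 0)

    def mark_recognized(self, word):
        """Count one successful recognition of a stored word."""
        self.hits[word] += 1

store = PatternStore(capacity=2)
store.register("destination setting", [0.1])
store.register("golf course", [0.2])
store.mark_recognized("destination setting")
store.mark_recognized("destination setting")
store.mark_recognized("golf course")
store.register("Chiba Prefecture", [0.3])  # evicts "golf course"
```

When several words share the lowest count, this sketch removes the one registered earliest; the source does not say how ties are resolved.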

[0007]

[Operation] In the speech recognition device according to claim 1, phoneme patterns representing the features of each phoneme constituting the words that denote the operations selectable by the selection means are stored in the phoneme dictionary in advance. A standard pattern of the word denoting the operation selected by the selection means is created from the phoneme patterns and stored in the dictionary storage means. Meanwhile, features of the speech input through the speech input means are extracted to create a corresponding word pattern, the input speech is recognized from the similarity between that word pattern and the standard patterns stored in the dictionary storage means, and the recognition result is output. In the speech recognition device according to claim 2, the word pattern created from the input speech is stored in the dictionary storage means as a standard pattern for a specific speaker. In the speech recognition device according to claim 3, when the number of standard patterns stored in the dictionary storage means reaches a predetermined number, the standard pattern with the lowest recognition frequency is deleted. The speech recognition device according to claim 4 is used as input means of a navigation device.

[0008]

[Embodiment] An embodiment of the speech recognition device of the present invention is described in detail below with reference to FIGS. 1 to 4.

Outline of the Embodiment

In the speech recognition device of this embodiment, when an operation is specified for the first time on the touch panel or the like, a standard pattern of the word indicating the specified operation is created from phoneme patterns representing the features of each phoneme and is stored in the dictionary for unspecified speakers. When speech is input, features of the input speech are extracted to create a word pattern, and the speech is recognized from its similarity to the standard patterns stored in the dictionary. The word pattern created during recognition is then stored in the dictionary as a standard pattern for the specific speaker. Thus, in this embodiment, a standard pattern is created only for the words that are needed, namely those that have been specified once from the touch panel or the like, and from then on those words can be entered by voice. Because every word subject to speech recognition is one the user has already entered by specifying it, the user does not need to check separately which words can be recognized.

[0009]

Details of the Embodiment

FIG. 1 shows the system configuration when the speech recognition device according to an embodiment of the present invention is applied to a navigation device. The navigation device includes an arithmetic unit 10. Connected to the arithmetic unit 10 are a display unit 11, which includes a display 11a functioning as a touch panel and operation switches 11b arranged around the display 11a, and a switch input management unit 12, which manages input from the touch panel and the switches 11b of the display unit 11.

[0010] The switches 11b include a switch for calling up the navigation menu screen, switches for adjusting the air conditioner, switches for operating the audio system, and so on. Pressing one of these switches displays the corresponding menu screen on the display 11a. The screens displayed on the touch panel 11a are organized hierarchically, with a menu screen at the top level. The navigation menu screen, for example, displays designation keys for destination setting, place-name search, and the like; when the destination setting key is selected, a lower-level screen displays designation keys for ski resorts, golf courses, and so on, and further selections lead step by step down to the lowest-level screen. In this embodiment, the designation keys displayed on the screens of the touch panel 11a and the switches 11b are the targets of speech recognition. That is, each time one of these designation keys or switches is pressed for the first time, an unspecified-speaker dictionary for speech recognition is created; thereafter, input speech is recognized on the basis of the created dictionary, and a specific-speaker dictionary is created from the input speech.

[0011] Also connected to the arithmetic unit 10 are a current position measuring unit 13, a speed sensor 14, a map information storage unit 15, the speech recognition unit 16 of this embodiment, and a voice output unit 17. The current position measuring unit 13 detects the position at which the vehicle is currently traveling or stopped by detecting coordinate data in latitude and longitude. Connected to the current position measuring unit 13 are a GPS (Global Positioning System) receiver 21, which measures the vehicle position using artificial satellites, a beacon receiver 20, which receives position information from beacons placed along the road, an azimuth sensor 22, and a distance sensor 23; the current position measuring unit 13 measures the current position of the vehicle using information from these devices.

[0012] The azimuth sensor 22 may be, for example, a geomagnetic sensor that determines the vehicle heading by detecting geomagnetism; a gyro, such as a gas rate gyro or an optical fiber gyro, that detects the rotational angular velocity of the vehicle and integrates it to obtain the heading; or a wheel sensor arrangement in which sensors on the left and right wheels detect turning of the vehicle from the difference in their output pulses (the difference in distance traveled) and thereby calculate the change in heading. The distance sensor 23 may use any of various methods, for example detecting and counting wheel rotations, or detecting acceleration and integrating it twice. The GPS receiver 21 and the beacon receiver 20 can each measure position on their own; where reception by the GPS receiver 21 or the beacon receiver 20 is not possible, the current position is detected by dead reckoning using both the azimuth sensor 22 and the distance sensor 23.
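Dead reckoning from a heading sensor and a distance sensor can be sketched as follows. This is a minimal Python illustration, assuming a flat local coordinate frame with x pointing east and y pointing north, and headings measured clockwise from north; the function name `dead_reckon` and these conventions are assumptions, not details given in the source.

```python
import math

def dead_reckon(x, y, steps):
    """Advance position (x, y) by a sequence of (heading_deg, distance) steps.

    Heading is measured clockwise from north; x is east and y is north.
    """
    for heading_deg, distance in steps:
        theta = math.radians(heading_deg)
        x += distance * math.sin(theta)
        y += distance * math.cos(theta)
    return x, y

# Travel 100 units north, then 50 units east, starting from the origin.
x, y = dead_reckon(0.0, 0.0, [(0.0, 100.0), (90.0, 50.0)])
```

Because each step adds to the previous estimate, sensor errors accumulate, which is consistent with the text using this method only where GPS and beacon reception are unavailable.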

[0013] The map information storage unit 15 is a large-capacity storage device such as a CD-ROM. It stores the various data needed for route search and route guidance, such as road data needed to search for a route to the destination and map data for displaying the found route on the display 11a. A microphone 24, which receives human speech and the dialing tones corresponding to telephone numbers, is connected to the speech recognition unit 16. The voice output unit 17 includes a voice output IC 26 that outputs speech as an electric signal, a D/A converter 27 that converts the output of the voice output IC 26 from digital to analog, and an amplifier 28 that amplifies the converted analog signal. A speaker 29 is connected to the output terminal of the amplifier 28.

[0014] The arithmetic unit 10 includes a CPU (central processing unit), a ROM (read-only memory), a RAM (random access memory), and so on, and the following components are realized by the CPU executing a program stored in the ROM while using the RAM as a working area. Specifically, the arithmetic unit 10 includes: a map data reading unit 31 connected to the speed sensor 14 and the map information storage unit 15; a map drawing unit 32; a map management unit 33 that manages the map data reading unit 31 and the map drawing unit 32; a screen management unit 34 connected to the map drawing unit 32 and the display unit 11; an input management unit 35 connected to the switch input management unit 12 and the speech recognition unit 16; a voice output management unit 36 connected to the voice output IC 26 of the voice output unit 17; and an overall management unit 37 that manages the map management unit 33, the screen management unit 34, the input management unit 35, and the voice output management unit 36. The input management unit 35 also keeps track of whether each designation key displayed on the touch panel of the display 11a and each switch 11b is being selected for the first time. When a switch or key is selected for the first time, the input management unit 35 supplies the speech recognition unit 16 with a touch panel input signal corresponding to that switch or key.

[0015] FIG. 2 is a block diagram showing the configuration of the speech recognition unit 16 in FIG. 1. As shown in the figure, the speech recognition unit 16 includes an A/D conversion unit 161 that converts the speech signal input from the microphone 24 into a digital signal; a feature extraction unit 162 that extracts features of the input speech from the output signal of the A/D conversion unit 161 and creates a corresponding word pattern; and a dictionary buffer 163 and a RAM 164 serving as dictionaries in which standard patterns for predetermined speech are stored. The dictionary buffer 163 includes an unspecified dictionary 163a, which stores general standard patterns for unspecified-speaker recognition, and a specific dictionary 163b, which stores, as standard patterns, word patterns created from the speech of a specific speaker input from the microphone 24. Here, a standard pattern is the spectrum or time-series information on the fundamental frequency of a speech signal taken in predetermined units. The units used may be syllables, words, phonemes, half-syllables, or the transitions between words, between phonemes, between syllables, or between half-syllables. The feature extraction unit 162 extracts the word pattern using a multi-channel bandpass filter, linear predictive analysis, or the like.
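As a rough illustration of what a multi-channel bandpass analysis produces, the following Python sketch computes per-frame band energies from a discrete Fourier transform. The DFT is a software stand-in chosen for brevity, not the filter bank of the embodiment; the function names, the equal-width bands, and the non-overlapping frames are all assumptions.

```python
import cmath
import math

def band_energies(frame, num_bands):
    """Sum the power spectrum of one frame into equal-width frequency bands."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // num_bands  # any leftover high bins are ignored
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]

def word_pattern(samples, frame_len, num_bands):
    """Return a time series of band-energy vectors, one per non-overlapping frame."""
    return [band_energies(samples[i:i + frame_len], num_bands)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# A pure tone at DFT bin 3 of a 16-sample frame; two frames in total.
samples = [math.sin(2 * math.pi * 3 * t / 16) for t in range(32)]
pattern = word_pattern(samples, frame_len=16, num_bands=4)
```

Each element of `pattern` corresponds to one time frame, so the result has the "time-series of spectral features" shape that the text attributes to a word pattern.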

[0016] The speech recognition unit 16 further includes a pattern matching unit 165, which compares the standard patterns stored in the dictionary buffer 163 and the RAM 164 with the word pattern extracted by the feature extraction unit 162, and a recognition result processing unit 166, which recognizes the content of the speech input from the microphone 24 on the basis of the comparison result of the pattern matching unit 165, generates a voice input signal for the arithmetic unit 10 according to the recognized content, and outputs it to the input management unit 35 of the arithmetic unit 10.

[0017] The speech recognition unit 16 also includes a dictionary unit 167, a ROM 168, and a dictionary management unit 169. The dictionary unit 167 is a ROM that stores phoneme patterns representing the features of each phoneme. These phoneme patterns are used by the dictionary management unit 169 to create the unspecified dictionary; each stored pattern was created, phoneme by phoneme, by averaging the utterances of several speakers such as announcers. Various storage devices can be used as the ROM of the dictionary unit 167, and a CD-ROM (compact disc ROM) may also be used.

[0018] The ROM 168 receives from the input management unit 35 the touch panel input signal corresponding to the designation key on the touch panel of the display 11a, or the switch 11b, pressed by the user. The ROM 168 converts the touch panel input signal into a phoneme designation signal indicating the word that corresponds to it and supplies that signal to the dictionary management unit 169; it holds a conversion table for this purpose. For example, when the "destination setting" designation key is pressed on the touch panel of the display 11a and the corresponding touch panel input signal is supplied from the input management unit 35, the ROM 168 converts it into a phoneme designation signal indicating the word "mokutekichi settei" (the Japanese reading of "destination setting") and supplies it to the dictionary management unit 169.
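The conversion table held in the ROM 168 can be pictured as a simple lookup from an input signal code to a phoneme sequence. In the Python sketch below, the signal codes, the romanized phoneme segmentation (with "Q" standing for the geminate consonant), and the function name are hypothetical illustrations; the source does not give the table contents or the signal format.

```python
# Hypothetical signal codes mapped to illustrative phoneme sequences.
CONVERSION_TABLE = {
    0x01: ["m", "o", "k", "u", "t", "e", "k", "i", "ch", "i",
           "s", "e", "Q", "t", "e", "i"],          # "destination setting"
    0x02: ["g", "o", "r", "u", "f", "u", "j", "o", "o"],  # "golf course"
}

def to_phoneme_designation(signal_code):
    """Return the phoneme sequence for a key signal, or None if it is unknown."""
    return CONVERSION_TABLE.get(signal_code)

phonemes = to_phoneme_designation(0x01)
```

Keeping only this table and the per-phoneme patterns in read-only storage is what lets word patterns be built on demand instead of being stored for every possible word in advance.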

[0019] The dictionary management unit 169 reads from the dictionary unit 167 the phoneme patterns corresponding to the phoneme designation signal supplied from the ROM 168, synthesizes a dictionary for unspecified-speaker recognition from them, and stores it in the unspecified dictionary 163a of the dictionary buffer 163. In addition, when speech input from the microphone 24 is successfully recognized, the recognition result processing unit 166 supplies the voice input signal to the dictionary management unit 169. On receiving this signal, the dictionary management unit 169 stores the word pattern that the feature extraction unit 162 extracted for the recognized speech, as a standard pattern, in the specific dictionary 163b and the RAM 164. The RAM 164 reserves one word area for each switch and designation key that can be pressed on the display unit 11 (that is, one per voice input signal), and each area holds a single standard pattern. Thus, each time speech input from the microphone 24 is recognized, the area corresponding to that speech is overwritten with the new standard pattern. The phoneme patterns of the input speech are also updated in the RAM 164 as a phoneme dictionary: each time speech input from the microphone 24 is recognized, a phoneme pattern for each phoneme is created from the word pattern extracted by the feature extraction unit 162, and the RAM 164 is updated.
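The source says only that the dictionary management unit 169 "synthesizes" a word-level standard pattern from phoneme patterns. One simple reading, shown here purely as an assumption, is to concatenate the feature frames of each phoneme in order. The function name, the two-dimensional frames, and the sample phoneme patterns are hypothetical.

```python
def synthesize_standard_pattern(phonemes, phoneme_patterns):
    """Concatenate per-phoneme feature frames into one word-level pattern.

    Concatenation is an assumed synthesis method, not one stated in the source.
    """
    frames = []
    for phoneme in phonemes:
        frames.extend(phoneme_patterns[phoneme])
    return frames

# Hypothetical phoneme patterns: each phoneme maps to a list of feature frames.
phoneme_patterns = {
    "k": [[0.0, 1.0]],
    "a": [[1.0, 0.0], [0.9, 0.1]],
}
standard_pattern = synthesize_standard_pattern(["k", "a"], phoneme_patterns)
```

A phoneme missing from the table raises `KeyError` here, which makes an incomplete phoneme dictionary visible instead of silently producing a shorter pattern.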

[0020] The specific dictionary 163b likewise reserves one word area per voice input signal, but each area can hold a plurality of standard patterns. Each time speech is recognized, a new standard pattern is therefore added to the area corresponding to that speech. Once the predetermined number of standard patterns has been stored in the area for a given voice input signal, the oldest standard pattern is replaced by the newest one.
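The per-word storage area of the specific dictionary 163b behaves like a bounded first-in, first-out buffer. A minimal Python sketch, assuming `collections.deque` with `maxlen` as the container; the class and method names are hypothetical.

```python
from collections import deque

class SpecificDictionary:
    """Keeps up to a fixed number of speaker-specific patterns per word."""

    def __init__(self, max_patterns_per_word):
        self.max_patterns = max_patterns_per_word
        self.areas = {}  # word -> deque of standard patterns, oldest first

    def store(self, word, pattern):
        """Add a pattern; once the area is full, the oldest pattern is dropped."""
        area = self.areas.setdefault(word, deque(maxlen=self.max_patterns))
        area.append(pattern)

    def patterns(self, word):
        """Return the stored patterns for a word, oldest first."""
        return list(self.areas.get(word, ()))

dictionary = SpecificDictionary(max_patterns_per_word=2)
for pattern in ([0.1], [0.2], [0.3]):
    dictionary.store("golf course", pattern)
```

Holding several recent patterns per word, as opposed to the single overwritten slot in the RAM 164, lets matching tolerate variation between utterances by the same speaker.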

[0021] Next, the operation of the embodiment configured as described above is explained. FIG. 3 shows how a user operates the navigation device. In the initial state of the system, no standard patterns are stored in the RAM 164 or in the unspecified dictionary 163a and specific dictionary 163b of the dictionary buffer 163. Speech recognition is therefore not possible in the initial state, and the user selects the desired processing step by step from the switches 11b of the display unit 11 and the touch panel of the display 11a, as shown in FIGS. 3(a) to 3(d). The following describes the case where a destination is entered step by step on the touch panel, starting from the navigation menu screen (a). First, when the user selects "destination setting" on the menu screen (a), the lower-level destination setting screen (b) is displayed. Likewise, by selecting "golf course" on the destination setting screen (b) and then "Chiba Prefecture" and "Narashino CC" in turn, the user causes the overall management unit 37 to set Narashino Country Club as the destination.

[0022] Each time a key on the touch panel of the display 11a or a switch 11b of the display unit 11 is selected for the first time, the dictionary management unit 169 creates a standard pattern of the word corresponding to that key or switch and stores it in the unspecified dictionary 163a of the dictionary buffer 163. That is, when selections are made in the order of FIGS. 3(a) to 3(d), the standard patterns of the corresponding words "destination setting", "golf course", "Chiba Prefecture", and "Narashino Country Club" are stored in the unspecified dictionary 163a one after another.

[0023] Once the destination has been set by operating the display unit 11, and the unspecified dictionary for the words denoted by the corresponding designation keys and switches has been created at the same time, voice input becomes possible from then on. That is, when the user says "destination setting" into the microphone 24, as shown in (e), while the navigation menu screen of FIG. 3(a) is displayed, the utterance is recognized by the speech recognition unit 16 and the destination setting screen (b) is displayed on the display 11a. When the user then says "golf course" (FIG. 3(f)), the golf course screen is displayed. Similarly, when the user says "Chiba Prefecture" and then "Narashino Country Club", the utterances are recognized and Narashino CC is finally set as the destination.

[0024] For each utterance that the user speaks and that is recognized, the word pattern of that utterance is stored in the specific dictionary 163b and the RAM 164 as a standard pattern for the specific speaker. If, in FIG. 3(g), the user says "Hokkaido" instead of "Chiba Prefecture", the utterance cannot be recognized because no unspecified dictionary has yet been created for the word "Hokkaido". To make the spoken word "Hokkaido" recognizable, the user must first select, once, the "Hokkaido" key displayed on the golf course screen of FIG. 3(c) or on another screen.

[0025] Next, the operation is described in detail. FIG. 4 is a flowchart showing the operation of dictionary creation and of voice recognition using the created dictionary. First, the control unit 10 determines whether data has been input to the input management unit 35. If there is an input (step 31; Y), it determines whether the input data came from the touch panel of the display 11a or from the switch 11b (step 32).

[0026] If the input is from the touch panel or the like (step 32; Y), the input management unit 35 determines whether the input key is being used for the first time, and thereby determines whether the standard pattern of the word representing the selected key is already in the unspecified dictionary (step 33). If there is no unspecified dictionary entry for the selection (step 33; Y), the voice recognition unit 16 creates one (step 34). That is, the ROM 168 of the voice recognition unit 16 converts the touch panel or other input signal supplied from the input management unit 35 into a phoneme designation signal indicating the corresponding word and supplies it to the dictionary management unit 169. In accordance with the supplied phoneme designation signal, the dictionary management unit 169 reads the phoneme pattern of each phoneme from the dictionary unit 167, creates the unspecified-speaker dictionary entry for the word representing the key input in step 32, and stores it in the unspecified dictionary 163a.
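
The dictionary creation of step 34 can be illustrated with a minimal Python sketch. It assumes that a phoneme pattern is a short list of feature frames and that a word's standard pattern is simply the concatenation of its phonemes' patterns; the text does not specify how the patterns are combined. The names `PHONEME_PATTERNS`, `KEY_TO_PHONEMES`, `build_standard_pattern`, and `on_key_selected` are hypothetical, and the numeric values are placeholders.

```python
# Hypothetical stand-in for the dictionary unit 167: phoneme -> feature frames.
# The numbers are placeholders, not real acoustic features.
PHONEME_PATTERNS = {
    "g": [[0.1, 0.9]],
    "o": [[0.7, 0.2], [0.6, 0.3]],
    "r": [[0.4, 0.4]],
    "u": [[0.2, 0.8]],
    "f": [[0.9, 0.1]],
}

# Hypothetical stand-in for the ROM 168: key identifier -> phoneme designation.
KEY_TO_PHONEMES = {"golf": ["g", "o", "r", "u", "f", "u"]}


def build_standard_pattern(key_id):
    """Concatenate the phoneme patterns of the word assigned to a key (assumed method)."""
    frames = []
    for phoneme in KEY_TO_PHONEMES[key_id]:
        frames.extend(PHONEME_PATTERNS[phoneme])
    return frames


def on_key_selected(key_id, unspecified_dictionary):
    """Create the entry only on first use (step 33; Y leads to step 34)."""
    if key_id not in unspecified_dictionary:
        unspecified_dictionary[key_id] = build_standard_pattern(key_id)
        return True   # a new standard pattern was created
    return False      # the pattern already exists; nothing is created
```

Calling `on_key_selected("golf", d)` twice on an empty dict `d` creates the entry the first time and leaves it untouched the second time, mirroring the first-use check.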

[0027] After the unspecified dictionary is created in step 34, or if the unspecified dictionary for the selection already exists (step 33; N), the input management unit 35 supplies the input from the touch panel or the like to the overall management unit 37 (step 35). In response to that input, the overall management unit 37 controls the screen management unit 34 to switch to the screen for selecting the next vocabulary to be recognized (step 36), and returns to the main routine.

[0028] On the other hand, if the input in step 32 is not from the touch panel or the like but is a voice input (step 32; N), the voice recognition unit 16 takes in the voice uttered by the user from the microphone 24 (step 37) and performs voice recognition on it (step 38). That is, the voice recognition unit 16 converts the input voice data into digital data in the A/D conversion unit 161, extracts the word pattern of the input voice in the feature extraction unit 162, and supplies it to the pattern matching unit 165 and the dictionary management unit 169.

[0029] The pattern matching unit 165 first compares the word pattern of the input voice with the latest standard pattern of each word stored in the RAM 164, calculates the similarity to each word, and recognizes the word with the highest similarity as the input voice. If the similarity to every word is at or below a predetermined threshold, recognition fails at this stage, so the unit next calculates the similarity to the standard pattern of each word stored in the specific dictionary 163b and recognizes as the input voice the word whose similarity is highest and exceeds the threshold. If the similarities to the standard patterns of the specific dictionary 163b are also all at or below the threshold, the unit further compares the word pattern with all the unspecified-speaker standard patterns stored in the unspecified dictionary 163a and recognizes the word whose similarity is highest and exceeds the threshold; if all are at or below the threshold, it determines that the input voice cannot be recognized. When the pattern matching unit 165 produces a recognition result, the recognition result processing unit 166 generates a voice input signal according to the recognized content and supplies it to the input management unit 35 of the arithmetic unit 10.
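
The three-stage comparison can be sketched as follows. This is only an illustration under stated assumptions: word patterns are treated as fixed-length feature vectors, and `similarity` is a placeholder based on Euclidean distance, since the text does not name the similarity measure. The function names and the default threshold of 0.5 are hypothetical.

```python
import math


def similarity(a, b):
    """Placeholder similarity in (0, 1]; assumes equal-length feature vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def best_match(word_pattern, candidates, threshold):
    """Return the most similar word whose similarity exceeds the threshold, else None."""
    best_word, best_score = None, threshold
    for word, standard in candidates:
        score = similarity(word_pattern, standard)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def recognize(word_pattern, ram_latest, specific, unspecified, threshold=0.5):
    """Try RAM 164, then specific dictionary 163b, then unspecified dictionary 163a."""
    stages = (
        ram_latest.items(),                                   # latest pattern per word
        ((w, p) for w, ps in specific.items() for p in ps),   # all speaker patterns
        unspecified.items(),                                  # speaker-independent patterns
    )
    for candidates in stages:
        word = best_match(word_pattern, candidates, threshold)
        if word is not None:
            return word
    return None  # every similarity was at or below the threshold
```

A later stage is consulted only when the earlier one yields no word above the threshold, so the speaker's own recent patterns take priority over the phoneme-derived ones.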

[0030] If the recognition result supplied to the input management unit 35 is "unrecognizable" (step 39; N), the overall management unit 37 of the control unit 10 controls the screen management unit 34 to show on the display 11a a message that no dictionary corresponding to the input voice exists (step 40), and returns to the main routine. On the other hand, if the similarity to a standard pattern exceeds the threshold and the input voice is recognized successfully (step 39; Y), the specific dictionary is updated (step 41). That is, the recognition result processing unit 166 also supplies the voice input signal corresponding to the recognized word to the dictionary management unit 169. On receiving the voice input signal, the dictionary management unit 169 overwrites the storage area of the RAM 164 corresponding to that voice input signal with the word pattern supplied from the feature extraction unit 162. The dictionary management unit 169 also stores the word pattern supplied from the feature extraction unit 162 in the corresponding storage area of the specific dictionary 163b, or overwrites the oldest standard pattern stored in that area with the supplied word pattern.
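
The update of step 41 might look like the sketch below. It assumes that each word's storage area in the specific dictionary 163b holds a fixed number of patterns; the capacity `MAX_PATTERNS_PER_WORD` and the function name are hypothetical, and a bounded `deque` is used here simply because it discards its oldest element when full.

```python
from collections import deque

MAX_PATTERNS_PER_WORD = 3  # assumed capacity of one storage area; not given in the text


def update_after_recognition(word, word_pattern, ram_latest, specific_dict):
    """Store the newly recognized word pattern as a speaker-specific standard pattern."""
    # RAM 164 keeps only the latest pattern for each word.
    ram_latest[word] = word_pattern
    # Specific dictionary 163b keeps several patterns; the oldest is dropped when full.
    area = specific_dict.setdefault(word, deque(maxlen=MAX_PATTERNS_PER_WORD))
    area.append(word_pattern)
```

After four successful recognitions of the same word, the RAM entry holds the fourth pattern and the specific dictionary holds the second to fourth.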

[0031] When the update of the specific dictionary in step 41 is complete, the process moves to step 35, where the input management unit 35 supplies the voice input signal of the voice input from the microphone 24 and recognized by the voice recognition unit 16 to the overall management unit 37. The overall management unit 37 then controls the screen management unit 34 in accordance with the voice input signal to switch to the screen for selecting the next vocabulary to be recognized (step 36), and returns to the main routine.
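
The overall flow of FIG. 4 (steps 32 to 41) can be summarized in a short sketch. The event and state layout, the returned action tuples, and the injected `recognize` and `create_pattern` callables are all hypothetical; they stand in for the units described above and are not part of the original description.

```python
def handle_input(event, state, recognize, create_pattern):
    """One pass through steps 32-41; returns the resulting screen action."""
    if event["source"] == "touch":                            # step 32; Y
        key = event["key"]
        if key not in state["unspecified"]:                   # step 33; Y
            state["unspecified"][key] = create_pattern(key)   # step 34
        return ("switch_screen", key)                         # steps 35-36
    word = recognize(event["pattern"], state)                 # steps 37-38
    if word is None:                                          # step 39; N
        return ("show_no_dictionary_message", None)           # step 40
    # Step 41: keep the utterance as a speaker-specific pattern.
    state["specific"].setdefault(word, []).append(event["pattern"])
    return ("switch_screen", word)                            # steps 35-36
```

With an empty state, a voice event is rejected until the corresponding key has been selected once, which matches the "Hokkaido" example given earlier.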

[0032] In the embodiment described above, the RAM 164, the unspecified dictionary 163a, and the specific dictionary 163b each reserve areas for as many words as there are switches and designated keys that can be pressed on the display unit 11, but a smaller number of storage areas, for example one half or one third, may be reserved instead. In that case, to handle a shortage of areas, the dictionary management unit 169 counts how often the word indicated by the standard pattern stored in each area is used, clears the storage area with the lowest usage frequency, and reserves it for a new word pattern or standard pattern. This reduces the storage capacity while limiting any drop in the recognition rate. Because the pattern matching unit 165 has fewer dictionary entries to compare, recognition speed also improves. Moreover, even when the recognition vocabulary size or the storage capacity of the dictionary buffer and the like is constrained, rebuilding the dictionary according to use yields an efficient system.
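
The frequency-based replacement described here can be sketched as a small bounded store. The class name `BoundedDictionary` and its methods are hypothetical, and breaking ties in favor of the earliest-stored word is an assumption; the text only says that the least frequently used area is cleared.

```python
class BoundedDictionary:
    """Holds at most `capacity` standard patterns and evicts the least-used word."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.patterns = {}    # word -> standard pattern
        self.use_count = {}   # word -> number of times the word was used

    def record_use(self, word):
        """Count one use (for example, one successful recognition) of a stored word."""
        if word in self.patterns:
            self.use_count[word] += 1

    def store(self, word, pattern):
        """Store a pattern, clearing the least-used area first if no area is free."""
        if word not in self.patterns and len(self.patterns) >= self.capacity:
            # min() returns the first minimum, so ties evict the earliest-stored word.
            victim = min(self.use_count, key=self.use_count.get)
            del self.patterns[victim]
            del self.use_count[victim]
        self.patterns[word] = pattern
        self.use_count.setdefault(word, 0)
```

With a capacity of two, storing "a" and "b", using "a" once, and then storing "c" evicts "b".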

[0033] In the embodiment, the ROM 168 outputs a phoneme designation signal, but data representing a text character string may be used as that phoneme designation signal.

[0034] A screen pointer that identifies the screen displayed in response to a voice input may also be stored in association with each standard pattern for recognized voices, so that the screen can be called up directly. For example, if the screen of FIG. 3(b) is displayed when the utterance "destination setting" is recognized, a screen pointer identifying FIG. 3(b) is stored together with the standard pattern of the utterance "destination setting". Similarly, the screen pointer of FIG. 3(c) is stored with the standard pattern of the utterance "golf course", and the screen pointer of FIG. 3(d) with the standard pattern of the utterance "Chiba Prefecture". From then on, without inputting and recognizing "destination setting", "golf course", "Chiba Prefecture", and "Narashino CC" in order, the user can utter "golf course" and the screen of FIG. 3(c) is displayed via its screen pointer. Uttering "Narashino CC" after that sets Narashino CC as the destination, which simplifies the input process. Furthermore, instead of storing a screen pointer with the standard pattern of a voice, a series of input processing steps or the input history may be stored with the standard pattern. In the example of FIG. 3, the destination setting is then completed simply by uttering "Narashino CC" directly, making input even easier and faster.
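
One way to picture the screen-pointer variant is to store the pointer next to each standard pattern. In this sketch the `DictionaryEntry` type, the screen identifiers such as `"fig3c"`, and the placeholder feature values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DictionaryEntry:
    standard_pattern: List[float]  # placeholder feature vector
    screen_pointer: str            # hypothetical identifier of the screen to display


ENTRIES = {
    "destination setting": DictionaryEntry([0.1, 0.2], "fig3b"),
    "golf course": DictionaryEntry([0.3, 0.4], "fig3c"),
    "Chiba Prefecture": DictionaryEntry([0.5, 0.6], "fig3d"),
}


def screen_for(recognized_word: str) -> Optional[str]:
    """Return the screen to call up directly for a recognized word, if one is stored."""
    entry = ENTRIES.get(recognized_word)
    return entry.screen_pointer if entry else None
```

Here `screen_for("golf course")` returns the pointer for the screen of FIG. 3(c) without any earlier menu step, while a word with no entry returns `None`.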

[0035]

[Effects of the Invention] According to the voice recognition device of the present invention, the storage capacity of the dictionary can be used efficiently and operability can be improved.

[Brief description of drawings]

FIG. 1 is a system configuration diagram of a navigation device to which a voice recognition device according to an embodiment of the present invention is applied.

FIG. 2 is a block diagram showing the configuration of the voice recognition unit of the same embodiment.

FIG. 3 is an explanatory diagram showing how a user uses the navigation device of the same embodiment.

FIG. 4 is a flowchart showing the operation of dictionary creation and of voice recognition using the created dictionary in the same embodiment.

[Explanation of symbols]

10 arithmetic unit
11 display unit
11a display
13 current position measurement unit
15 map information storage unit
16 voice recognition unit
161 A/D conversion unit
162 feature extraction unit
163 dictionary buffer
164 RAM
165 pattern matching unit
166 recognition result processing unit
167 dictionary unit
168 ROM
169 dictionary management unit
17 voice output unit
24 microphone
33 map management unit
34 screen management unit
35 input management unit
37 overall management unit

Continuation of the front page
(51) Int. Cl.⁶        FI
G09B 29/10            G09B 29/10 A
// G01C 21/00         G01C 21/00 H

Claims

[Claims]

1. A voice recognition device comprising: selection means for selecting a predetermined operation; a phoneme dictionary in which phoneme patterns representing the features of each phoneme constituting the words that represent operations selectable by the selection means are stored in advance; dictionary storage means in which standard patterns are stored; standard pattern creation means for creating, from the phoneme patterns stored in the phoneme dictionary, a standard pattern of the word representing the operation selected by the selection means; unspecified dictionary storing means for storing the standard pattern created by the standard pattern creation means in the dictionary storage means; voice input means for inputting a voice; word pattern creation means for extracting features of the voice input by the voice input means and creating a corresponding word pattern; recognition means for recognizing the voice input by the voice input means from the degree of similarity between the word pattern created by the word pattern creation means and the standard patterns stored in the dictionary storage means; and output means for outputting the recognition result of the recognition means.

2. The voice recognition device according to claim 1, further comprising specific dictionary storing means for storing the word pattern created by the word pattern creation means in the dictionary storage means as a standard pattern for a specific speaker.

3. The voice recognition device according to claim 1 or 2, wherein, when the number of standard patterns stored in the dictionary storage means reaches a predetermined number, the unspecified dictionary storing means and the specific dictionary storing means delete the standard pattern least frequently recognized by the recognition means.

4. The voice recognition device according to claim 1, wherein the voice recognition device is used as an input means of a navigation device.