JPH1049194A

JPH1049194A - Voice recognition device

Info

Publication number: JPH1049194A
Application number: JP8216752A
Authority: JP
Inventors: Koji Hori; 孝二堀
Original assignee: Equos Research Co Ltd
Current assignee: Equos Research Co Ltd
Priority date: 1996-07-30
Filing date: 1996-07-30
Publication date: 1998-02-20

Abstract

PROBLEM TO BE SOLVED: To recognize speech efficiently by appropriately classifying the contents of a speech dictionary. SOLUTION: Speech input to a speech recognition device is generally uttered at a constant speed, so the utterance time of a given word is relatively constant regardless of the speaker. The device exploits this. The word dictionary is divided into a plurality of independent dictionary groups based on information about each word to be recognized, such as its number of characters and its speech duration (a general average value is measured). When the dictionary is classified by number of characters, a pattern matching unit 165 determines the number of characters in the speech input from a microphone 24. It does this from the speech section data (speech duration) found by the section detection of a preprocessing unit 161. Pattern matching then proceeds through the individual dictionaries in turn, starting with those whose character count is closest to that of the input speech.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech recognition device, for example one used as an input device in a vehicle navigation device.

[0002]

2. Description of the Related Art
Speech recognition devices that recognize human speech as words are in practical use in many fields. For example, they serve as input devices that let an operator give spoken instructions to factory equipment from a distance. Their use as voice input devices for entering destinations, instructions and similar information into automobile navigation devices has also been considered. To identify input speech, such a device generally has a speech recognition dictionary prepared in advance by analyzing the frequency distribution of the speech to be recognized. Time-series information such as the spectrum and the fundamental frequency is extracted as the feature quantities of the speech, and the resulting patterns are stored in association with each word.

[0003] When speech to be recognized is input, its frequency pattern is compared by pattern matching with the pattern of each word stored in the speech recognition dictionary, and a similarity is calculated for each word. The word with the highest similarity (the word whose pattern is closest) is recognized as the input speech and output. In other words, the input speech is identified by finding the stored word pattern that most closely resembles the frequency-distribution pattern of the input.
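The matching in [0003] can be sketched as follows. The publication does not say how the similarity is computed. This sketch assumes that the input and each stored pattern are sequences of feature frames (for example, per-frame spectral values). It uses dynamic time warping (DTW), a common template-matching technique, and turns the length-normalized DTW distance into a similarity in (0, 1]. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per analysis frame


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """DTW distance between two frame sequences, normalized by their total length."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def similarity(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Map a DTW distance to a similarity in (0, 1]; identical patterns score 1.0."""
    return 1.0 / (1.0 + dtw_distance(a, b))


def recognize(inp: Sequence[Frame], dictionary: Dict[str, Sequence[Frame]]) -> Tuple[str, float]:
    """Return the dictionary word whose pattern is most similar to the input."""
    scores = {word: similarity(inp, pattern) for word, pattern in dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy one-dimensional "features"; the input is a time-stretched copy of "golf".
patterns = {"golf": [[0.0], [1.0], [2.0]], "park": [[2.0], [2.0], [0.0]]}
print(recognize([[0.0], [1.0], [1.0], [2.0]], patterns))  # ('golf', 1.0)
```

DTW lets the stretched input align with the three-frame "golf" template at zero cost, which is why differences in speaking rate do not defeat the match.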

[0004] Because of the time that matching takes, a speech recognition dictionary usually holds only about 1000 words. To recognize more than 1000 words, several dictionaries must be prepared, each holding one group of words. An application program then switches among them during matching, and how to perform that switching becomes a problem.

[0005] One technique that applies a speech recognition device to an in-vehicle navigation device is the speech recognition device for in-vehicle information processing described in Japanese Patent Application Laid-Open No. 7-64480. That device recognizes an input word by comparing it with vocabulary registered in a speech dictionary, such as the place names and facility names shown on the navigation map. Its aims are to recognize spoken input words efficiently and quickly even when the registered vocabulary is large, and to reduce misrecognition caused by similar words. To this end, the registered contents of the speech dictionary are grouped by region. The order in which the dictionary groups are used to recognize an input word is then prioritized by their distance from the vehicle's current position, as obtained by the navigation device.
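The prior-art prioritization in [0005] amounts to ordering regional dictionary groups by their distance from the vehicle. The distance measure used by the cited publication is not given here. This sketch assumes that each region is represented by a single latitude/longitude point and uses the haversine great-circle distance. The region names and coordinates are illustrative only.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def region_priority(current: Tuple[float, float],
                    regions: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order regional dictionary groups nearest-first relative to the current position."""
    lat, lon = current
    return sorted(regions, key=lambda name: haversine_km(lat, lon, *regions[name]))


regions = {"Osaka": (34.69, 135.50), "Tokyo": (35.68, 139.69), "Kyoto": (35.01, 135.77)}
print(region_priority((35.68, 139.77), regions))  # ['Tokyo', 'Kyoto', 'Osaka']
```

Under this ordering, a destination far from the current position is reached only after every nearer group has been tried. That is the drawback discussed in [0006].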

[0006]

Problems to be Solved by the Invention
The device described in that publication uses the current position as the index for prioritizing speech dictionaries. As a result, the farther the input destination word is from the current position, the more often the speech dictionary must be switched. A destination covering a large area, such as one represented by a place name, needs only a few switches. A destination that appears only on detailed maps, however, such as a shop or a private house on a city map, requires switching in sequence from dictionaries with a low level of map detail to those with a high level of detail, so the search actually takes longer.

[0007] An object of the present invention is to provide a speech recognition device that can recognize speech efficiently by appropriately classifying the contents of a speech dictionary.

[0008]

Means for Solving the Problems
In the invention according to claim 1, the above object is achieved by a speech recognition device comprising the following: a word dictionary that stores standard patterns of a plurality of words to be recognized so that the words can be distinguished by their number of characters; speech input means for inputting speech; character-count specifying means for specifying the number of characters in the speech input from the speech input means; word dictionary selecting means for selecting standard patterns of words from the word dictionary according to the number of characters specified by the character-count specifying means; feature extracting means for extracting features of the speech input from the speech input means; similarity calculating means for calculating the similarity between the features extracted by the feature extracting means and the standard patterns selected by the word dictionary selecting means; and determining means for determining the input speech from the similarities calculated by the similarity calculating means.
In the invention according to claim 2, the above object is achieved by a speech recognition device comprising the following: a word dictionary that stores standard patterns of a plurality of words to be recognized so that the words can be distinguished by their speech duration; speech input means for inputting speech; speech duration detecting means for detecting the speech duration of the speech input from the speech input means; word dictionary selecting means for selecting, from the word dictionary, the standard patterns of words whose speech duration lies within a predetermined time interval of the speech duration detected by the speech duration detecting means; feature extracting means for extracting features of the speech input from the speech input means; similarity calculating means for calculating the similarity between the features extracted by the feature extracting means and the standard patterns selected by the word dictionary selecting means; and determining means for determining the input speech from the similarities calculated by the similarity calculating means.
In the invention according to claim 3, the above object is achieved by a speech recognition device comprising the following: a word dictionary that stores standard patterns of a plurality of words to be recognized so that the words can be distinguished by their number of characters; speech input means for inputting speech; character-count specifying means for specifying the number of characters in the speech input from the speech input means; feature extracting means for extracting features of the speech input from the speech input means; similarity calculating means for calculating the similarity between the features extracted by the feature extracting means and the standard patterns stored in the word dictionary; weighting means for weighting the similarity of each word, as calculated by the similarity calculating means, according to the number of characters specified by the character-count specifying means; and determining means for determining the input speech from the similarities after weighting by the weighting means.
In the invention according to claim 4, the above object is achieved by a speech recognition device comprising the following: a word dictionary that stores standard patterns of a plurality of words to be recognized so that the words can be distinguished by their speech duration; speech input means for inputting speech; speech duration detecting means for detecting the speech duration of the speech input from the speech input means; feature extracting means for extracting features of the speech input from the speech input means; similarity calculating means for calculating the similarity between the features extracted by the feature extracting means and the standard patterns stored in the word dictionary; weighting means for weighting the similarity of each word, as calculated by the similarity calculating means, according to the speech duration detected by the speech duration detecting means; and determining means for determining the input speech from the similarities after weighting by the weighting means.
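Claims 3 and 4 replace dictionary selection with weighting. Every word is scored, and each similarity is then scaled by how well the word's character count (or duration) agrees with the input. The claims do not specify a weighting function. The linear penalty below, and the use of `len()` as the character count of a kana string, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def count_weight(word_chars: int, estimated_chars: int, penalty: float = 0.1) -> float:
    """Hypothetical weight: 1.0 for an exact character-count match, minus
    `penalty` per character of difference, never below 0."""
    return max(0.0, 1.0 - penalty * abs(word_chars - estimated_chars))


def weighted_ranking(similarities: Dict[str, float],
                     estimated_chars: int) -> List[Tuple[str, float]]:
    """Scale each word's similarity by its character-count weight, best first.
    For kana strings, len() gives the number of characters."""
    scored = [(word, s * count_weight(len(word), estimated_chars))
              for word, s in similarities.items()]
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return scored


# The raw scores slightly favour the 3-character word, but the input
# duration suggests 5 characters, so the 5-character word wins after weighting.
raw = {"ごるふ": 0.82, "ゆうえんち": 0.80}
print(weighted_ranking(raw, estimated_chars=5))
```

Unlike claims 1 and 2, nothing is excluded here. A word with the "wrong" length can still win if its raw similarity is high enough.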

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
An embodiment of the speech recognition device of the present invention is described in detail below with reference to FIGS. 1 to 3.
(1) Overview of the Embodiment
Speaking speed varies in everyday conversation, but speech used for voice input is generally uttered at a constant speed. The utterance time of a given word is therefore relatively constant regardless of the speaker, and the speech recognition device of this embodiment is built on this observation. The word dictionary is grouped (classified) into a plurality of individual dictionaries based on information about each word to be recognized, such as its number of characters or its speech duration (a general average value is measured). When the dictionary is classified by number of characters, pattern matching proceeds through the individual dictionaries in turn, starting with those whose character count is closest to that of the input speech.

[0010] (2) Details of the Embodiment
FIG. 1 shows the system configuration when the speech recognition device according to one embodiment of the present invention is applied to a navigation device. The navigation device has a calculation unit 10. Connected to the calculation unit 10 are a display unit 11 and a switch input management unit 12. The display unit 11 includes a display 11a that functions as a touch panel and operation switches 11b arranged around the display 11a. The switch input management unit 12 manages input from the touch panel and the switches 11b of the display unit 11.

[0011] The switches 11b include a switch for calling up the navigation menu screen, a switch for adjusting the air conditioner, a switch for operating the audio system, and others. Pressing one of these switches displays the corresponding menu screen on the display 11a.

[0012] Also connected to the calculation unit 10 are a current position measuring unit 13, a speed sensor 14, a map information storage unit 15, the speech recognition unit 16 of this embodiment, and a speech output unit 17. The current position measuring unit 13 finds the current position of the vehicle, whether it is moving or stopped, by detecting coordinate data in latitude and longitude. Four devices are connected to the current position measuring unit 13, which uses the information from them to measure the vehicle's current position: a GPS (Global Positioning System) receiver 21 that measures the vehicle position using satellites, a beacon receiver 20 that receives position information from beacons placed along the road, a direction sensor 22, and a distance sensor 23.

[0013] Several kinds of sensor can serve as the direction sensor 22. Examples are a geomagnetic sensor, which finds the vehicle's heading by detecting terrestrial magnetism, and a gyro such as a gas-rate gyro or an optical fiber gyro, which detects the vehicle's angular velocity and integrates it to obtain the heading. A further example is a pair of wheel sensors on the left and right wheels, which detects the vehicle turning from the difference in their output pulses (the difference in distance traveled) and calculates the change in heading from it. The distance sensor 23 may also use various methods, for example detecting and counting wheel revolutions, or detecting acceleration and integrating it twice. The GPS receiver 21 and the beacon receiver 20 can each measure position on their own. Where neither can receive signals, the current position is found by dead reckoning using both the direction sensor 22 and the distance sensor 23.
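The wheel-sensor heading estimate and the dead reckoning in [0013] can be sketched with standard differential odometry. The publication gives no formulas. The track width (the distance between the left and right wheel sensors), the flat local x/y frame and the midpoint-heading update are all assumptions. For simplicity, the sketch also takes the travelled distance from the same wheel sensors rather than from a separate distance sensor 23.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float        # metres east of the last absolute (GPS or beacon) fix
    y: float        # metres north of that fix
    heading: float  # radians, 0 = east, counter-clockwise positive


def dead_reckon(pose: Pose, d_left: float, d_right: float, track_width: float) -> Pose:
    """Advance the pose using the distances travelled by the left and right wheels."""
    distance = (d_left + d_right) / 2.0           # distance of the vehicle centre
    d_heading = (d_right - d_left) / track_width  # turn from the left/right difference
    mid = pose.heading + d_heading / 2.0          # heading at the middle of the step
    return Pose(pose.x + distance * math.cos(mid),
                pose.y + distance * math.sin(mid),
                pose.heading + d_heading)


# Drive 10 m straight ahead, then make a gentle left turn.
p = dead_reckon(Pose(0.0, 0.0, 0.0), 10.0, 10.0, track_width=1.5)
p = dead_reckon(p, 1.0, 1.5, track_width=1.5)
print(p)
```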

[0014] The map information storage unit 15 is a large-capacity storage device such as a CD-ROM. It stores the data needed for route search and route guidance, such as road data for finding a route to the destination and map data for displaying the found route on the display 11a. It also stores destination data for buildings and locations that can be set as destinations, such as public facilities, gas stations and parks. Each entry gives the name of the place and coordinate data (latitude and longitude) indicating its position. A microphone 24 for speech input is connected to the speech recognition unit 16. The speech output unit 17 includes a speech output IC 26 that outputs speech as an electrical signal, a D/A converter 27 that converts the output of the speech output IC 26 from digital to analog, and an amplifier 28 that amplifies the converted analog signal. A speaker 29 is connected to the output terminal of the amplifier 28.

[0015] The calculation unit 10 has a CPU (central processing unit), ROM (read-only memory), RAM (random access memory) and similar components. The CPU executes programs stored in the ROM or an external storage device, using the RAM as a working area, and this implements each of the calculation unit's functional components. These components are as follows:
- a map data reading unit 31 connected to the speed sensor 14 and the map information storage unit 15;
- a map drawing unit 32;
- a map management unit 33 that manages the map data reading unit 31 and the map drawing unit 32;
- a screen management unit 34 connected to the map drawing unit 32 and the display unit 11;
- an input management unit 35 connected to the switch input management unit 12 and the speech recognition unit 16;
- a speech output management unit 36 connected to the speech output IC 26 of the speech output unit 17;
- a communication management unit 38;
- an overall management unit 37 that manages the map management unit 33, the screen management unit 34, the input management unit 35, the speech output management unit 36 and the communication management unit 38.
Communication devices (not shown) such as a car telephone, a PHS or a mobile phone can be connected to the communication management unit 38. It manages all communication, including ordinary telephone calls, multimedia communication such as facsimile and personal-computer communication, and ATIS communication.

[0016] FIG. 2 is a block diagram of the speech recognition unit 16 in FIG. 1. As the figure shows, the speech recognition unit 16 includes a preprocessing unit 161, a feature extraction unit 162, a word dictionary 163, a pattern matching unit 165 and a determination unit 166. The preprocessing unit 161 converts the speech signal from the microphone 24 into a digital signal. It then applies preprocessing to the A/D-converted signal, including speech section detection, pre-emphasis (high-frequency emphasis) and noise removal. The feature extraction unit 162 extracts the features of the speech from the signal preprocessed by the preprocessing unit 161, and these features serve as the word pattern of the word. The features used are, for example, time-series information on the spectrum or cepstrum obtained by a fast Fourier transform (FFT). The feature extraction unit 162 extracts the features of the input speech by any of various analysis methods, such as a multi-channel band-pass filter bank or linear prediction analysis.
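Two of the preprocessing steps in [0016] can be sketched. The first is pre-emphasis. The second is a speech section detector whose output duration is the "speech section data" passed on to the pattern matching unit 165. The coefficient 0.97, the 160-sample frame and the fixed energy threshold are assumed values, not taken from the publication. A practical detector would usually adapt its threshold to the background noise.

```python
from typing import List, Sequence


def pre_emphasis(x: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order high-frequency emphasis: y[n] = x[n] - alpha * x[n - 1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def speech_duration(x: Sequence[float], sample_rate: int,
                    frame_len: int = 160, threshold: float = 0.01) -> float:
    """Seconds from the first to the last frame whose mean energy exceeds
    `threshold` (a simple energy-based speech section detector)."""
    voiced = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        voiced.append(sum(s * s for s in frame) / frame_len > threshold)
    if not any(voiced):
        return 0.0
    first = voiced.index(True)
    last = len(voiced) - 1 - voiced[::-1].index(True)
    return (last - first + 1) * frame_len / sample_rate


# 8 kHz signal: 0.2 s silence, 0.5 s of constant "speech", 0.3 s silence.
signal = [0.0] * 1600 + [0.5] * 4000 + [0.0] * 2400
print(speech_duration(signal, 8000))  # 0.5
```

The duration is measured on the raw signal here. Pre-emphasis attenuates low-frequency content, so applying it first would change which frames cross a fixed energy threshold.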

[0017] The word dictionary 163 stores standard patterns for all words subject to speech recognition. The standard patterns are intended for speaker-independent recognition. Each word's standard pattern consists of its features, extracted by the same method the feature extraction unit 162 uses to analyze speech. The words subject to recognition include the designation keys displayed on the screen of the touch panel 11a, the functions of the switches 11b, and the names stored in the map information storage unit 15 of places that can be set as destinations.

[0018] FIG. 3 conceptually shows an example of the contents of the word dictionary 163. As FIG. 3 shows, the word dictionary 163 is divided into a plurality of individual dictionaries, each containing no more than 1000 words. The individual dictionaries are classified by the number of characters in each word: a 2-character word dictionary, a 3-character word dictionary, and so on. As noted above, the stored words include designation keys displayed on the touch panel 11a, such as "hosei" (ほせい), "gorufu" (ごるふ) and "yuenchi" (ゆうえんち). They also include destination names such as "Tamatekku" (たまてっく), "Toshimaen" (としまえん) and "Kiyomizudera" (きよみずでら). In practice, each individual dictionary stores the standard pattern of each word together with code information, which consists of a code string corresponding to that word. This code information is the same as the code information for the destination names stored in the map information storage unit 15 and for the corresponding input from the touch panel 11a and similar devices.
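The classification in [0018] can be sketched as grouping words by character count. The sketch stores only the kana strings and omits the standard patterns and code information. It counts characters with `len()`, which treats the small "っ" as one character; the publication does not say how it is counted. The publication also does not address a character-count class that exceeds 1000 words. Splitting such a class into several dictionaries is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

MAX_WORDS_PER_DICT = 1000  # per-dictionary limit stated for FIG. 3


def build_individual_dictionaries(words: Iterable[str],
                                  max_words: int = MAX_WORDS_PER_DICT) -> Dict[int, List[List[str]]]:
    """Group kana words by character count; split any group larger than
    `max_words` into several individual dictionaries (an assumption)."""
    by_count: Dict[int, List[str]] = defaultdict(list)
    for word in words:
        by_count[len(word)].append(word)
    return {count: [group[i:i + max_words] for i in range(0, len(group), max_words)]
            for count, group in sorted(by_count.items())}


fig3_words = ["ほせい", "ごるふ", "ゆうえんち", "たまてっく", "としまえん", "きよみずでら"]
for count, dictionaries in build_individual_dictionaries(fig3_words).items():
    print(count, dictionaries)
```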

[0019] The pattern matching unit 165 compares the standard pattern of each word stored in the word dictionary 163 with the word pattern (features) extracted by the feature extraction unit 162, and calculates the similarity between them. The pattern matching unit 165 also receives the speech section data (speech duration) detected by the preprocessing unit 161. From this duration, it determines the number of characters in the utterance. It then performs pattern matching on the individual dictionaries in turn, starting with those whose character count is closest to the determined count. The relationship between speech duration and the number of characters in a word is measured in advance and recorded in a correspondence table. For example, several speakers each utter every word in a set of 3-character words several times, and the distribution of utterance times is measured. The same is done for words with other character counts. From these measurements, the time band in which the utterance times for each character count are concentrated is extracted, with the bands chosen so that they do not overlap. The result is a table mapping speech duration to number of characters, which the pattern matching unit 165 can consult.
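The publication describes the table construction only in outline. The sketch below shows one possible reading. It summarizes each character count by its mean measured duration and places each band boundary halfway between neighbouring means, which guarantees that the bands do not overlap. A real table might instead be built from percentiles of each distribution. The measurement values are illustrative placeholders, not data from the publication.

```python
import bisect
import statistics
from typing import Dict, List, Sequence, Tuple

Band = Tuple[float, float, int]  # (lower seconds, upper seconds, character count)


def build_duration_table(measurements: Dict[int, Sequence[float]]) -> List[Band]:
    """Turn measured utterance times per character count into non-overlapping
    duration bands, split halfway between adjacent mean durations."""
    means = sorted((statistics.mean(times), count) for count, times in measurements.items())
    bands: List[Band] = []
    for i, (mean, count) in enumerate(means):
        lower = 0.0 if i == 0 else (means[i - 1][0] + mean) / 2
        upper = float("inf") if i == len(means) - 1 else (mean + means[i + 1][0]) / 2
        bands.append((lower, upper, count))
    return bands


def lookup_char_count(bands: List[Band], duration: float) -> int:
    """Return the character count whose band [lower, upper) contains `duration`."""
    uppers = [upper for _, upper, _ in bands]
    index = bisect.bisect_right(uppers, duration)
    return bands[min(index, len(bands) - 1)][2]


# Illustrative utterance times in seconds (not data from the publication).
measured = {3: [0.52, 0.48, 0.50], 4: [0.66, 0.70, 0.68], 5: [0.85, 0.80, 0.81]}
table = build_duration_table(measured)
print([lookup_char_count(table, t) for t in (0.45, 0.70, 1.20)])  # [3, 4, 5]
```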

[0020] The determination unit 166 recognizes the content of the speech input from the microphone 24 based on the comparison results from the pattern matching unit 165. It supplies the code information corresponding to the recognized content (the recognized word) to the input management unit 35 of the calculation unit 10.

[0021] Next, the speech recognition operation of the device configured as above is described. When speech to be recognized is input from the microphone 24 to the speech recognition unit 16, the preprocessing unit 161 converts the analog speech signal into a digital signal. It then performs preprocessing such as speech section detection, pre-emphasis and noise removal. The preprocessing unit 161 supplies the preprocessed speech signal to the feature extraction unit 162. It also supplies the speech section data (speech duration) obtained from speech section detection to the pattern matching unit 165.

[0022] The feature extraction unit 162 analyzes the supplied speech signal to extract the features of the input speech. It supplies these features to the pattern matching unit 165 as the word pattern of the input speech. Before the word pattern arrives from the feature extraction unit 162, the pattern matching unit 165 uses the duration-to-character-count correspondence table to determine the number of characters that corresponds to the speech duration supplied by the preprocessing unit 161. It then selects the individual dictionary with that character count from among the individual dictionaries of the word dictionary 163. When the word pattern arrives, the pattern matching unit 165 compares it with each standard pattern in the selected individual dictionary, calculates a similarity for each word, and supplies the similarities to the determination unit 166.
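In [0022], the dictionary is chosen from the duration alone, before the features arrive, and only that dictionary is then scored. The minimal sketch below takes the duration lookup and the similarity function as parameters, as hypothetical stand-ins for the correspondence table and the matcher. In the usage example, the scalar "patterns" and the 0.15 s-per-character rule are toy values.

```python
from typing import Callable, Dict, Tuple, TypeVar

P = TypeVar("P")  # whatever type the standard patterns use


def match_selected(duration: float,
                   duration_to_chars: Callable[[float], int],
                   individual_dicts: Dict[int, Dict[str, P]],
                   word_pattern: P,
                   similarity: Callable[[P, P], float]) -> Tuple[int, Dict[str, float]]:
    """Pick the individual dictionary from the duration, then score only its words."""
    chars = duration_to_chars(duration)         # needs only the speech duration
    selected = individual_dicts.get(chars, {})  # other dictionaries are not touched
    return chars, {word: similarity(word_pattern, p) for word, p in selected.items()}


dicts = {3: {"ごるふ": 3.0, "ほせい": 5.0}, 5: {"ゆうえんち": 2.0, "としまえん": 7.0}}
chars, scores = match_selected(0.76, lambda d: round(d / 0.15), dicts, 2.2,
                               lambda a, b: 1.0 / (1.0 + abs(a - b)))
print(chars, max(scores, key=scores.get))  # 5 ゆうえんち
```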

[0023] The determination unit 166 sorts the supplied words in descending order of similarity. From the words whose similarity exceeds a predetermined threshold, it takes a predetermined number N with the highest similarities. The word with the highest similarity becomes the recognized word for the input speech, and the others become next candidates in descending order of similarity. The code information for each of these words is supplied to the input management unit 35 of the calculation unit 10.
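The selection step in [0023] can be sketched as a filter followed by a sort. The publication gives no values for the threshold or for N, so both are parameters here. The comparison is strict ("exceeds"), as in the text.

```python
from typing import Dict, List, Tuple


def top_candidates(similarities: Dict[str, float], threshold: float,
                   n: int) -> List[Tuple[str, float]]:
    """Keep words whose similarity exceeds `threshold`, best first, at most `n`.
    The first entry is the recognized word; the rest are next candidates."""
    above = [(word, s) for word, s in similarities.items() if s > threshold]
    above.sort(key=lambda ws: ws[1], reverse=True)
    return above[:n]


scores = {"としまえん": 0.91, "ゆうえんち": 0.42, "たまてっく": 0.77}
print(top_candidates(scores, threshold=0.5, n=2))  # [('としまえん', 0.91), ('たまてっく', 0.77)]
```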

[0024]

If fewer than N words have a similarity, as supplied from the pattern matching unit 165, that exceeds the predetermined threshold, the determination unit 166 requests the pattern matching unit 165 to perform pattern matching again on another individual dictionary. On receiving this request, the pattern matching unit 165 switches to the individual dictionary whose voice duration is the next closest and performs pattern matching again. For example, if pattern matching was first performed on the five-character individual dictionary selected from the voice duration, a further request from the determination unit 166 causes pattern matching on the four-character or six-character individual dictionary, whose voice durations are the next closest. The six-character dictionary is chosen when the voice duration is longer than the midpoint of the five-character time range, and the four-character dictionary when it is shorter. The determination unit 166 then selects, from the similarities already supplied and those newly supplied by the repeated pattern matching, the top N words exceeding the threshold and supplies their code information to the input management unit 35. If there are still fewer than N words exceeding the threshold, it keeps requesting pattern matching until at least N words exceed the threshold.
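The fallback order and retry loop of paragraph [0024] can be sketched as follows. Several points are assumptions rather than statements from the patent. Each character count k is given a typical duration `centers[k]` (for example the midpoint of its time range). `score` is a hypothetical callable standing in for the pattern matching unit 165. The loop stops once every dictionary has been tried, a case the text does not address. Ordering the dictionaries by the distance between `centers[k]` and the measured duration reproduces the four-versus-six rule when all time ranges have equal width: a duration past the five-character midpoint is then closer to the six-character center than to the four-character one.

```python
from typing import Callable, Dict, List, Tuple


def recognize_with_fallback(duration: float,
                            centers: Dict[int, float],
                            dictionaries: Dict[int, List[str]],
                            score: Callable[[str], float],
                            threshold: float,
                            n: int) -> List[Tuple[str, float]]:
    """Match dictionaries nearest-duration first until n words pass threshold.

    centers[k]: assumed typical duration (s) of k-character words.
    score: hypothetical stand-in for pattern matching unit 165.
    """
    # Closest typical duration first; with equal-width time ranges this picks
    # 6 over 4 exactly when the input is past the 5-character midpoint.
    order = sorted(centers, key=lambda k: abs(centers[k] - duration))
    pooled: Dict[str, float] = {}
    above: List[Tuple[str, float]] = []
    for k in order:
        for word in dictionaries.get(k, []):
            pooled[word] = score(word)
        # Re-rank earlier and new similarities together, as unit 166 does.
        above = sorted(((w, s) for w, s in pooled.items() if s > threshold),
                       key=lambda ws: ws[1], reverse=True)
        if len(above) >= n:
            break
    # If every dictionary was tried without reaching n, return what passed.
    return above[:n]


centers = {4: 0.45, 5: 0.55, 6: 0.65}
dictionaries = {4: ["a4"], 5: ["a5", "b5"], 6: ["a6"]}
sims = {"a4": 0.8, "a5": 0.9, "b5": 0.3, "a6": 0.7}
print(recognize_with_fallback(0.58, centers, dictionaries, lambda w: sims[w], 0.5, 2))
# [('a5', 0.9), ('a6', 0.7)]
```

With a measured duration of 0.58 s the order is 5, 6, 4. Only one word passes in the five-character dictionary, so the six-character dictionary is tried next.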

[0025]

When the code information corresponding to the recognized word is supplied to the input management unit 35 of the arithmetic unit 10, the overall management unit 37 confirms the recognized voice by giving a voice answer-back from the speaker 29 via the voice output management unit 36. Alternatively, a predetermined number of the N words corresponding to the supplied code information are shown on the display 11a. The user then selects one of them, which identifies the recognized voice.

[0026]

As described above, this embodiment exploits the fact that the utterance time during voice input is relatively constant. Each word to be recognized is classified into an individual dictionary according to its number of characters. The number of characters of the input voice is determined from its duration, and pattern matching is performed first on the individual dictionaries whose character count is closest to the determined count. The voice dictionaries can therefore be selected and switched appropriately, and the recognition time can be shortened. Because the individual dictionaries are classified and selected appropriately, the recognition rate can also be improved.

[0027]

The embodiment described above is only one of the preferred embodiments of the present invention, and various modifications are possible within the scope of the invention set forth in the claims. In the embodiment described above, for example, pattern matching is performed successively, starting from the individual dictionaries whose character count is closest to the count corresponding to the voice duration. Pattern matching on the remaining individual dictionaries ends once N or more words have a similarity exceeding the predetermined threshold. Alternatively, pattern matching may be performed on all the individual dictionaries. In this case, the similarity obtained by pattern matching for the words in each individual dictionary may be weighted according to the number of characters determined from the duration of the input voice. For example, if the input voice is determined to have five characters, the largest weight is given to the similarities of the words in the five-character individual dictionary of FIG. 3, such as 「ゆうえんち」 (yuenchi, amusement park), 「としまえん」 (Toshimaen), …. The next largest weight is given to the words in the four- and six-character individual dictionaries, which differ by one character. How far the weighting extends (up to how many characters of difference) can be chosen freely. The same applies to whether the weighting adds a predetermined value or multiplies by a predetermined coefficient, and to the value that is added or multiplied.
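This all-dictionary variant with character-count weighting can be sketched in Python as follows. The patent leaves open how far the weighting extends, whether it is additive or multiplicative, and what values are used. The weight table, the `mode` switch, and the function name here are therefore illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_ranking(scores: Dict[str, float],
                     char_count: Dict[str, int],
                     estimated: int,
                     weights: Sequence[float],
                     mode: str = "add") -> List[Tuple[str, float]]:
    """Weight every word's similarity by its character-count distance.

    Illustrative only: weights[d] applies to words whose count differs from
    `estimated` by d; words further away than len(weights) - 1 are left
    unweighted. mode "add" adds weights[d]; mode "mul" multiplies by it.
    """
    if mode not in ("add", "mul"):
        raise ValueError("mode must be 'add' or 'mul'")
    neutral = 0.0 if mode == "add" else 1.0
    ranked = []
    for word, s in scores.items():
        d = abs(char_count[word] - estimated)
        w = weights[d] if d < len(weights) else neutral
        ranked.append((word, s + w if mode == "add" else s * w))
    ranked.sort(key=lambda ws: ws[1], reverse=True)
    return ranked


scores = {"ゆうえんち": 0.70, "えき": 0.75, "としまえん": 0.60}
counts = {"ゆうえんち": 5, "えき": 2, "としまえん": 5}
# Input estimated as five characters: +0.2 for d=0, +0.1 for d=1, nothing beyond.
print([w for w, _ in weighted_ranking(scores, counts, 5, (0.2, 0.1))])
# ['ゆうえんち', 'としまえん', 'えき']
```

The two-character word 「えき」 starts with the highest raw similarity. It drops to last because it is three characters away from the estimated length and receives no weight.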

[0028]

The embodiment described above assumes a single circuit or chip for pattern matching, but the present invention may use several. For example, a dedicated pattern matching circuit or chip may be provided for each classified individual dictionary. In this case, the similarities are weighted according to the number of characters determined from the duration of the input voice. This configuration provides both fast recognition and a high recognition rate.
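One way to model the per-dictionary matching units in software is to run one worker per individual dictionary and weight the pooled results afterwards, as sketched below. Python threads only stand in for the dedicated circuits here, because CPU-bound scoring would not actually run in parallel under the GIL. `score`, `bonus_per_step`, and `max_steps` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def match_dictionary(words: List[str],
                     score: Callable[[str], float]) -> Dict[str, float]:
    # Plays the role of one dedicated pattern-matching circuit or chip;
    # `score` is a hypothetical similarity function.
    return {w: score(w) for w in words}


def parallel_recognize(dictionaries: Dict[int, List[str]],
                       score: Callable[[str], float],
                       estimated: int,
                       bonus_per_step: float = 0.1,
                       max_steps: int = 2) -> List[Tuple[str, float]]:
    """Score every individual dictionary concurrently, then weight by count."""
    with ThreadPoolExecutor(max_workers=max(1, len(dictionaries))) as pool:
        futures = {k: pool.submit(match_dictionary, words, score)
                   for k, words in dictionaries.items()}
        results = {k: f.result() for k, f in futures.items()}
    ranked = []
    for k, sims in results.items():
        d = abs(k - estimated)
        # Additive bonus: max_steps * bonus at d=0, shrinking to 0 at d >= max_steps.
        bonus = bonus_per_step * (max_steps - d) if d < max_steps else 0.0
        ranked.extend((w, s + bonus) for w, s in sims.items())
    ranked.sort(key=lambda ws: ws[1], reverse=True)
    return ranked


sims = {"a4": 0.8, "a5": 0.75, "a6": 0.6}
dicts = {4: ["a4"], 5: ["a5"], 6: ["a6"]}
print([w for w, _ in parallel_recognize(dicts, lambda w: sims[w], estimated=5)])
# ['a5', 'a4', 'a6']
```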

[0029]

In the embodiment described above, the contents of the word dictionary 163 are classified into individual dictionaries according to the number of characters of each word to be recognized. In the present invention, they may instead be classified according to the utterance time of each word. That is, the utterance time (voice duration) of every word to be recognized is measured for multiple speakers, and the individual dictionaries are classified by the resulting average time. For example, there may be an individual dictionary for durations in the 0.1 s range, one for the 0.2 s range, one for the 0.3 s range, …, one for the m s range, and so on. Basing the individual dictionaries on voice durations averaged from the actual utterance times of the words to be recognized, rather than on character counts, can further raise the recognition rate.
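The duration-based classification could be built as shown below. The sketch assumes that utterance times in seconds are collected per word across speakers and that "the 0.1 s range" means the interval [0.1 s, 0.2 s). `build_duration_dictionaries` and its `bin_width` parameter are illustrative names, not from the patent.

```python
import math
from statistics import mean
from typing import Dict, List


def build_duration_dictionaries(measurements: Dict[str, List[float]],
                                bin_width: float = 0.1) -> Dict[int, List[str]]:
    """Group words by the band their mean utterance time falls in.

    Illustrative helper: key k holds words whose mean duration d (seconds,
    averaged over speakers) satisfies k * bin_width <= d < (k + 1) * bin_width,
    so key 1 corresponds to "the 0.1 s range".
    """
    dictionaries: Dict[int, List[str]] = {}
    for word, durations in measurements.items():
        # round() guards against float error such as 0.3 / 0.1 == 2.9999999999999996
        k = math.floor(round(mean(durations) / bin_width, 9))
        dictionaries.setdefault(k, []).append(word)
    return dictionaries


measured = {"ゆうえんち": [0.62, 0.58, 0.66], "えき": [0.31, 0.35], "としまえん": [0.64, 0.61]}
print(build_duration_dictionaries(measured))
# {6: ['ゆうえんち', 'としまえん'], 3: ['えき']}
```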

[0030]

In the embodiment described above, each word is classified into an individual dictionary according to its number of characters, as shown in FIG. 3. Alternatively, the words need not be classified into individual dictionaries at all. Instead, each word's voice duration information or character count information may be stored alongside its standard pattern data and code information. In this case, before performing pattern matching, the pattern matching unit 165 selects from all words in the word dictionary those that correspond to the voice duration of the input voice supplied from the preprocessing unit 161, or to the number of characters determined from that duration. It then performs pattern matching on the selected words.
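A sketch of this flat-dictionary alternative follows. Each entry carries its own duration and character count, and candidates are filtered before any pattern matching. The `Entry` fields, the tolerance-based duration filter, and the function names are assumptions for illustration. Claim 2 speaks only of a "predetermined time interval" around the detected duration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Entry:
    word: str
    code: int                 # hypothetical code information for the word
    pattern: Sequence[float]  # placeholder for the standard pattern data
    duration: float           # stored mean voice duration in seconds
    chars: int                # stored number of characters


def preselect_by_duration(entries: List[Entry], measured: float,
                          tolerance: float) -> List[Entry]:
    """Keep entries whose stored duration is within tolerance of the input."""
    return [e for e in entries if abs(e.duration - measured) <= tolerance]


def preselect_by_chars(entries: List[Entry], chars: int) -> List[Entry]:
    """Keep entries whose stored character count equals the estimated count."""
    return [e for e in entries if e.chars == chars]


dictionary = [
    Entry("ゆうえんち", 101, [], 0.62, 5),
    Entry("えき", 102, [], 0.33, 2),
    Entry("としまえん", 103, [], 0.60, 5),
]
# Only the surviving entries would then go on to pattern matching.
print([e.word for e in preselect_by_duration(dictionary, 0.61, 0.05)])
# ['ゆうえんち', 'としまえん']
```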

[0031]

In the embodiment described above, all functions of the voice recognition device are built into the navigation device. In the present invention, however, part or all of the voice recognition device may be placed in a device other than the navigation device. That other device is preferably an information providing station, which supplies vehicles by communication with information such as the travel route to a destination. The information providing station holds at least the word dictionary 163, classified into individual dictionaries by number of characters or by voice duration, together with the pattern matching unit 165. Placing the entire voice recognition unit 16 at the station, including the preprocessing unit 161, the feature extraction unit 162, and the determination unit 166, is preferable because it reduces the equipment needed on the navigation device side. In that case, speech such as a destination is input at the navigation device and sent to the information providing station from a car telephone or similar device via the communication management unit 38. The information providing station performs preprocessing, feature extraction, pattern matching, and determination on the received speech. It then transmits the code information of the N recognized words with the highest similarities to the navigation device by communication. The pattern matching and determination at the information providing station may use the method of the above embodiment or any of the modifications described. The navigation device receives this code information via the communication management unit 38 and uses it to set the destination and perform similar operations. The information providing station may also search for a route to the destination obtained by voice recognition and transmit information on the searched route to the navigation device.

[0032]

[Effect of the Invention] According to the present invention, the contents of the voice dictionary are appropriately classified, so voice can be recognized efficiently.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a voice recognition device according to the present invention applied to a navigation device.

FIG. 2 is a configuration diagram of the voice recognition unit.

FIG. 3 is an explanatory diagram conceptually showing an example of the contents of the word dictionary.

[Explanation of symbols]

10 Arithmetic unit
11 Display unit
11a Display
13 Current position measurement unit
15 Map information storage unit
16 Voice recognition unit
161 Preprocessing unit
162 Feature extraction unit
163 Word dictionary
165 Pattern matching unit
166 Determination unit
17 Audio output unit
24 Microphone
33 Map management unit
34 Screen management unit
35 Input management unit
37 Overall management unit
38 Communication management unit

Continuation of the front page
(51) Int.Cl.⁶: // G01C 21/00; G09B 29/10
FI: G01C 21/00 H; G09B 29/10 A

Claims


1. A voice recognition device comprising: a word dictionary in which standard patterns of a plurality of words to be recognized are stored such that their numbers of characters can be distinguished; voice input means for inputting voice; character number specifying means for specifying the number of characters of the word input as voice from the voice input means; word dictionary selecting means for selecting, from the word dictionary, standard patterns of words according to the number of characters specified by the character number specifying means; feature extraction means for extracting features of the voice input from the voice input means; similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns selected by the word dictionary selecting means; and determination means for determining the input voice from the similarity calculated by the similarity calculation means.

2. A voice recognition device comprising: a word dictionary in which standard patterns of a plurality of words to be recognized are stored such that their voice durations can be distinguished; voice input means for inputting voice; voice duration detecting means for detecting the voice duration of the voice input from the voice input means; word dictionary selecting means for selecting, from the word dictionary, standard patterns of words within a predetermined time interval of the voice duration detected by the voice duration detecting means; feature extraction means for extracting features of the voice input from the voice input means; similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns selected by the word dictionary selecting means; and determination means for determining the input voice from the similarity calculated by the similarity calculation means.

3. A voice recognition device comprising: a word dictionary in which standard patterns of a plurality of words to be recognized are stored such that their numbers of characters can be distinguished; voice input means for inputting voice; character number specifying means for specifying the number of characters of the voice input from the voice input means; feature extraction means for extracting features of the voice input from the voice input means; similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns stored in the word dictionary; weighting means for weighting the similarity of each word calculated by the similarity calculation means according to the number of characters specified by the character number specifying means; and determination means for determining the input voice from the similarity after weighting by the weighting means.

4. A voice recognition device comprising: a word dictionary in which standard patterns of a plurality of words to be recognized are stored such that their voice durations can be distinguished; voice input means for inputting voice; voice duration detecting means for detecting the voice duration of the voice input from the voice input means; feature extraction means for extracting features of the voice input from the voice input means; similarity calculation means for calculating the similarity between the features extracted by the feature extraction means and the standard patterns stored in the word dictionary; weighting means for weighting the similarity of each word calculated by the similarity calculation means according to the voice duration detected by the voice duration detecting means; and determination means for determining the input voice from the similarity after weighting by the weighting means.