JPH10143191A

JPH10143191A - Speech recognition system

Info

Publication number: JPH10143191A
Application number: JP8301802A
Authority: JP
Inventors: Shinji Wakizaka; 新路脇坂; Kazuyoshi Ishiwatari; 一嘉石渡; Koji Ito; 功二伊東; Tetsuji Toushita; 哲司塔下; Makoto Tanaka; 田中　　誠
Original assignee: Hitachi Microcomputer System Ltd; Hitachi Ltd
Current assignee: Hitachi Microcomputer System Ltd; Hitachi Ltd
Priority date: 1996-11-13
Filing date: 1996-11-13
Publication date: 1998-05-29
Also published as: TW360858B; US6112174A; KR19980042248A; KR100274276B1

Abstract

PROBLEM TO BE SOLVED: To realize a good speech recognition interface by limiting the words and sentences that are the targets of speech recognition. SOLUTION: A speech analysis part 106 performs noise processing and speech analysis on speech input through a microphone 101. A speech recognition part 107 successively matches the analysis result of the input speech, computed by the speech analysis part 106, against a dictionary 105 and an acoustic model 108, and computes the closest word in the dictionary 105. A dictionary switching part 103 selects, or switches between, the dictionaries used for recognition according to the contents of dictionary switching information 102. For example, multiple dictionaries are stored on a memory card or in a ROM 104, and during recognition only the necessary dictionary is transferred to a RAM 105 for the recognition process. In this way, the words and sentences to be recognized are limited before speech recognition is performed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition system suitable for use in car navigation systems, small information devices typified by the PDA (Personal Digital Assistant), portable speech translators, and the like. In particular, it relates to a speech-recognition guidance system suitable for recognizing the very large vocabulary of place names, intersection names, street names, and similar words used in a car navigation system.

[0002]

2. Description of the Related Art

Small information systems that use speech recognition technology have recently become widespread: car navigation systems, portable information devices typified by PDAs, portable translators, and so on. In the related art, however, the number of recognizable words is restricted so that the recognition rate and the recognition response time do not degrade.

[0003] Japanese Patent Application Laid-Open No. 5-35776 (title: "Translation device with automatic language selection function") discloses a portable translation device that recognizes an operator's speech input through a microphone, translates it, and outputs speech in the translated language.

[0004] FIG. 8 is a block diagram showing an example of such a conventional speech translation device. In the figure, 801 is a control unit, 802 is a speech segment extraction unit, 803 is a speech recognition unit, 804 is a display unit, 805 is a speech synthesis unit, 806 is a memory card for translated-word data, 807 is a speech recognition dictionary unit, 808 is a speaker, 809 is a microphone, 810 is a speaker amplifier, and 811 is an operation signal.

[0005] The control unit 801 consists of a microprocessor or the like and controls each part of the device. The speech segment extraction unit 802 converts speech input from the microphone 809 into a digital signal, extracts the speech segment, and sends it to the speech recognition unit 803. On instruction from the control unit 801, which has received an operation signal 811 from a keyboard, switch, or the like, the speech recognition unit 803 analyzes the speech extracted via the microphone 809 and the speech segment extraction unit 802. It then performs recognition by comparing the analysis result with the standard speech patterns stored in the speech recognition dictionary unit 807.

[0006] The speech synthesis unit 805 reads the translated word corresponding to the speech recognized by the speech recognition unit 803 from the translated-word data memory card 806, converts it into a speech signal, and outputs it as sound through the speaker amplifier 810 and the speaker 808.

[0007] The display unit 804 presents instructions to the user of the translation device and displays translated words as text. The translated-word data memory card 806 is a ROM card or the like; when translated words are to be output by speech synthesis, it stores the speech data. Character codes corresponding to the translated words are also read from this memory card 806 and shown on the display unit 804. By exchanging the memory card 806 for one prepared for another language, translation into multiple languages becomes possible. The speech recognition dictionary unit 807 consists of RAM or the like and stores standard speech patterns corresponding to the operator's utterances. The operator registers these standard speech patterns in advance.

[0008]

Problems to Be Solved by the Invention

As described above, small information systems using speech recognition technology, such as car navigation systems, portable information devices typified by PDAs, and portable translators, are expected to become increasingly widespread. In improving a human interface based on speech recognition, however, the recognition rate and the recognition response time are the key problems.

[0009] In the prior art, however, the number of recognizable words must be restricted to keep the recognition rate and the recognition response time from degrading. Enlarging the vocabulary increases the number of words with similar acoustic characteristics, which lowers the recognition rate. Moreover, because recognition is performed against every word in the vocabulary, the required work memory and dictionary memory grow, and so does the processing time.

[0010] In the future, advances in speech recognition technology and in the performance of the software and hardware that implement it may remove this limit on vocabulary size. For the time being, however, the vocabulary must be limited to keep the recognition rate and the recognition response time from degrading.

[0011] Even so, small information systems that use speech recognition, car navigation systems in particular, face a demand for larger recognition vocabularies to improve usability.

[0012] The present invention was made in view of the above. Its object is to realize a system that can perform speech recognition without degrading the recognition rate or the recognition response time even when the recognizable vocabulary is enlarged. A further object is to realize a good speech recognition interface in a car navigation system that uses speech recognition.

[0013]

Means for Solving the Problems

To achieve the above objects, the speech recognition system according to the present invention prepares an arbitrary or designated number of words or sentences to be recognized and defines them as one dictionary, prepares another arbitrary or designated number of words or sentences and defines them as another dictionary, and so provides multiple dictionaries. The system comprises: a first storage unit that stores the multiple dictionaries; a second storage unit that stores a single dictionary selected from them; a dictionary switching unit that receives dictionary switching information for selecting one dictionary and switches dictionaries accordingly; a speech analysis unit that performs speech analysis on captured speech; and a speech recognition unit that performs speech recognition on the analysis result of the speech analysis unit, using the dictionary selected by the dictionary switching unit and stored in the second storage unit together with an acoustic model. Speech recognition is thus performed over a limited set of words and sentences.
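The arrangement of [0013] can be sketched as a small Python program. This is a minimal illustration under stated assumptions, not the patent's implementation: the "speech signal" is stood in for by a text string, `analyze` merely normalizes it, and `acoustic_score` uses `difflib.SequenceMatcher` string similarity in place of a real acoustic model. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional


@dataclass
class DictionaryStore:
    """First storage unit (memory card or ROM in the patent): every dictionary."""
    dictionaries: Dict[str, List[str]]


@dataclass
class ActiveDictionary:
    """Second storage unit (RAM in the patent): exactly one selected dictionary."""
    name: Optional[str] = None
    words: List[str] = field(default_factory=list)


class DictionarySwitcher:
    """Dictionary switching unit: copies the selected dictionary into the active slot."""

    def __init__(self, store: DictionaryStore, active: ActiveDictionary) -> None:
        self.store = store
        self.active = active

    def switch(self, name: str) -> None:
        if name != self.active.name:  # skip the copy if it is already loaded
            self.active.name = name
            self.active.words = list(self.store.dictionaries[name])


def analyze(signal: str) -> str:
    """Stand-in for the speech analysis unit: normalizes a text 'signal'."""
    return signal.strip().lower()


def acoustic_score(features: str, word: str) -> float:
    """Stand-in for acoustic-model scoring: string similarity, NOT an acoustic model."""
    return SequenceMatcher(None, features, word.lower()).ratio()


class SpeechRecognizer:
    """Speech recognition unit: searches only the active dictionary."""

    def __init__(self, active: ActiveDictionary,
                 score: Callable[[str, str], float] = acoustic_score) -> None:
        self.active = active
        self.score = score

    def recognize(self, signal: str) -> Optional[str]:
        if not self.active.words:
            return None
        features = analyze(signal)
        return max(self.active.words, key=lambda w: self.score(features, w))
```

Because the recognizer only ever sees `active.words`, its search cost depends on the size of one dictionary rather than on the total vocabulary held in the store, which is the point of the arrangement.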

[0014] The first storage unit, which stores the multiple dictionaries, is a memory card or ROM; the second storage unit, which stores the single dictionary selected from them, is RAM.

[0015] The dictionary switching information used to select one dictionary from the multiple dictionaries is position information from the GPS (Global Positioning System) satellite positioning system used in car navigation systems.

[0016] When the speech recognition system is used in a car navigation system and a dictionary is built from an arbitrary or designated number of words or sentences, the dictionary consists of words such as the place names, intersection names, street names, building names, gas stations, convenience stores, and family restaurants that exist in an arbitrary or designated area, and one such dictionary is prepared for each area.

[0017] To achieve the above objects, the speech recognition system according to the present invention may also prepare an arbitrary or designated number of words or sentences to be recognized and define them as one dictionary, prepare a further arbitrary or designated number of words or sentences and define them as another dictionary, and so provide multiple dictionaries. The system then comprises: a first storage unit that stores the multiple dictionaries; a second storage unit that stores a single dictionary selected from them; a dictionary switching unit that switches dictionaries in response either to dictionary switching information for selecting one dictionary or to a recognition result; a speech analysis unit that performs speech analysis on captured speech; and a speech recognition unit that performs speech recognition on the analysis result of the speech analysis unit, using the dictionary selected by the dictionary switching unit and stored in the second storage unit together with an acoustic model. Speech recognition is thus performed over a limited set of words and sentences.

[0018] When the speech recognition system is used in a car navigation system, each dictionary is built from words such as the place names, intersection names, street names, building names, gas stations, convenience stores, and family restaurants that exist in an arbitrary or designated area; one dictionary is prepared per area and used for recognition. If the recognition result shows that no matching word exists in the dictionary, the system switches to the next dictionary to be searched and performs recognition again.

[0019] When the speech recognition system is used in a car navigation system, each dictionary is built from words such as the place names, intersection names, street names, building names, gas stations, convenience stores, and family restaurants that exist in an arbitrary or designated area; one dictionary is prepared per area and used for recognition. If the recognition result indicates a dictionary index, the system switches to the dictionary indicated by that index and performs recognition with it.

[0020]

Embodiments of the Invention

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the processing functions of the speech recognition system according to the first embodiment of the present invention.

[0021] In FIG. 1, 101 is a microphone for capturing speech, and 102 is dictionary switching information. A dictionary, as used in this invention, is a collection of words (nouns, verbs, etc.) to be recognized. In a car navigation system, for example, it is a collection of street names, place names, building names, town names, street addresses, intersection names, gas stations, convenience stores, family restaurants, and the like, together with the words needed for a minimal conversation. One dictionary consists of, for example, 1,000 to 5,000 words. Multiple dictionaries are prepared, and recognition is performed against one dictionary selected from them.

[0022] 103 is a dictionary switching unit, which selects one of the dictionaries as the recognition target, or switches between them, according to the contents of the dictionary switching information 102. For example, the dictionaries are stored on a memory card or in ROM (Read Only Memory), and only the dictionary needed for recognition is transferred to RAM (Random Access Memory) for the recognition process.
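The ROM-to-RAM transfer described in [0022] amounts to keeping all dictionaries in one read-only image with an index and copying out only the selected one. The sketch below assumes a hypothetical layout (newline-separated UTF-8 words packed back to back, with a separate offset/length index); the patent does not specify any storage format, and `build_rom_image` and `load_into_ram` are illustrative names.

```python
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[int, int]]  # dictionary name -> (offset, length) in the image


def build_rom_image(dictionaries: Dict[str, List[str]]) -> Tuple[bytes, Index]:
    """Pack every dictionary into one immutable byte image (the 'ROM')."""
    image = bytearray()
    index: Index = {}
    for name, words in dictionaries.items():
        # Assumes no word contains a newline, since newline is the separator.
        payload = "\n".join(words).encode("utf-8")
        index[name] = (len(image), len(payload))
        image += payload
    return bytes(image), index


def load_into_ram(rom: bytes, index: Index, name: str) -> List[str]:
    """Copy out and decode only the selected dictionary (the 'RAM' copy)."""
    offset, length = index[name]
    if length == 0:
        return []
    chunk = rom[offset:offset + length]  # slicing bytes makes a new copy
    return chunk.decode("utf-8").split("\n")
```

Only the slice for the selected dictionary is decoded, so working memory scales with one dictionary rather than with the whole image.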

[0023] 104 is a storage device or storage area, consisting of a memory card or ROM, that stores the multiple dictionaries. 105 is a storage device or storage area, consisting of RAM, that stores the single dictionary selected from them as the recognition target.

[0024] 106 is a speech analysis unit that performs noise processing and speech analysis on the speech captured by the microphone 101. 107 is a speech recognition unit that successively matches the analysis result of the input speech, computed by the speech analysis unit 106, against the dictionary 105 and the acoustic model 108, and computes the closest word in the dictionary 105.

[0025] 108 is an acoustic model for speaker-independent speech recognition, for example a hidden Markov model (HMM). 109 is the speech recognition result computed by the speech recognition unit 107.
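As a rough illustration of how an HMM acoustic model can pick "the closest word" in the dictionary, the sketch below scores each dictionary word with the forward algorithm (computed in log space) and returns the most likely one. It is a deliberate simplification, not the patent's method: each word gets its own small discrete-symbol HMM, whereas practical speaker-independent recognizers typically use continuous-density phoneme models over the feature vectors produced by the analysis unit. The names and the model parameters in the test are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiscreteHMM:
    """A word model with discrete observation symbols (a simplification)."""
    initial: List[float]              # pi[i]: probability of starting in state i
    transition: List[List[float]]     # a[i][j]: probability of moving from i to j
    emission: List[Dict[str, float]]  # b[j][s]: probability that state j emits s


def _log(p: float) -> float:
    return math.log(p) if p > 0.0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_likelihood(hmm: DiscreteHMM, observations: Sequence[str]) -> float:
    """Forward algorithm in log space: log P(observations | hmm)."""
    if not observations:
        raise ValueError("need at least one observation")
    n = len(hmm.initial)
    alpha = [_log(hmm.initial[i]) + _log(hmm.emission[i].get(observations[0], 0.0))
             for i in range(n)]
    for symbol in observations[1:]:
        alpha = [_logsumexp([alpha[i] + _log(hmm.transition[i][j]) for i in range(n)])
                 + _log(hmm.emission[j].get(symbol, 0.0))
                 for j in range(n)]
    return _logsumexp(alpha)


def recognize(models: Dict[str, DiscreteHMM], observations: Sequence[str]) -> str:
    """Return the dictionary word whose model best explains the observations."""
    return max(models, key=lambda word: log_likelihood(models[word], observations))
```

Working in log space avoids numerical underflow when the observation sequence is long, since the raw forward probabilities shrink geometrically.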

[0026] Each processing block shown in FIG. 1 may be implemented as a system built from multiple LSIs and memories, or as one or more systems-on-chip formed on semiconductor devices.

[0027] FIG. 2 is a flowchart of the dictionary switching / speech recognition processing in this embodiment.

[0028] Step ST201 determines whether the dictionary switching information 102 has been updated. In a car navigation system, for example, the dictionary switching information 102 is a signal indicating position from the satellite positioning system (GPS: Global Positioning System).

[0029] The dictionary switching unit 103 receives the position signal from the GPS. If the position indicates that the word dictionary for recognition must be switched (YES at step ST201), it switches to the appropriate word dictionary at step ST203. If the position indicates that no switch is needed (NO at step ST201), the dictionary is left unchanged and speech recognition is executed directly at step ST202.

[0030] The condition for switching dictionaries can be, for example, as follows. Suppose the car is traveling from point X toward point Y. While the car's current position is inside a predetermined area E_X containing point X, the dictionary D_X for area E_X is used; when the current position enters a predetermined area E_Y containing point Y, the system switches to the dictionary D_Y used for recognition in area E_Y.
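The switching rule of [0029] and [0030] (steps ST201-ST203) might be sketched as follows. This assumes rectangular areas in a planar coordinate system; the patent specifies neither the area shapes nor what happens when the car is outside every area, so both are assumptions here, as are the names `Area` and `GpsDictionarySwitcher`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Area:
    """A predetermined rectangular area E_* and the dictionary D_* used inside it."""
    name: str
    dictionary: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so that adjacent areas never both claim a boundary point.
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max


class GpsDictionarySwitcher:
    """Decides, for each GPS fix, whether the active dictionary must change."""

    def __init__(self, areas: List[Area]) -> None:
        self.areas = areas
        self.current: Optional[str] = None
        self.switch_count = 0

    def on_position(self, x: float, y: float) -> bool:
        """ST201: return True (and perform ST203) only if a switch is needed."""
        for area in self.areas:
            if area.contains(x, y):
                if area.dictionary == self.current:
                    return False              # NO at ST201: go straight to ST202
                self.current = area.dictionary  # YES at ST201: ST203
                self.switch_count += 1
                return True
        return False  # Outside every area: keep the current dictionary (assumption).
```

Returning `False` for repeated fixes inside the same area means the dictionary is reloaded only on an area change, not on every GPS update.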

[0031] FIG. 3 is a diagram explaining dictionary switching in car navigation.

[0032] In FIG. 3(a), 301 shows the road on which a car equipped with the car navigation system is actually traveling, and 302 shows the point at which the car is currently traveling (point A) and its direction of travel.

[0033] At point A, the recognizable words are the place names, street names, intersection names, building names, gas stations, convenience stores, restaurants, and the like within area 1, indicated by 304. The number of such names in an area varies with the displayed scale. It also varies depending on whether the displayed area is urban or a sparsely populated rural or mountainous region.

[0034] When k in the scale 1/k is large, a wide area is displayed, so the number of words increases. For example, if at most 3,000 words can be recognized without degrading the recognition rate and the recognition response time, the map is divided into areas of at most 3,000 words each. For a wide area, however, the dictionary is built from the names of major streets, major intersections, and well-known buildings.
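[0034] states only that the map is divided into units of at most 3,000 words; it does not say how. One way to do it, swapped in here purely as an illustration, is recursive quadtree subdivision: split a region into four quadrants until each holds at most `max_words` nameable objects. The point-of-interest tuple format, the half-open bounds, and the `max_depth` guard are all assumptions.

```python
from typing import List, Tuple

Poi = Tuple[str, float, float]              # (name, x, y) of a nameable map object
Bounds = Tuple[float, float, float, float]  # (x0, y0, x1, y1), half-open


def split_into_areas(pois: List[Poi], bounds: Bounds, max_words: int = 3000,
                     max_depth: int = 12) -> List[Tuple[Bounds, List[str]]]:
    """Return (area bounds, dictionary words) pairs, each with <= max_words words
    unless max_depth runs out (e.g. many objects at one point)."""
    x0, y0, x1, y1 = bounds
    inside = [p for p in pois if x0 <= p[1] < x1 and y0 <= p[2] < y1]
    if not inside:
        return []  # nothing to recognize in this cell
    if len(inside) <= max_words or max_depth == 0:
        return [(bounds, [name for name, _, _ in inside])]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    areas: List[Tuple[Bounds, List[str]]] = []
    # Quadrant order: lower-left, lower-right, upper-left, upper-right.
    for sub in ((x0, y0, xm, ym), (xm, y0, x1, ym), (x0, ym, xm, y1), (xm, ym, x1, y1)):
        areas.extend(split_into_areas(inside, sub, max_words, max_depth - 1))
    return areas
```

The wide-area case, where only major streets and well-known buildings are kept, would need an additional prominence filter applied before splitting; that is not shown.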

[0035] Conversely, when k in the scale 1/k is small, a narrow area is displayed, so the number of words decreases. Including minor street names, intersection names, and local building names, however, increases the word count again. Because the driver wants more detailed street, intersection, and building names even when k is small, the number of words in a dictionary is likewise limited to, for example, at most 3,000.

[0036] Suppose, for example, that within the displayed area 1 the driver says "〇〇〇" to the car navigation system (where 〇〇〇 denotes a particular gasoline company). If there is a 〇〇〇 gas station 5 km ahead in area 1, the system answers by speech synthesis: "It is 5 km ahead."

[0037] Next, suppose the car that was traveling at point A is now traveling at point B. In FIG. 3(a), 303 shows the point at which the car is currently traveling (point B) and its direction of travel. At point B, the recognizable words are the place names, street names, intersection names, building names, gas stations, convenience stores, restaurants, and the like within area 2, indicated by 305.

[0038] FIG. 3(b) is a table 306, held by the car navigation system, that shows the relationship between these areas and the dictionaries. Dictionary 1 consists of words such as the place names, street names, intersection names, building names, gas stations, convenience stores, and restaurants within area 1; dictionary 2 consists of the corresponding words within area 2; and likewise, dictionary n consists of the corresponding words within area n.
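Table 306 can be thought of as a mapping from area number to the words found in that area. A minimal sketch, assuming map records arrive as hypothetical `(area_number, category, name)` tuples using the categories listed in [0016]; the category labels and the function name are illustrative, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Categories named in [0016]/[0038]; the string labels themselves are hypothetical.
CATEGORIES = {"place", "street", "intersection", "building",
              "gas_station", "convenience_store", "restaurant"}


def build_area_table(records: Iterable[Tuple[int, str, str]]) -> Dict[int, List[str]]:
    """Group (area_number, category, name) records into dictionary n for area n."""
    table: Dict[int, Set[str]] = defaultdict(set)
    for area, category, name in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        table[area].add(name)  # a set drops duplicate names within an area
    return {area: sorted(names) for area, names in sorted(table.items())}
```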

[0039] FIG. 4 is a block diagram showing the processing functions of the speech recognition system according to the second embodiment of the present invention. Components equivalent to those in FIG. 1 carry the same reference numerals, and their description is omitted to avoid duplication. In FIG. 4, 401 is information or a signal representing the recognition result, used to feed the speech recognition result 109 back from the speech recognition unit 107 to the dictionary switching unit 103.

[0040] In this embodiment as well, each processing block shown in FIG. 4 may be implemented as a system built from multiple LSIs and memories, or as one or more systems-on-chip formed on semiconductor devices.

[0041] FIG. 5 is a flowchart of the dictionary switching / speech recognition processing in this embodiment.

[0042] Step ST501 determines whether the dictionary switching information 102 has been updated. In a car navigation system, for example, the dictionary switching information 102 is, as described above, a signal indicating position from the satellite positioning system (GPS: Global Positioning System).

[0043] The dictionary switching unit 103 receives the position signal from the GPS. If the position indicates that the word dictionary for recognition must be switched (YES at step ST501), it switches to the appropriate word dictionary at step ST503. If the position indicates that no switch is needed (NO at step ST501), the dictionary is left unchanged and speech recognition is executed directly at step ST502.

[0044] Step ST504, which follows step ST502, checks whether the speech recognition result contains a matching word.

[0045] If no word in the dictionary matches the input speech (YES at step ST504), the dictionary switching unit 103 receives the no-match recognition result 401 from the speech recognition unit 107 and, at step ST505, switches to the word dictionary of the next candidate. If a word in the dictionary does match the input speech (NO at step ST504), speech recognition ends and the system proceeds to its next processing step using the recognition result.
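The second embodiment's loop (steps ST502, ST504, and ST505) can be sketched as trying candidate dictionaries in order until one yields a match. The patent does not say how "no matching word" is decided; this sketch assumes a similarity threshold, and it again uses `difflib.SequenceMatcher` string similarity as a stand-in for acoustic scoring. The function names and the `threshold` default are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Sequence, Tuple


def best_match(utterance: str, words: List[str]) -> Tuple[float, Optional[str]]:
    """Return (score, word) for the most similar word, or (0.0, None) if empty."""
    scored = [(SequenceMatcher(None, utterance.lower(), w.lower()).ratio(), w)
              for w in words]
    return max(scored, default=(0.0, None))


def recognize_with_fallback(utterance: str,
                            dictionaries: Dict[str, List[str]],
                            candidate_order: Sequence[str],
                            threshold: float = 0.8,  # assumed "match" criterion
                            ) -> Tuple[Optional[str], Optional[str]]:
    """Return (dictionary name, word), or (None, None) if every candidate fails."""
    for name in candidate_order:  # ST502, then ST505 on each retry
        score, word = best_match(utterance, dictionaries[name])
        if word is not None and score >= threshold:  # ST504: a match was found
            return name, word
    return None, None
```

A caller would typically put the current area's dictionary first in `candidate_order` and neighboring areas after it, although the patent leaves the ordering of "the next candidate" open.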

[0046] FIG. 6 is a flowchart of the dictionary switching / speech recognition processing in a third embodiment of the present invention, in which the speech recognition system of the invention is installed not only in the car navigation system described above but also in systems such as portable information devices typified by PDAs (Personal Digital Assistants) and portable translators.

[0047] Step ST601 is a speech recognition process: commands, indexes, and the like are registered as a dictionary, and speech is recognized against that dictionary.

[0048] At step ST602, the system switches to the dictionary indicated by the recognized command or index, for example "address book". A separate dictionary is created for each command or index; the address book dictionary, for instance, consists of the registered personal names.

[0049] Thus, for example, when the user says "Taro Fujiyama" in step ST603, the utterance is recognized against the address book dictionary and Taro Fujiyama's address is output.
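Steps ST601 to ST603 amount to a two-stage lookup in which the first recognition result selects the dictionary for the second. A minimal Python sketch, assuming text stands in for recognized speech; the dictionary contents, the `"schedule"` command, and the placeholder address are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical dictionaries for the third embodiment (FIG. 6).
# Level 1: commands/indexes recognized in ST601.
# Level 2: one dictionary per command, selected in ST602.
SUB_DICTIONARIES: Dict[str, Dict[str, str]] = {
    # Address book: registered person name -> stored address (placeholder).
    "address book": {"Taro Fujiyama": "<address of Taro Fujiyama>"},
    "schedule": {},
}
COMMAND_DICTIONARY = frozenset(SUB_DICTIONARIES)  # the command names


def two_stage_lookup(command: str, utterance: str) -> Optional[str]:
    """Recognize a command, switch dictionaries, then recognize the entry."""
    # ST601: recognize against the command/index dictionary only.
    if command not in COMMAND_DICTIONARY:
        return None
    # ST602: switch to the dictionary indicated by the recognition result.
    active = SUB_DICTIONARIES[command]
    # ST603: recognize the next utterance against that dictionary alone
    # and output the associated information (here, the address).
    return active.get(utterance)
```

Because the second utterance is matched only against the selected dictionary, a name registered in the address book cannot be confused with words belonging to other commands.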

[0050] FIG. 7 is a diagram showing an example of a hardware configuration for constructing the speech recognition system according to the present invention.

[0051] In FIG. 7, reference numeral 701 denotes a microphone for capturing voice. In a car navigation system or the like, it is a directional microphone, so that ambient noise is not picked up.

[0052] Reference numeral 702 denotes data or control signals for switching dictionaries; in a car navigation system, this is position data sent from the GPS.
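Claims 3 and 4 tie this position data to per-area dictionaries. A minimal Python sketch of the selection step, assuming rectangular latitude/longitude areas: the `Area` class, the bounding-box test, the first-match rule, and the dictionary identifiers are hypothetical, since the text does not say how areas are delimited.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Area:
    """Hypothetical rectangular area that owns one word dictionary."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    dictionary_id: str  # key of this area's dictionary in ROM 706

    def contains(self, lat: float, lon: float) -> bool:
        """True if the position lies inside this area's bounding box."""
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


def select_dictionary(areas: Sequence[Area], lat: float, lon: float) -> Optional[str]:
    """Map a GPS fix (dictionary switching data 702) to a dictionary id."""
    for area in areas:
        if area.contains(lat, lon):
            return area.dictionary_id  # first matching area wins
    return None  # position falls outside every prepared area
```

A production system would more likely use real administrative boundaries than boxes; the box test only keeps the sketch short.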

[0053] Reference numeral 703 denotes a CPU or RISC microcomputer that both controls the main system, such as a car navigation system or PDA, and performs the speech recognition processing of the speech recognition system.

[0054] Reference numeral 704 denotes an A/D conversion IC that converts analog voice data captured by the microphone 701 into digital voice data.

[0055] Reference numeral 705 denotes an interface that receives the dictionary switching data 702 and causes the CPU 703 to read the dictionary switching information.

[0056] Reference numeral 706 denotes a ROM or memory card that stores the dictionaries, the acoustic model, and the program.

[0057] Reference numeral 707 denotes a RAM with a shorter access time than the ROM 706. It stores the part of the dictionaries, the acoustic model, and the program transferred from the ROM 706, and also serves as the minimum work memory required for speech recognition processing.

[0058] Reference numeral 708 denotes the system buses, such as a data bus, an address bus, and a control signal bus.

[0059] Voice captured by the microphone 701 is recognized against the dictionary selected according to the dictionary switching data 702. The CPU 703 switches dictionaries, transferring only the needed dictionaries from the complete set stored in the ROM 706 to the RAM 707, and the series of speech recognition processing is carried out by data processing between the CPU 703 and the RAM 707.
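The ROM-to-RAM division of labour can be sketched as a store that keeps every dictionary in slow storage and copies only the selected one into fast working memory. A minimal Python sketch; the class, its method names, and the transfer counter are hypothetical illustrations of the idea, and `recognize` uses toy exact matching in place of acoustic matching.

```python
from typing import Dict, FrozenSet, Optional


class DictionaryStore:
    """Sketch of the ROM 706 / RAM 707 split described above."""

    def __init__(self, rom: Dict[str, FrozenSet[str]]):
        self._rom = rom                     # ROM or memory card: every dictionary
        self._ram_id: Optional[str] = None  # id of the dictionary now in RAM
        self._ram: FrozenSet[str] = frozenset()
        self.transfers = 0                  # count of ROM -> RAM copies

    def switch(self, dictionary_id: str) -> None:
        """Transfer a dictionary to RAM only if it is not already resident."""
        if dictionary_id != self._ram_id:
            self._ram = frozenset(self._rom[dictionary_id])
            self._ram_id = dictionary_id
            self.transfers += 1

    def recognize(self, word: str) -> Optional[str]:
        """Toy matching restricted to the RAM-resident dictionary."""
        return word if word in self._ram else None
```

Skipping the copy when the requested dictionary is already resident reflects the text's "as needed": repeated recognitions in the same area cost no further ROM accesses.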

[0060]

[Effects of the Invention] As described above, the present invention makes it possible to realize a user-friendly speech recognition interface in car navigation systems, portable information devices such as PDAs, portable translators, and similar devices. In particular, it realizes a high-performance speech recognition system whose recognition rate and recognition speed do not degrade even as the number of words to be recognized increases.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the processing functions of the speech recognition system according to the first embodiment of the present invention.

FIG. 2 is a flowchart showing the dictionary switching / speech recognition processing in the first embodiment of the present invention.

FIG. 3 is an explanatory diagram showing dictionary switching in the car navigation system according to the first embodiment of the present invention.

FIG. 4 is a block diagram showing the processing functions of the speech recognition system according to the second embodiment of the present invention.

FIG. 5 is a flowchart showing the dictionary switching / speech recognition processing in the second embodiment of the present invention.

FIG. 6 is a flowchart showing the dictionary switching / speech recognition processing in the third embodiment of the present invention.

FIG. 7 is a block diagram showing an example of a hardware configuration for constructing the speech recognition system according to the present invention.

FIG. 8 is a block diagram showing the configuration of a conventional portable speech translator using speech recognition.

[Explanation of Symbols]

101 Microphone
102 Dictionary switching information
103 Dictionary switching unit
104 First memory for storing dictionaries
105 Second memory for storing a dictionary
106 Voice analysis unit
107 Voice recognition unit
108 Acoustic model
109 Voice recognition result

Continued from front page:
(72) Inventor: Koji Ito, c/o Semiconductor Division, Hitachi, Ltd., 5-20-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Tetsuji Toushita, c/o Semiconductor Division, Hitachi, Ltd., 5-20-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Makoto Tanaka, c/o Hitachi Microcomputer System Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo

Claims

1. A speech recognition system comprising: a first storage unit that stores a plurality of dictionaries, where an arbitrary or designated number of words or sentences to be subjected to speech recognition are prepared and defined as one dictionary, and another arbitrary or designated number of words or sentences to be subjected to speech recognition are prepared and defined as another dictionary; a second storage unit that stores only one dictionary selected from the plurality of dictionaries; a dictionary switching unit that receives dictionary switching information for selecting only one dictionary from the plurality of dictionaries; a voice analysis unit that performs voice analysis processing on captured voice; and a speech recognition unit that performs recognition processing on the voice analysis result obtained by the voice analysis unit, using the dictionary selected by the dictionary switching unit and stored in the second storage unit together with an acoustic model; wherein speech recognition is performed with the words and sentences to be recognized thereby limited.

2. The speech recognition system according to claim 1, wherein the first storage unit is configured by a memory card or a ROM, and the second storage unit is configured by a RAM.

3. The speech recognition system according to claim 1, wherein the dictionary switching information for selecting only one dictionary from the plurality of dictionaries is position information from the GPS (Global Positioning System) satellite positioning system used in a car navigation system.

4. The speech recognition system according to claim 1, 2, or 3, wherein the speech recognition system is a speech recognition system in a car navigation system, and, when an arbitrary or specified number of words and sentences to be subjected to speech recognition are prepared as a single dictionary, each dictionary consists of words such as place names, intersection names, street names, building names, gas stations, convenience stores, and family restaurants located in an arbitrary or specified area, and these dictionaries are prepared for each area.

5. A speech recognition system comprising: a first storage unit that stores a plurality of dictionaries, where an arbitrary or designated number of words and sentences to be subjected to speech recognition are prepared and defined as one dictionary, and another arbitrary or designated number of words and sentences to be subjected to speech recognition are prepared and defined as another dictionary; a second storage unit that stores only one dictionary selected from the plurality of dictionaries; a dictionary switching unit that switches the dictionary upon receiving either dictionary switching information for selecting only one dictionary from the plurality of dictionaries or a recognition result; a voice analysis unit that performs voice analysis processing on captured voice; and a speech recognition unit that performs speech recognition processing on the voice analysis result obtained by the voice analysis unit, using the dictionary selected by the dictionary switching unit and stored in the second storage unit together with an acoustic model; wherein speech recognition is performed with the words and sentences to be recognized thereby limited.

6. The speech recognition system according to claim 5, wherein the speech recognition system is a speech recognition system in a car navigation system; when an arbitrary or specified number of words or sentences to be subjected to speech recognition are prepared as a single dictionary, each dictionary consists of words such as place names, intersection names, street names, building names, gas stations, convenience stores, and family restaurants existing in an arbitrary or specified area; these dictionaries are prepared for each area; and, when speech recognition is performed and the result indicates that the dictionary contains no corresponding word, the dictionary is switched to the next dictionary for speech recognition and speech recognition is performed with that dictionary.

7. The speech recognition system according to claim 5, wherein the speech recognition system is a speech recognition system in a car navigation system; when an arbitrary or specified number of words or sentences to be subjected to speech recognition are prepared as a single dictionary, each dictionary consists of words such as place names, intersection names, street names, building names, gas stations, convenience stores, and family restaurants existing in an arbitrary or specified area; these dictionaries are prepared for each area; and, when speech recognition is performed and the result indicates a dictionary index, the dictionary is switched to the speech recognition dictionary indicated by that index and speech recognition is performed.