JP2009175630A

JP2009175630A - Speech recognition device, mobile terminal, speech recognition system, speech recognition device control method, mobile terminal control method, control program, and computer readable recording medium with program recorded therein

Info

Publication number: JP2009175630A
Application number: JP2008016646A
Authority: JP
Inventors: Tomoji Hirose; 友二廣瀬
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2008-01-28
Filing date: 2008-01-28
Publication date: 2009-08-06

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device capable of highly accurate speech recognition.

SOLUTION: The speech recognition device 1 includes a usage database determination unit 18, which selects a database that is associated with position information indicating a position and that outputs character information corresponding to speech feature quantities, and a speech recognition unit 12, which performs speech recognition using the database selected by the usage database determination unit 18. Speech recognition is thus performed with a database corresponding to the current position of the device, which achieves highly accurate speech recognition.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech recognition device that converts input speech into character information, and to a mobile terminal, a speech recognition system, a speech recognition device control method, a mobile terminal control method, a control program, and a computer-readable recording medium on which the program is recorded.

In recent years, speech recognition systems that allow sentences to be entered by voice have been put into practical use in various fields. Such a system uses a speech recognition engine, which converts the speech signal of the user's utterance into digital data and recognizes the content of the utterance by comparing the data with predetermined patterns, to recognize the user's speech and accept it as text.

Thanks to technological advances, the recognition performance of these systems has improved year by year, and their functions have diversified. At the same time, the number of words to be recognized tends to increase. For example, when a speech recognition system is installed in a car navigation system, the words to be recognized include the operating commands for each function as well as place names and facility names throughout the country, so the vocabulary becomes enormous. It is very difficult to store such an enormous number of words in a dictionary storage device and to recognize the user's utterance accurately and efficiently against that dictionary.

Furthermore, misrecognition can occur in speech recognition because of ambient noise and differences in the user's voice quality, volume, speaking rate, and so on. Moreover, when a dialect or the like is spoken, a word that is not registered in the word dictionary is treated as an unknown word, and the input speech cannot be recognized correctly.

Thus, every speech recognition system is subject to the constraint that a word not registered in the dictionary of recognition targets can never be recognized correctly.

Patent Document 1 describes a system that includes a speaker-independent recognition unit and a speaker-dependent recognition unit. The system switches selectively between the two units, using the speaker-dependent recognition unit for a specific speaker and the speaker-independent recognition unit otherwise, and thereby uses the dictionary provided in each unit.
Patent Document 1: Japanese Patent Laid-Open No. 03-9399 (published on January 17, 1991)

However, the configuration described in Patent Document 1 has the following problem. It only switches between the speaker-dependent recognition unit and the speaker-independent recognition unit according to whether the speaker is a specific speaker, so all it does is use a dictionary suited to the speaker.

Consequently, as long as the speaker is the same, the same dictionary is used regardless of the situation in which the speech recognition device is used. Even if that situation changes, the recognizable words do not change, and the accuracy of the speech recognition device does not improve.

The present invention has been made in view of the above problem, and its object is to realize a speech recognition device, a mobile terminal, a speech recognition system, a speech recognition device control method, a mobile terminal control method, a control program, and a computer-readable recording medium on which the program is recorded, each offering high speech recognition accuracy.

To solve the above problem, a speech recognition device according to the present invention includes: database selection means for selecting a database that is associated with position information indicating a position and that outputs character information corresponding to speech feature quantities; and speech recognition means for performing speech recognition using the database selected by the database selection means.

A method for controlling a speech recognition device according to the present invention includes: a database selection step of selecting a database that is associated with position information indicating a position and that outputs character information corresponding to speech feature quantities; and a speech recognition step of performing speech recognition using the database selected in the database selection step.

With the above configuration and method, speech recognition is performed using a database associated with position information indicating a position.

Speech recognition can therefore be performed using a database suited to the position indicated by the position information.

Thus, when the device that accepts the speech to be input to the speech recognition device (a microphone, for example) is at a certain place, speech recognition can use a database enriched with the words and acoustic/language models that are likely to be used often at that place. More accurate speech recognition can therefore be realized.

For example, speech input in the Kansai region is likely to be in the Kansai dialect, so selecting a database of Kansai-dialect acoustic/language models as the database for speech recognition enables more accurate recognition. Likewise, speech input at a station is likely to concern travel-related topics, so selecting a dictionary rich in travel-related words as the database for speech recognition enables more accurate recognition.

The speech recognition device according to the present invention may further include position information acquisition means for acquiring position information.

With the above configuration, speech recognition is performed using the database associated with the position information acquired by the position information acquisition means. The position information acquisition means may acquire position information indicating the position of the device itself, or may acquire, over a communication path, position information indicating the position of an external device such as a mobile terminal. When the position information indicates the position of the device itself, the speech to be recognized is captured at or near the device itself. When the position information indicates the position of an external device, the speech to be recognized is captured at or near that external device.

The position information thus indicates the position of the device at which the speech was input, so speech recognition can be performed using a database suited to the place where the speech was input.

The speech recognition device according to the present invention may include: speech/position information receiving means for receiving, from a mobile terminal connected via a communication path, position information indicating the position of the mobile terminal and feature quantity information of the speech input to the mobile terminal; and character information transmitting means for transmitting, to the mobile terminal, speech recognition result information that is the result of speech recognition by the speech recognition means.

With the above configuration, speech recognition is performed using the database associated with the position information received from the mobile terminal connected via the communication path, and the result of the speech recognition is transmitted to the mobile terminal.

Speech recognition can thus be performed using a database suited to the position of the mobile terminal.

In the speech recognition device according to the present invention, when the database selection means selects, as the database to be used, a database different from the one currently in use, it may cause a display unit to show a confirmation screen asking whether the new database may be selected.

With the above configuration, a confirmation screen is shown on the display unit when the database to be used is about to change. The user can therefore confirm the change before the database is switched.

To solve the above problem, a mobile terminal according to the present invention includes: position information acquisition means for acquiring position information indicating the position of the terminal itself; feature quantity extraction means for extracting, from input speech, feature quantities used for speech recognition; speech/position information transmitting means for transmitting, to a speech recognition device, the position information acquired by the position information acquisition means and feature quantity information indicating the feature quantities extracted by the feature quantity extraction means; and character information receiving means for receiving, from the speech recognition device, character information that is the result of the speech recognition device performing speech recognition based on the feature quantity information and position information transmitted by the transmitting means.

With the above configuration, the terminal transmits information indicating the speech input to it and its own position to the speech recognition device, and receives the result of speech recognition performed using a database suited to its position.

Thus, even if the terminal itself has neither speech recognition means nor a database, speech recognition using a database suited to its position can be performed.
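The exchange between the mobile terminal and the speech recognition device can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify a message format, so the JSON encoding, the field names, and the functions `encode_request`, `handle_request`, and `decode_response` are all hypothetical, and the database selector and recognizer are passed in as placeholders.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List


@dataclass
class RecognitionRequest:
    """Hypothetical message sent by the terminal: position plus features."""
    latitude: float
    longitude: float
    features: List[List[float]]  # one feature vector per speech frame


def encode_request(request: RecognitionRequest) -> str:
    """Terminal side: serialize the position and feature information."""
    return json.dumps(asdict(request))


def handle_request(
    payload: str,
    select_database: Callable[[float, float], str],
    recognize: Callable[[List[List[float]], str], str],
) -> str:
    """Device side: pick a database from the received position, recognize
    the received features with it, and return the text as a reply."""
    request = RecognitionRequest(**json.loads(payload))
    database = select_database(request.latitude, request.longitude)
    text = recognize(request.features, database)
    return json.dumps({"text": text})


def decode_response(payload: str) -> str:
    """Terminal side: extract the recognized character information."""
    return json.loads(payload)["text"]
```

In this sketch the terminal only extracts features and reports its position, so the recognizer and all databases can live on the device that receives the request.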

A speech recognition system composed of the above speech recognition server and the above mobile terminal also achieves the effects described above.

Further, to solve the above problem, a mobile terminal according to the present invention is a mobile terminal having a plurality of functions and includes: execution function information acquisition means for acquiring execution function information indicating the function currently being executed, or most recently executed, by the terminal; database selection means for selecting a database that is associated with the execution function information acquired by the execution function information acquisition means and that outputs character information corresponding to the feature quantities of input speech; and speech recognition means for performing speech recognition using the database selected by the database selection means.

A method for controlling a mobile terminal according to the present invention is a method for a mobile terminal having a plurality of functions and includes: an execution function information acquisition step of acquiring execution function information indicating the function currently being executed, or most recently executed, by the terminal; a database selection step of selecting a database that is associated with the execution function information acquired in the execution function information acquisition step and that outputs character information corresponding to the feature quantities of input speech; and a speech recognition step of performing speech recognition using the database selected in the database selection step.

With the above configuration and method, the terminal acquires execution function information indicating the function it is currently executing or most recently executed, and performs speech recognition using the database associated with that execution function information.

Speech recognition can thus be performed using a database suited to the function that the terminal is executing or most recently executed.

For example, when the terminal has a music player function and is executing it, the terminal can select a database enriched with music-related words for speech recognition, which realizes more accurate speech recognition.
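Function-based selection can be sketched as a lookup keyed by the running or most recently run function. The source names only the music player case; the other table entry, the function identifiers, and the fallback to a general dictionary are assumptions added for illustration.

```python
from typing import Optional

# Hypothetical mapping from a terminal function to a database name.
# Only the music player example comes from the text.
FUNCTION_TO_DATABASE = {
    "music_player": "music dictionary",
    "navigation": "place-name dictionary",  # assumed entry
}

# Assumed fallback when no function-specific database exists.
DEFAULT_DATABASE = "general dictionary"


def select_database_for_function(
    running: Optional[str], most_recent: Optional[str]
) -> str:
    """Prefer the function being executed now; otherwise use the one
    executed most recently; otherwise fall back to the default."""
    function = running if running is not None else most_recent
    if function is None:
        return DEFAULT_DATABASE
    return FUNCTION_TO_DATABASE.get(function, DEFAULT_DATABASE)
```

For instance, `select_database_for_function("music_player", None)` picks the music dictionary while the player is running.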

The speech recognition device and the mobile terminal may be realized by a computer. In that case, a control program that realizes the speech recognition device, mobile terminal, or speech recognition server on a computer by causing the computer to operate as each of the above means, and a computer-readable recording medium on which the control program is recorded, also fall within the scope of the present invention.

As described above, the speech recognition device according to the present invention includes database selection means for selecting a database that is associated with position information indicating a position and that outputs character information corresponding to the feature quantities of input speech, and speech recognition means for performing speech recognition using the database selected by the database selection means.

The method for controlling a speech recognition device according to the present invention includes a database selection step of selecting a database that is associated with position information indicating a position and that outputs character information corresponding to the feature quantities of input speech, and a speech recognition step of performing speech recognition using the database selected in the database selection step.

Thus, when the device that accepts the speech to be input to the speech recognition device (a microphone, for example) is at a certain place, speech recognition can be performed using a database enriched with the words and acoustic/language models that are likely to be used often at that place, and more accurate speech recognition can be realized.

The speech recognition device according to the present invention may also be a speech recognition device having a plurality of functions that includes: execution function information acquisition means for acquiring execution function information indicating the function currently being executed, or most recently executed, by the device; database selection means for selecting a database that is associated with the execution function information acquired by the execution function information acquisition means and that outputs character information corresponding to the feature quantities of input speech; and speech recognition means for performing speech recognition using the database selected by the database selection means.

The mobile terminal according to the present invention is a mobile terminal that transmits speech information to a speech recognition device and receives, from the speech recognition device, the character information represented by the transmitted speech information. It includes: position information acquisition means for acquiring position information indicating a position; feature quantity extraction means for extracting, from input speech, the speech feature quantities used for speech recognition; transmitting means for transmitting, to the speech recognition device, the position information acquired by the position information acquisition means and feature quantity information indicating the feature quantities extracted by the feature quantity extraction means; and information acquisition means for acquiring character information that is the result of the speech recognition device performing speech recognition based on the feature quantity information and position information transmitted by the transmitting means.

Thus, even if the terminal itself is not equipped with speech recognition means or a database, speech recognition using a database suited to its position can be performed.

[Embodiment 1]
An embodiment of the present invention is described below with reference to FIGS. 1 to 3. The speech recognition device 1 described below can also be provided in a mobile phone.

FIG. 1 is a block diagram of the speech recognition device 1 according to the present embodiment. As shown in FIG. 1, the speech recognition device 1 includes a speech input unit 10, a feature quantity calculation unit (feature quantity calculation means) 11, a speech recognition unit (speech recognition means) 12, a display unit 13, a location information acquisition unit (position information acquisition means) 14, a map information storage unit 15, a GPS (Global Positioning System) 16, a GPS antenna 17, a usage database determination unit (database selection means) 18, a correspondence table storage unit 19, a database unit 20, and an input unit 21.

The speech input unit 10 accepts speech input from a microphone or the like, converts it into speech data, and transmits the speech data to the feature quantity calculation unit 11.

The feature quantity calculation unit 11 calculates, from the received speech data, the feature quantities that the speech recognition unit 12 needs for speech recognition, and transmits them to the speech recognition unit 12 as feature quantity information. Examples of feature quantities include multidimensional vectors such as MFCCs (Mel Frequency Cepstrum Coefficients), LPC (Linear Prediction Coefficient) cepstra, power, and their first-order and second-order regression coefficients, as well as vectors obtained by reducing the dimensionality of these values through principal component analysis or discriminant analysis. The present embodiment is not limited to these.
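The source does not say how the first-order and second-order regression coefficients are computed. One widely used formulation, shown here purely as an assumption, is the delta formula d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²) with n running from 1 to a window size N, where frames beyond the ends of the utterance are replaced by the first or last frame. The function name `delta` and the default window of 2 are illustrative choices.

```python
from typing import List


def delta(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """Regression (delta) coefficients of a sequence of feature vectors.

    frames: one feature vector per frame, all of equal length.
    window: number of frames on each side used in the regression (N).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not frames:
        return []
    denominator = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    result = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            numerator = 0.0
            for n in range(1, window + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                numerator += n * (ahead - behind)
            row.append(numerator / denominator)
        result.append(row)
    return result
```

Applying the function once gives first-order coefficients, and applying it again to that output gives second-order coefficients, for example `delta(delta(frames))`.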

The speech recognition unit 12 performs speech recognition on the feature quantity information received from the feature quantity calculation unit 11, using the database determined by the usage database determination unit 18, and transmits the result to the display unit 13 as display data. Conventional techniques are used for this speech recognition.

The display unit 13 receives the display data from the speech recognition unit 12 and displays its content. Any display device capable of showing characters and the like may serve as the display unit 13; one example is an LCD (Liquid Crystal Display).

The location information acquisition unit 14 uses the latitude/longitude information (position information) received from the GPS 16 and the map information stored in the map information storage unit 15 to acquire the location information (position information) of the point in the map information that corresponds to the received latitude and longitude, that is, the current position. It then transmits the acquired location information to the usage database determination unit 18. When speech recognition corresponding to an acoustic/language model is performed, the location information indicates the region containing the current position (for example, the Kansai region or the Tohoku region). When speech recognition corresponding to a facility is performed, it indicates the name of the facility at the current position (for example, a station, an airport, or a restaurant). The location information can be defined freely as long as it identifies a database; in other words, the way the map is divided into areas and the names given to them can be chosen freely.
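The conversion from latitude and longitude to location information can be sketched as a search over named map areas. The source leaves the map representation open, so the rectangular `MapArea` type, the function `get_location_info`, and the coordinate values below are assumptions; the bounding boxes are rough illustrative values, not real map data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MapArea:
    """Hypothetical map entry: a named rectangle in latitude/longitude."""
    name: str
    south: float
    north: float
    west: float
    east: float

    def contains(self, latitude: float, longitude: float) -> bool:
        return (self.south <= latitude <= self.north
                and self.west <= longitude <= self.east)


# Rough, illustrative regions only (assumed values).
REGIONS = (
    MapArea("Kansai region", 33.4, 35.8, 134.2, 136.5),
    MapArea("Tohoku region", 36.8, 41.6, 139.2, 142.1),
)


def get_location_info(
    latitude: float, longitude: float, areas: Sequence[MapArea]
) -> Optional[str]:
    """Return the name of the first area containing the point, or None
    when the map information has no entry for it."""
    for area in areas:
        if area.contains(latitude, longitude):
            return area.name
    return None
```

A facility layer could reuse the same lookup with much smaller areas named "station", "airport", and so on, which matches the statement that the division of the map and the names can be set freely.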

The map information storage unit 15 stores the map information used by the location information acquisition unit 14. The map information also includes information on which region each point belongs to and what kind of facility is located at each point.

The GPS 16 generates latitude/longitude information from the radio waves received through the GPS antenna 17 and transmits the generated latitude/longitude information to the location information acquisition unit 14.

The usage database determination unit 18 determines, from among the databases in the database unit 20, the database to be used by the speech recognition device 1, in accordance with the correspondence table stored in the correspondence table storage unit 19.

When the database to be used changes because the user has moved to a different place or the like, the usage database determination unit 18 may cause the display unit 13 to ask whether the new database may be adopted.
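The confirmation step can be sketched as a small guard around the database switch. The source says only that a confirmation is displayed before the change; keeping the current database when the user declines is an assumption, and `switch_database` and the `confirm` callback are hypothetical names standing in for the display unit and input unit.

```python
from typing import Callable


def switch_database(
    current: str,
    candidate: str,
    confirm: Callable[[str, str], bool],
) -> str:
    """Return the database to use after a possible change.

    confirm(current, candidate) stands in for the confirmation screen and
    returns True when the user accepts the new database.
    """
    if candidate == current:
        return current  # nothing changes, so no confirmation is needed
    if confirm(current, candidate):
        return candidate
    return current  # assumed behavior: declining keeps the current one
```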

The correspondence table storage unit 19 stores a correspondence table that associates location information with the database to be used. The usage database determination unit 18 uses this table when it determines the database for the speech recognition unit 12. FIG. 2 shows examples of the correspondence table. FIG. 2(a) shows a correspondence table 25 for speech recognition corresponding to an acoustic/language model, and FIG. 2(b) shows a correspondence table 26 for speech recognition corresponding to a facility. For example, when the correspondence table 25 shown in FIG. 2(a) is used and the location information is "Kansai region", the database to be used is the "Kansai dialect model".

The database unit 20 stores the databases used by the speech recognition unit 12 when the speech recognition device 1 performs speech recognition. Each database outputs character information corresponding to the feature quantities of the input speech. The databases are divided into acoustic/language model databases 201 and facility databases 202. The acoustic/language model databases 201 comprise database AA, database AB, ..., database AZ, and the facility databases 202 comprise database BA, database BB, ..., database BZ. The databases are not limited to these.

The input unit 21 is the user interface of the speech recognition device 1. It has various operation keys and accepts various instructions for the speech recognition device 1. It also accepts the choice between speech recognition corresponding to an acoustic/language model and speech recognition corresponding to a facility, and transmits the choice to the usage database determination unit 18. Although the present embodiment describes choosing between these two kinds of speech recognition, the invention is not limited to this.

In the present embodiment, either speech recognition corresponding to an acoustic/language model or speech recognition corresponding to a facility is selected and only the selected one is performed, but both may be performed together.

Next, the flow of speech recognition processing in the speech recognition device 1 is described with reference to FIG. 3. FIG. 3 is a flowchart showing the flow of the speech recognition processing.

First, the input unit 21 of the speech recognition device 1 accepts a user operation to start speech recognition, together with the choice between speech recognition corresponding to an acoustic/language model and speech recognition corresponding to a facility (S301). The GPS 16 then measures the current position of the speech recognition device 1 (S302). Next, the location information acquisition unit 14 uses the latitude/longitude information received from the GPS 16 to acquire location information from the map information stored in the map information storage unit 15 (S303). The usage database determination unit 18 then determines the database corresponding to the location information as the database to be used by the speech recognition device 1 (S304).

For example, when performing speech recognition based on the acoustic/language model, the use database determination unit 18 refers to the correspondence table 25 in FIG. 2(a). If the acquired location information indicates the "Kansai region", it selects the "Kansai dialect" acoustic/language model database as the database to use; if it indicates the "Tohoku region", it selects the "Tohoku dialect" acoustic/language model database.

When performing speech recognition based on the facility, the use database determination unit 18 refers to the correspondence table 26 in FIG. 2(b). If the acquired location information is "airport", it selects the "travel dictionary", which is rich in travel-related words, as the database to use; if it is "restaurant", it selects the "meal dictionary", which is rich in meal-related words.
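The table lookups described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionaries contain only the entries named in the text, and the function name `select_database` and the mode strings are assumptions introduced for the example.

```python
from typing import Optional

# Correspondence table 25 (FIG. 2(a)): region -> acoustic/language model database.
# Only the two entries named in the description are listed here.
REGION_TO_MODEL = {
    "Kansai region": "Kansai dialect",
    "Tohoku region": "Tohoku dialect",
}

# Correspondence table 26 (FIG. 2(b)): facility -> dictionary.
FACILITY_TO_DICTIONARY = {
    "airport": "travel dictionary",
    "restaurant": "meal dictionary",
}


def select_database(mode: str, location_info: str) -> Optional[str]:
    """Return the database name for the given mode and location information.

    `mode` is a hypothetical flag standing in for the choice accepted by the
    input unit 21. Returns None when the table has no matching entry.
    """
    if mode == "acoustic_language_model":
        table = REGION_TO_MODEL
    elif mode == "facility":
        table = FACILITY_TO_DICTIONARY
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return table.get(location_info)
```

Returning `None` for an unlisted location is a design choice of this sketch; the description does not state what happens when no entry matches.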

Once the database to use has been determined, the speech recognition device 1 accepts voice input (S305), performs speech recognition (S306), and displays the character information resulting from the speech recognition on the display unit 13 (S307).
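The overall flow of FIG. 3 can be sketched as one function that wires the units together in order. This is only an illustration under stated assumptions: every callable parameter is a hypothetical stand-in for a unit named in the description, and the feature amounts are modeled as a plain sequence of floats.

```python
from typing import Callable, Sequence, Tuple


def run_recognition(
    measure_position: Callable[[], Tuple[float, float]],  # stands in for the GPS 16
    lookup_location: Callable[[float, float], str],       # location information acquisition unit 14
    choose_database: Callable[[str], str],                # use database determination unit 18
    capture_features: Callable[[], Sequence[float]],      # voice input and feature amount calculation
    recognize: Callable[[Sequence[float], str], str],     # speech recognition unit 12
    display: Callable[[str], None],                       # display unit 13
) -> str:
    """Run position -> location -> database -> input -> recognition -> display."""
    latitude, longitude = measure_position()
    location_info = lookup_location(latitude, longitude)
    database = choose_database(location_info)
    # The database is fixed before any voice input is accepted,
    # matching the order of steps in the flowchart.
    features = capture_features()
    text = recognize(features, database)
    display(text)
    return text
```

Passing the units in as callables keeps the sketch independent of any particular positioning hardware or recognition engine, neither of which the description specifies.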

The above configuration provides the following effects. Consider, for example, speech recognition based on the acoustic/language model when the user carrying the speech recognition device is in the Kansai region. In this case, the acquired location information is "Kansai region", so the speech recognition device 1 selects the "Kansai dialect" acoustic/language model database (FIG. 2(a)). A user who is in the Kansai region is very likely to speak the Kansai dialect. Because the device can select the acoustic/language model database for the dialect the user is most likely to speak, more accurate speech recognition can be achieved.

Next, consider speech recognition based on the facility when the user carrying the speech recognition device is at a station. In this case, the acquired location information is "airport/station", so the speech recognition device 1 selects the "travel dictionary" (FIG. 2(b)). A user who is at a station is likely to talk about travel-related topics. Because the device can select a dictionary rich in words related to the topics the user is likely to talk about, more accurate speech recognition can be achieved.

[Embodiment 2]
Another embodiment of the present invention is described below with reference to FIGS. 4 to 6. For convenience of explanation, members having the same functions as those shown in Embodiment 1 are given the same reference numerals, and their description is omitted.

FIG. 4 is a block diagram of the speech recognition device 2 according to the present embodiment. The speech recognition device 2 differs from Embodiment 1 in that it includes a use database determination unit (database selection means) 42, an execution function transmission unit (execution function information acquisition means) 43, an execution function storage unit 44, a correspondence table storage unit 45, and a database unit 46. In addition to speech recognition, the speech recognition device 2 has a plurality of functions (GPS, music player, and so on).

Unlike Embodiment 1, the present embodiment determines the database used for speech recognition according to the function that the speech recognition device 2 is currently executing or executed most recently.

The characteristic configuration of the speech recognition device 2 is described in detail below.

The execution function transmission unit 43 determines which function is currently being executed in the speech recognition device 2 and sends execution function information indicating that function to the use database determination unit 42. For example, when the speech recognition device 2 is functioning as a music player, it notifies the use database determination unit 42 that the device is functioning as a music player. If no function is currently being executed, it sends the use database determination unit 42 the information (execution function information) stored in the execution function storage unit 44 that indicates the most recently executed function.

The execution function storage unit 44 stores the functions that have been executed by the speech recognition device 2.

The use database determination unit 42 determines the database to be used for speech recognition from the execution function information received from the execution function transmission unit 43, using the correspondence table 51 stored in the correspondence table storage unit 45.

The correspondence table storage unit 45 stores a correspondence table that associates execution function information with the database to use; the use database determination unit 42 consults it when determining the database used by the speech recognition unit 12. FIG. 5 shows an example: the correspondence table 51 used when performing speech recognition based on the execution function information. With the correspondence table 51 shown in FIG. 5, for example, if the execution function information is "music player", the database to use is the "music dictionary".

The database unit 46 stores the databases that the speech recognition unit 12 uses when the speech recognition device 2 performs speech recognition. Each database outputs character information corresponding to the feature amounts of the input speech. The database unit 46 includes database CA, database CB, ... database CZ. The databases are not limited to these.

Next, the flow of processing in the speech recognition device 2 is described with reference to FIG. 6. FIG. 6 is a flowchart showing the flow of the speech recognition processing.

First, the speech recognition device 2 accepts, at the input unit 21, a user operation that starts speech recognition (S601). Next, the execution function transmission unit 43 sends the use database determination unit 42 execution function information indicating the function currently being executed by the speech recognition device 2 or the function it executed most recently (S602). The use database determination unit 42 then determines the database to be used for speech recognition from the received execution function information, using the correspondence table 51 (S603).

For example, if the received information is "functioning as GPS", the travel dictionary is selected as the database to use.
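The fallback from the currently executing function to the most recently executed one, followed by the table lookup, can be sketched as below. This is a minimal sketch under assumptions: the class `ExecutionFunctionTracker` and its methods are hypothetical names that combine the roles of the execution function transmission unit 43 and the execution function storage unit 44, and the table lists only the two entries mentioned in the text.

```python
from typing import Optional

# Correspondence table 51 (FIG. 5): execution function -> dictionary.
# Only the entries mentioned in the description are included.
FUNCTION_TO_DICTIONARY = {
    "music player": "music dictionary",
    "GPS": "travel dictionary",
}


class ExecutionFunctionTracker:
    """Tracks the running function and remembers the most recent one."""

    def __init__(self) -> None:
        self._current: Optional[str] = None
        self._most_recent: Optional[str] = None

    def start(self, function: str) -> None:
        """Record that a function has started executing."""
        self._current = function
        self._most_recent = function

    def stop(self) -> None:
        """Record that no function is executing; the history is kept."""
        self._current = None

    def execution_function_info(self) -> Optional[str]:
        """Return the current function, else the most recently executed one."""
        if self._current is not None:
            return self._current
        return self._most_recent


def select_database(info: Optional[str]) -> Optional[str]:
    """Look up the database for the given execution function information."""
    if info is None:
        return None
    return FUNCTION_TO_DICTIONARY.get(info)
```

With this sketch, stopping the music player and then starting recognition still yields the "music dictionary", which mirrors the "most recently executed function" behavior described for the case where nothing is running.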

Once the database to use has been determined, the speech recognition device 2 accepts voice input (S604), performs speech recognition (S605), and displays the character information resulting from the speech recognition on the display unit 13 (S606).

With the above configuration, the speech recognition device 2 can perform speech recognition using a database that matches the function it is currently executing or executed most recently.

[Embodiment 3]
Still another embodiment of the present invention is described below with reference to FIG. 7. For convenience of explanation, members having the same functions as those shown in Embodiments 1 and 2 are given the same reference numerals, and their description is omitted.

FIG. 7 shows a block diagram of the speech recognition system 3 according to the present embodiment. As shown in FIG. 7, in the present embodiment, voice input and display are performed on the mobile terminal 5, and speech recognition is performed on the speech recognition server 4.

Specifically, the mobile terminal 5 includes a voice input unit 10, a feature amount calculation unit (feature amount calculation means) 11, a display unit 13, a location information acquisition unit (position information acquisition means) 14, a map information storage unit 15, a GPS (Global Positioning System) 16, and a GPS antenna 17, as well as a transmission unit (voice/position information transmission means) 71 and a reception unit (character information acquisition means) 72. The speech recognition server 4 includes a speech recognition unit (speech recognition means) 12, a use database determination unit (database selection means) 18, a correspondence table storage unit 19, and a database unit 20, as well as a transmission/reception unit (character information transmission means, voice/position information reception means) 75.

As shown in FIG. 7, in the speech recognition system 3, the mobile terminal 5 performs voice input, feature amount calculation, and location information acquisition, and the feature amount information and the location information are transmitted to the speech recognition server 4 via the transmission unit 71, the base station 73, and the network 74. The speech recognition server 4 receives the feature amount information and the location information at the transmission/reception unit 75, which forwards the feature amount information to the speech recognition unit 12 and the location information to the use database determination unit 18.

On receiving the location information, the use database determination unit 18 determines the database to use by the method described above and notifies the speech recognition unit 12. The speech recognition unit 12 performs speech recognition by the method described above and transmits the result data to the reception unit 72 of the mobile terminal 5 via the transmission/reception unit 75. On receiving the result data, the reception unit 72 passes it to the display unit 13, and the display unit 13 displays the character information indicated by the result data.
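The division of work between the mobile terminal 5 and the speech recognition server 4 can be sketched as a request/response pair. This is an illustrative sketch only: the message types `RecognitionRequest` and `RecognitionResult`, the class `RecognitionServer`, and the injected callables are all hypothetical, and the transport (base station 73, network 74) is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RecognitionRequest:
    """What the terminal sends: feature amounts plus location information."""
    features: Sequence[float]
    location_info: str


@dataclass(frozen=True)
class RecognitionResult:
    """What the server returns: the recognized character information."""
    text: str


class RecognitionServer:
    """Server side: routes each part of the request to the matching unit."""

    def __init__(
        self,
        choose_database: Callable[[str], str],              # use database determination unit 18
        recognize: Callable[[Sequence[float], str], str],   # speech recognition unit 12
    ) -> None:
        self._choose_database = choose_database
        self._recognize = recognize

    def handle(self, request: RecognitionRequest) -> RecognitionResult:
        # Location information selects the database; feature amounts are
        # then recognized against that database.
        database = self._choose_database(request.location_info)
        return RecognitionResult(text=self._recognize(request.features, database))
```

On the terminal side, the caller would build a `RecognitionRequest` after computing the feature amounts and hand the returned `text` to its display; only feature amounts and location information cross the network, not raw audio.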

In the present embodiment, the location information is acquired within the mobile terminal 5; alternatively, the latitude/longitude information acquired by the GPS 16 may be transmitted to the speech recognition server 4 and the location information acquired within the speech recognition server 4.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

Finally, each block of the speech recognition devices 1 and 2 and the speech recognition system 3, in particular the voice input unit 10, the feature amount calculation unit 11, the speech recognition unit 12, the location information acquisition unit 14, the use database determination unit 18, the use database determination unit 42, and the execution function transmission unit 43, may be implemented in hardware logic or may be implemented in software using a CPU as follows.

That is, the speech recognition devices 1 and 2 and the speech recognition system 3 each include a CPU that executes the instructions of a control program implementing each function, a ROM (read only memory) that stores the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the present invention can also be achieved by supplying the speech recognition devices 1 and 2 and the speech recognition system 3 with a recording medium on which the program code (executable program, intermediate code program, or source program) of their control program, which is the software implementing the functions described above, is recorded in computer-readable form, and by having the computer (or a CPU or MPU (microprocessor unit)) read and execute the program code recorded on the recording medium.

Examples of the recording medium include tape media such as magnetic tape and cassette tape; disk media, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical disks such as CD-ROM (compact disc read-only memory), MO (magneto-optical), MD (Mini Disc), DVD (digital video disk), and CD-R (CD Recordable); card media such as IC cards (including memory cards) and optical cards; and semiconductor memory media such as mask ROM, EPROM (erasable programmable read-only memory), EEPROM (electrically erasable and programmable read-only memory), and flash ROM.

The speech recognition devices 1 and 2 and the speech recognition system 3 may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited; for example, the Internet, an intranet, an extranet, a LAN (local area network), ISDN (integrated services digital network), a VAN (value-added network), a CATV (community antenna television) communication network, a virtual private network, a telephone line network, a mobile communication network, or a satellite communication network can be used. The transmission medium constituting the communication network is also not particularly limited; for example, wired media such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (asymmetric digital subscriber line) lines can be used, as can wireless media such as infrared (for example IrDA (infrared data association) or a remote control), Bluetooth (registered trademark), 802.11 wireless, HDR (high data rate), mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.

Because speech recognition can be performed using a database that matches the place or situation, the present invention is suitable, for example, for recognizing speech input at a variety of places.

Brief description of the drawings

FIG. 1 is a block diagram showing the main configuration of a speech recognition device according to an embodiment of the present invention.
FIG. 2 shows the correspondence tables in the above embodiment: (a) shows the correspondence table used when performing speech recognition based on the acoustic/language model, and (b) shows the correspondence table used when performing speech recognition based on the facility.
FIG. 3 is a flowchart showing the flow of the speech recognition processing in the above embodiment.
FIG. 4 is a block diagram showing the main configuration of a speech recognition device according to another embodiment of the present invention.
FIG. 5 shows the correspondence table in the above other embodiment.
FIG. 6 is a flowchart showing the flow of the speech recognition processing in the above other embodiment.
FIG. 7 is a block diagram showing the main configuration of a speech recognition system according to still another embodiment of the present invention.

Explanation of symbols

1, 2 Speech recognition device
3 Speech recognition system
4 Speech recognition server
5 Mobile terminal
10 Voice input unit
11 Feature amount calculation unit (feature amount calculation means)
12 Speech recognition unit (speech recognition means)
13 Display unit
14 Location information acquisition unit (position information acquisition means)
15 Map information storage unit
18, 42 Use database determination unit (database selection means)
19, 45 Correspondence table storage unit
20, 46 Database unit
43 Execution function transmission unit (execution function information acquisition means)
71 Transmission unit (voice/position information transmission means)
72 Reception unit (character information acquisition means)
75 Transmission/reception unit (character information transmission means, voice/position information reception means)

Claims

A speech recognition apparatus comprising: database selection means for selecting a database that is associated with position information indicating a position and that outputs character information corresponding to a feature amount of speech; and
speech recognition means for performing speech recognition using the database selected by the database selection means.

The speech recognition apparatus according to claim 1, further comprising position information acquisition means for acquiring position information.

The speech recognition apparatus according to claim 2, further comprising: voice/position information receiving means for receiving, from a portable terminal connected by a communication path, position information indicating the position of the portable terminal and feature amount information of speech input to the portable terminal; and
character information transmission means for transmitting speech recognition result information, which is a result of speech recognition by the speech recognition means, to the portable terminal.

The speech recognition apparatus according to any one of claims 1 to 3, wherein the database selection means, when selecting a database different from the database currently in use, displays on a display unit a confirmation screen for confirming whether the new database may be selected.

A portable terminal comprising: position information acquisition means for acquiring position information indicating the position of the own device;
feature amount extraction means for extracting, from input speech, feature amounts used for speech recognition;
voice/position information transmission means for transmitting, to a speech recognition apparatus, the position information acquired by the position information acquisition means and feature amount information indicating the feature amounts extracted by the feature amount extraction means; and
character information receiving means for receiving, from the speech recognition apparatus, character information that is a result of speech recognition performed by the speech recognition apparatus based on the feature amount information and the position information transmitted by the voice/position information transmission means.

A voice recognition system comprising the voice recognition device according to claim 4 and the portable terminal according to claim 5.

A portable terminal having multiple functions, comprising:
execution function information acquisition means for acquiring execution function information indicating a function being executed on the own device or a function executed most recently;
database selection means for selecting a database that is associated with the execution function information acquired by the execution function information acquisition means and that outputs character information corresponding to a feature amount of input speech; and
speech recognition means for performing speech recognition using the database selected by the database selection means.

A control program for operating any of the speech recognition apparatus according to claim 1 and the portable terminal according to claim 5 or 7, wherein the control program causes a computer to function as each of the above-described means.

A computer-readable recording medium on which the control program according to claim 8 is recorded.

A method for controlling a speech recognition apparatus that recognizes speech using a database and outputs character information as a recognition result, the method comprising:
a database selection step of selecting a database that is associated with position information indicating a position and that outputs character information corresponding to a feature amount of speech; and
a speech recognition step of performing speech recognition using the database selected in the database selection step.

A method for controlling a mobile terminal having multiple functions, the method comprising:
an execution function information acquisition step of acquiring execution function information indicating a function being executed on the own device or a function executed most recently;
a database selection step of selecting a database that is associated with the execution function information acquired in the execution function information acquisition step and that outputs character information corresponding to a feature amount of input speech; and
a speech recognition step of performing speech recognition using the database selected in the database selection step.